12-Factor Agents
Status: Draft
Purpose: Principles for building production-ready AI agents, adapted for Forge
Source: HumanLayer 12-Factor Agents
Why 12 Factors?
The original 12-Factor App gave us a shared language for building reliable web services. The 12-Factor Agents framework does the same for AI agents.
Key insight from HumanLayer:
"Most AI agents that actually succeed in production aren't magical autonomous beings at all – they're mostly well-engineered traditional software, with LLM capabilities carefully sprinkled in at key points."
This aligns perfectly with Forge's thesis: primitives over frameworks, deterministic code with strategic AI.
The 12 Factors (Quick Reference)
┌─────────────────────────────────────────────────────────────────┐
│ THE 12 FACTORS FOR PRODUCTION AI AGENTS │
├─────────────────────────────────────────────────────────────────┤
│ │
│ INPUTS │
│ 1. Natural Language → Tool Calls (structured extraction) │
│ 2. Own Your Prompts (no framework magic) │
│ 3. Own Your Context Window (active management) │
│ │
│ EXECUTION │
│ 4. Tools Are Just Structured Output (JSON + code, demystified) │
│ 5. Unify Execution + Business State (single source of truth) │
│ 6. Launch/Pause/Resume (checkpoints, recovery) │
│ │
│ CONTROL │
│ 7. Contact Humans with Tool Calls (approval as a tool) │
│ 8. Own Your Control Flow (you decide, not the LLM) │
│ 9. Compact Errors into Context (learn from failures) │
│ │
│ ARCHITECTURE │
│ 10. Small, Focused Agents (<100 tools, <20 steps) │
│ 11. Trigger from Anywhere (webhooks, APIs, queues) │
│ 12. Stateless Reducer ((state, event) → state) │
│ │
└─────────────────────────────────────────────────────────────────┘
Factor 1: Natural Language → Tool Calls
"The core LLM superpower is converting natural language to structured data."
The Pattern:
User intent (natural language) → Structured tool call (JSON) → Deterministic execution (code)
Forge Implementation:
// internal/ai/extraction.go
type ToolCall struct {
Name string `json:"name"`
Args json.RawMessage `json:"args"`
}
func ExtractToolCall(ctx context.Context, client *ai.Client, userInput string, availableTools []Tool) (*ToolCall, error) {
prompt := buildExtractionPrompt(userInput, availableTools)
response, err := client.Complete(ctx, prompt, ai.Options{
ResponseFormat: ai.JSONSchema(ToolCallSchema),
})
if err != nil {
return nil, err
}
var call ToolCall
if err := json.Unmarshal([]byte(response.Text), &call); err != nil {
return nil, fmt.Errorf("invalid tool call: %w", err)
}
return &call, nil
}
Forge Opinion:
- Use structured output (JSON mode) for all tool extraction
- Validate against schema before execution
- The LLM's job is classification, not execution
See also: AI-INTEGRATION-LEVELS.md (Level 2: A2UI pattern)
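As a rough illustration of "validate before execution", the sketch below decodes model output strictly and checks the tool name against an allowed set. `parseToolCall` and the `allowed` map are hypothetical names for this example, not part of Forge, and full JSON Schema validation of `args` is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ToolCall mirrors the struct from the extraction example above.
type ToolCall struct {
	Name string          `json:"name"`
	Args json.RawMessage `json:"args"`
}

// parseToolCall is a hypothetical helper: it decodes raw model output
// strictly and rejects tool names outside the allowed set.
func parseToolCall(raw string, allowed map[string]bool) (*ToolCall, error) {
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.DisallowUnknownFields() // unexpected top-level keys become errors
	var call ToolCall
	if err := dec.Decode(&call); err != nil {
		return nil, fmt.Errorf("invalid tool call: %w", err)
	}
	if !allowed[call.Name] {
		return nil, fmt.Errorf("tool %q is not in the allowed set", call.Name)
	}
	return &call, nil
}

func main() {
	allowed := map[string]bool{"get_tournament": true}
	for _, raw := range []string{
		`{"name":"get_tournament","args":{"id":"t1"}}`,
		`{"name":"drop_database","args":{}}`,
	} {
		call, err := parseToolCall(raw, allowed)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", call.Name)
	}
}
```

The allowed set would normally be derived from the tool registry, so the model cannot route to anything that is not registered in code.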
Factor 2: Own Your Prompts
"Production quality requires hand-crafted prompts, not abstractions."
The Pattern:
Version prompts like code. Test them. Don't hide them behind framework abstractions.
Forge Implementation:
internal/ai/prompts/
├── extraction.go # Tool call extraction prompts
├── summarization.go # Content summarization prompts
├── validation.go # Output validation prompts
└── prompts_test.go # Prompt regression tests
// internal/ai/prompts/extraction.go
const ToolExtractionPrompt = `You are a tool router. Given the user's request and available tools,
output a JSON object with the tool name and arguments.
Available tools:
{{range .Tools}}
- {{.Name}}: {{.Description}}
Args: {{.ArgsSchema}}
{{end}}
User request: {{.UserInput}}
Output format: {"name": "tool_name", "args": {...}}
Rules:
- Select exactly one tool
- If no tool matches, use "unknown"
- Never invent tools not in the list
`
Forge Opinion:
- Prompts live in code, not config files
- Version control all prompts
- Test prompts with golden examples
- No prompt abstraction layers (LangChain templates)
See also: BUILD-TIME-AGENTS.md (Pattern 1: Skills Framework)
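One way to "test prompts with golden examples" is to render the template with fixed inputs and compare the result to a stored string. The sketch below assumes the prompt is rendered with the standard `text/template` package; `promptTmpl`, `toolInfo`, and `renderPrompt` are shortened, hypothetical stand-ins for the real prompt and types.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// toolInfo is a hypothetical, trimmed-down tool description.
type toolInfo struct {
	Name        string
	Description string
}

// promptTmpl is a shortened stand-in for ToolExtractionPrompt.
const promptTmpl = `Available tools:
{{range .Tools}}- {{.Name}}: {{.Description}}
{{end}}User request: {{.UserInput}}`

// renderPrompt fills the template with the given tools and user input.
func renderPrompt(tools []toolInfo, userInput string) (string, error) {
	t, err := template.New("extraction").Parse(promptTmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	err = t.Execute(&sb, struct {
		Tools     []toolInfo
		UserInput string
	}{tools, userInput})
	return sb.String(), err
}

func main() {
	tools := []toolInfo{{Name: "get_tournament", Description: "Retrieve tournament details by ID"}}
	out, err := renderPrompt(tools, "show tournament 42")
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Println(out)
}
```

A regression test then pins the rendered text, so any edit to the prompt shows up as a reviewed diff rather than a silent behavior change.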
Factor 3: Own Your Context Window
"Don't blindly append; actively manage what the LLM sees."
The Pattern:
Context windows have a "dumb zone" (middle 40-60%) where recall degrades. Actively curate what goes in.
Forge Implementation:
// internal/ai/context.go
type ContextManager struct {
maxTokens int
reservedStart int // Keep first N tokens (system prompt)
reservedEnd int // Keep last N tokens (recent messages)
}
func (cm *ContextManager) Build(systemPrompt string, history []Message, currentInput string) string {
// 1. System prompt (always first, protected)
ctx := systemPrompt
// 2. Split history: the last 5 messages stay verbatim, older ones may be summarized
recentCount := min(5, len(history)) // min is a builtin since Go 1.21
older := history[:len(history)-recentCount]
recent := history[len(history)-recentCount:]
// 3. Summarize older history once it would exceed 40% of the window
if cm.tokenCount(history) > cm.maxTokens*2/5 {
older = cm.summarizeOldMessages(older)
}
// 4. Current input (always included, last)
ctx += formatMessages(older) + formatMessages(recent) + currentInput
return ctx
}
func (cm *ContextManager) summarizeOldMessages(msgs []Message) []Message {
// Compress old messages into a summary.
// Placeholder: a real implementation would call the LLM here.
return msgs
}
Forge Opinion:
- Never exceed 40% of context window with history
- System prompt: first 10%
- Recent context: last 30%
- Middle: summarized or omitted
- Use repo maps to provide structure without bloat
See also: BUILD-TIME-AGENTS.md (Pattern 7: Repo Map)
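The percentage budgets above can be turned into a concrete selection rule. The sketch below keeps the longest suffix of the history that fits a token budget; `approxTokens` is a crude assumption (about four bytes per token) standing in for a real tokenizer, and `recentWithinBudget` is a hypothetical helper name.

```go
package main

import "fmt"

type Message struct {
	Role    string
	Content string
}

// approxTokens is a rough stand-in for a real tokenizer
// (assumption: about four bytes per token, rounded up).
func approxTokens(s string) int { return (len(s) + 3) / 4 }

// recentWithinBudget returns the longest suffix of history whose
// estimated token count fits in budget, preserving message order.
func recentWithinBudget(history []Message, budget int) []Message {
	used := 0
	start := len(history)
	for i := len(history) - 1; i >= 0; i-- {
		cost := approxTokens(history[i].Content)
		if used+cost > budget {
			break
		}
		used += cost
		start = i
	}
	return history[start:]
}

func main() {
	maxTokens := 10
	recentBudget := maxTokens * 30 / 100 // "Recent context: last 30%"
	history := []Message{
		{Role: "user", Content: "12345678"},  // ~2 tokens
		{Role: "assistant", Content: "1234"}, // ~1 token
		{Role: "user", Content: "1234"},      // ~1 token
	}
	kept := recentWithinBudget(history, recentBudget)
	fmt.Println("kept", len(kept), "of", len(history), "messages")
}
```

Messages that fall outside the budget are the candidates for summarization or omission.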
Factor 4: Tools Are Just Structured Output
"Demystify tool use as simple routing."
The Pattern:
A "tool call" is just JSON extraction + a switch statement. No magic.
Forge Implementation:
// internal/ai/tools/router.go
type Tool struct {
Name string
Description string
ArgsSchema json.RawMessage
Handler func(ctx context.Context, args json.RawMessage) (any, error)
}
var registry = map[string]Tool{
"get_tournament": {
Name: "get_tournament",
Description: "Retrieve tournament details by ID",
ArgsSchema: []byte(`{"type":"object","properties":{"id":{"type":"string"}}}`),
Handler: getTournamentHandler,
},
"list_matches": {
Name: "list_matches",
Description: "List matches for a tournament",
ArgsSchema: []byte(`{"type":"object","properties":{"tournament_id":{"type":"string"}}}`),
Handler: listMatchesHandler,
},
}
func ExecuteTool(ctx context.Context, call ToolCall) (any, error) {
tool, ok := registry[call.Name]
if !ok {
return nil, fmt.Errorf("unknown tool: %s", call.Name)
}
// Validate args against schema
if err := validateJSON(call.Args, tool.ArgsSchema); err != nil {
return nil, fmt.Errorf("invalid args: %w", err)
}
return tool.Handler(ctx, call.Args)
}
Forge Opinion:
- Tools are registered in code, not discovered magically
- Schema validation before execution
- Handlers are regular Go functions
- No special "tool frameworks"
See also: RUNTIME-AGENTS.md (Pattern 5: Output Validation)
Factor 5: Unify Execution + Business State
"Keep the agent's decision-making state synchronized with your application's state."
The Pattern:
Agent state and business state have different lifecycles but must stay consistent.
Forge Implementation:
// internal/agent/state.go
type AgentRun struct {
// Execution state (agent lifecycle)
RunID string `json:"run_id"`
Status RunStatus `json:"status"` // pending, running, paused, completed, failed
StartedAt time.Time `json:"started_at"`
Steps []Step `json:"steps"`
CurrentStep int `json:"current_step"`
// Business state (domain lifecycle)
TournamentID string `json:"tournament_id,omitempty"`
MatchID string `json:"match_id,omitempty"`
Context map[string]any `json:"context"`
}
// Persist both together
func (r *AgentRun) Save(ctx context.Context, db *sql.DB) error {
// Single transaction for both states
tx, err := db.BeginTx(ctx, nil)
if err != nil {
return err
}
defer tx.Rollback() // no effect once Commit has succeeded
// Save execution state
if err := r.saveExecutionState(tx); err != nil {
return err
}
// Save business state changes
if err := r.saveBusinessState(tx); err != nil {
return err
}
return tx.Commit()
}
Forge Opinion:
- Single transaction for state updates
- Execution state in agent_runs table
- Business state in domain tables
- Both updated atomically
Factor 6: Launch/Pause/Resume
"Design agents to start, pause, and resume execution cleanly."
The Pattern:
Agents should checkpoint their state and resume from any step.
Forge Implementation:
// internal/agent/runner.go
type Runner struct {
db *sql.DB
ai *ai.Client
tools map[string]Tool
}
func (r *Runner) Run(ctx context.Context, runID string) error {
run, err := r.loadRun(ctx, runID)
if err != nil {
return err
}
for run.CurrentStep < len(run.Steps) {
// Check for pause request
if r.shouldPause(ctx, runID) {
run.Status = StatusPaused
return run.Save(ctx, r.db)
}
step := run.Steps[run.CurrentStep]
// Execute step
result, err := r.executeStep(ctx, run, step)
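if err != nil {
run.Status = StatusFailed
run.Steps[run.CurrentStep].Error = err.Error()
// Persist the failure, but still report the step error to the caller
if saveErr := run.Save(ctx, r.db); saveErr != nil {
return errors.Join(err, saveErr) // errors.Join requires Go 1.20+
}
return err
}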
// Checkpoint after each step
run.Steps[run.CurrentStep].Result = result
run.CurrentStep++
if err := run.Save(ctx, r.db); err != nil {
return err
}
}
run.Status = StatusCompleted
return run.Save(ctx, r.db)
}
func (r *Runner) Resume(ctx context.Context, runID string) error {
// Simply call Run - it picks up from CurrentStep
return r.Run(ctx, runID)
}
Forge Opinion:
- Checkpoint after every step
- Resume = load state + continue loop
- Pause = set flag + save state
- Recovery = resume from last checkpoint
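The checkpoint and resume behavior can be exercised without a database. The sketch below swaps the SQL store for an in-memory one; `memStore`, `advance`, and the step names are hypothetical and only illustrate that resuming is the same loop started from the saved `CurrentStep`.

```go
package main

import "fmt"

// Run is a trimmed-down version of AgentRun for this sketch.
type Run struct {
	Steps       []string
	CurrentStep int
	Status      string // "running", "paused", "completed"
}

// memStore stands in for the database in this sketch.
type memStore struct{ saved Run }

func (s *memStore) Save(r Run) { s.saved = r }
func (s *memStore) Load() Run  { return s.saved }

// advance loads the last checkpoint and executes the remaining steps,
// saving after each one. It pauses when shouldPause reports true for
// the index of the next step.
func advance(s *memStore, shouldPause func(next int) bool, exec func(step string)) Run {
	r := s.Load()
	r.Status = "running"
	for r.CurrentStep < len(r.Steps) {
		if shouldPause(r.CurrentStep) {
			r.Status = "paused"
			s.Save(r)
			return r
		}
		exec(r.Steps[r.CurrentStep])
		r.CurrentStep++
		s.Save(r) // checkpoint after every step
	}
	r.Status = "completed"
	s.Save(r)
	return r
}

func main() {
	s := &memStore{saved: Run{Steps: []string{"load", "plan", "notify"}}}
	exec := func(step string) { fmt.Println("executing", step) }

	r := advance(s, func(next int) bool { return next == 2 }, exec)
	fmt.Println(r.Status, "at step", r.CurrentStep)

	// Resume: the same function, starting from the saved checkpoint.
	r = advance(s, func(int) bool { return false }, exec)
	fmt.Println(r.Status, "at step", r.CurrentStep)
}
```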
Factor 7: Contact Humans with Tool Calls
"Treat human approval as just another tool type."
The Pattern:
When the agent needs human input, it calls a "human" tool and waits.
Forge Implementation:
// internal/agent/tools/human.go
var HumanApprovalTool = Tool{
Name: "request_human_approval",
Description: "Request human approval before proceeding with a sensitive action",
ArgsSchema: []byte(`{"type":"object","properties":{"action":{"type":"string"},"reason":{"type":"string"}}}`),
Handler: requestHumanApproval,
}
func requestHumanApproval(ctx context.Context, args json.RawMessage) (any, error) {
var req struct {
Action string `json:"action"`
Reason string `json:"reason"`
}
if err := json.Unmarshal(args, &req); err != nil {
return nil, fmt.Errorf("invalid approval args: %w", err)
}
// Create approval request
approval := &ApprovalRequest{
ID: uuid.New().String(),
RunID: getRunID(ctx),
Action: req.Action,
Reason: req.Reason,
Status: "pending",
CreatedAt: time.Now(),
}
// Save and notify
if err := approval.Save(ctx); err != nil {
return nil, err
}
// Return special response that pauses the run
return PauseForApproval{ApprovalID: approval.ID}, nil
}
Forge Opinion:
- Human approval is a first-class tool
- Agent pauses when approval requested
- Approval via API, Slack, email, etc.
- Resume when approved/rejected
See also: BUILD-TIME-AGENTS.md (Pattern 10: Assumption Checkpoint)
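On the runner side, the pause marker returned by the approval tool has to be recognized and turned into a status change. The sketch below shows one possible shape; `statusAfter` and `statusAfterDecision` are hypothetical helpers, and treating a rejection as a failed run is an assumption of this sketch, since the text above does not specify it.

```go
package main

import "fmt"

// PauseForApproval matches the marker type returned by the approval tool.
type PauseForApproval struct{ ApprovalID string }

// statusAfter maps a tool result to the run status the runner should
// store, plus the approval ID to wait on (empty when not paused).
func statusAfter(result any) (status, approvalID string) {
	if p, ok := result.(PauseForApproval); ok {
		return "paused", p.ApprovalID
	}
	return "running", ""
}

// statusAfterDecision maps a human decision to the next run status.
// Assumption in this sketch: a rejection fails the run.
func statusAfterDecision(approved bool) string {
	if approved {
		return "running"
	}
	return "failed"
}

func main() {
	status, id := statusAfter(PauseForApproval{ApprovalID: "appr-1"})
	fmt.Println(status, id)
	fmt.Println(statusAfterDecision(true))
}
```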
Factor 8: Own Your Control Flow
"Implement your own orchestration logic rather than relying entirely on LLM decision-making."
The Pattern:
The agent loop is YOUR code. The LLM is called at decision points; it is not in charge.
Forge Implementation:
// internal/agent/flow.go
func ProcessTournamentRequest(ctx context.Context, input string) error {
// 1. Extract intent (LLM)
intent, err := extractIntent(ctx, input)
if err != nil {
return err
}
// 2. Route based on intent (YOUR CODE)
switch intent.Type {
case "create_tournament":
return handleCreateTournament(ctx, intent.Args)
case "update_bracket":
return handleUpdateBracket(ctx, intent.Args)
case "generate_schedule":
// This one uses more AI
return handleGenerateSchedule(ctx, intent.Args)
default:
return handleUnknownIntent(ctx, input)
}
}
func handleGenerateSchedule(ctx context.Context, args map[string]any) error {
// Mix of deterministic and AI steps
// 1. Load constraints (deterministic)
constraints := loadSchedulingConstraints(ctx, args["tournament_id"])
// 2. Generate candidate schedule (AI)
schedule, err := generateScheduleWithAI(ctx, constraints)
if err != nil {
return err
}
// 3. Validate against rules (deterministic)
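if verr := validateSchedule(schedule, constraints); verr != nil {
// Retry once, feeding the validation error back to the model.
// Assumes generateScheduleWithAI accepts optional feedback (variadic).
schedule, err = generateScheduleWithAI(ctx, constraints, verr.Error())
if err != nil {
return err
}
if verr := validateSchedule(schedule, constraints); verr != nil {
return fmt.Errorf("schedule invalid after retry: %w", verr)
}
}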
// 4. Save (deterministic)
return saveSchedule(ctx, schedule)
}
Forge Opinion:
- You write the loop, not LangGraph
- LLM for: intent extraction, content generation, complex reasoning
- Code for: routing, validation, persistence, error handling
- Clear boundaries between AI and deterministic logic
See also: RUNTIME-AGENTS.md (AI Tooling Decisions)
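The generate, validate, retry sequence generalizes to a small bounded loop that your code owns. The sketch below is one possible shape, with the LLM call replaced by a plain function; `generateFunc`, `generateValidated`, and `errInvalid` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// generateFunc stands in for an LLM call. feedback is empty on the
// first attempt and carries the last validation error afterwards.
type generateFunc func(feedback string) (string, error)

// errInvalid is a sample validation failure used by the demo below.
var errInvalid = errors.New("schedule violates constraints")

// generateValidated calls gen up to maxAttempts times and returns the
// first output that passes validate. Generation errors are not retried.
func generateValidated(gen generateFunc, validate func(string) error, maxAttempts int) (string, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	feedback := ""
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		out, err := gen(feedback)
		if err != nil {
			return "", err
		}
		if lastErr = validate(out); lastErr == nil {
			return out, nil
		}
		feedback = lastErr.Error() // deterministic code decides what the model sees
	}
	return "", fmt.Errorf("no valid output after %d attempts: %w", maxAttempts, lastErr)
}

func main() {
	gen := func(feedback string) (string, error) {
		if feedback == "" {
			return "bad", nil
		}
		return "good", nil
	}
	validate := func(s string) error {
		if s != "good" {
			return errInvalid
		}
		return nil
	}
	out, err := generateValidated(gen, validate, 3)
	fmt.Println(out, err)
}
```

The attempt limit lives in ordinary code, so the model cannot extend its own retry budget.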
Factor 9: Compact Errors into Context
"Efficiently represent error information for the LLM to process."
The Pattern:
When something fails, give the LLM concise, actionable feedback.
Forge Implementation:
// internal/ai/errors.go
type CompactError struct {
Step string `json:"step"`
Error string `json:"error"`
Hint string `json:"hint"`
Retryable bool `json:"retryable"`
}
func CompactifyError(step string, err error) CompactError {
// Convert verbose errors to concise feedback
switch {
case errors.Is(err, sql.ErrNoRows):
return CompactError{
Step: step,
Error: "not_found",
Hint: "The requested resource does not exist",
Retryable: false,
}
case errors.Is(err, context.DeadlineExceeded):
return CompactError{
Step: step,
Error: "timeout",
Hint: "Operation took too long, try with smaller scope",
Retryable: true,
}
default:
return CompactError{
Step: step,
Error: "unknown",
Hint: truncate(err.Error(), 100),
Retryable: true,
}
}
}
// Include in retry prompt
func BuildRetryPrompt(original string, compactErr CompactError) string {
return fmt.Sprintf(`Previous attempt failed:
Step: %s
Error: %s
Hint: %s
Please try again with this feedback in mind.
Original request: %s`,
compactErr.Step, compactErr.Error, compactErr.Hint, original)
}
Forge Opinion:
- Never dump full stack traces to LLM
- Categorize errors (not_found, timeout, validation, etc.)
- Include actionable hints
- Mark retryable vs terminal failures
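`CompactifyError` relies on a `truncate` helper that is not shown. A minimal sketch, assuming it should cut on rune boundaries and mark the cut with an ellipsis, could look like this:

```go
package main

import "fmt"

// truncate returns s unchanged when it has at most max runes;
// otherwise it keeps the first max runes and appends "...".
// Working on runes avoids splitting a multi-byte character.
func truncate(s string, max int) string {
	if max < 0 {
		max = 0
	}
	r := []rune(s)
	if len(r) <= max {
		return s
	}
	return string(r[:max]) + "..."
}

func main() {
	fmt.Println(truncate("connection refused while dialing upstream", 18))
}
```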
Factor 10: Small, Focused Agents
"Build specialized agents for narrow tasks rather than monolithic general-purpose ones."
The Pattern:
<100 tools, <20 steps. Compose multiple focused agents for complex workflows.
Forge Implementation:
// internal/agent/registry.go
// BAD: One agent with 50 tools
var MonolithicAgent = Agent{
Name: "tournament_agent",
Tools: allTournamentTools, // 50+ tools
}
// GOOD: Focused agents with clear responsibilities
var (
SchedulingAgent = Agent{
Name: "scheduling",
Description: "Generates and optimizes tournament schedules",
Tools: []Tool{generateSchedule, optimizeSchedule, checkConflicts},
MaxSteps: 10,
}
BracketAgent = Agent{
Name: "bracket",
Description: "Manages bracket creation and seeding",
Tools: []Tool{createBracket, seedTeams, advanceWinner},
MaxSteps: 8,
}
NotificationAgent = Agent{
Name: "notification",
Description: "Sends notifications to participants",
Tools: []Tool{notifyTeam, notifyAll, scheduleReminder},
MaxSteps: 5,
}
)
// Orchestrator composes focused agents
func ProcessComplexRequest(ctx context.Context, input string) error {
// Route to appropriate agent
agent := routeToAgent(input)
// Or compose multiple agents
if needsMultipleAgents(input) {
return runAgentPipeline(ctx, input, []Agent{
SchedulingAgent,
NotificationAgent,
})
}
return agent.Run(ctx, input)
}
Forge Opinion:
- Each agent: at most 10 tools and 10 steps (stricter than the general <100 tools, <20 steps guideline)
- Clear, single responsibility
- Compose via orchestrator (your code)
- Name agents by capability, not technology
See also: BUILD-TIME-AGENTS.md (Pattern 2: Agent Routing)
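Size limits are easier to keep when they are checked in code, for example at registration time. The sketch below is a hypothetical `checkLimits` helper with the limits passed in as parameters; the `Agent` type is trimmed down and the values used in `main` are illustrative.

```go
package main

import "fmt"

// Agent is a trimmed-down version of the Agent type used above.
type Agent struct {
	Name     string
	Tools    []string
	MaxSteps int
}

// checkLimits reports an error when an agent has more tools than
// maxTools or a step budget outside 1..maxSteps.
func checkLimits(a Agent, maxTools, maxSteps int) error {
	if len(a.Tools) > maxTools {
		return fmt.Errorf("agent %q has %d tools, limit is %d", a.Name, len(a.Tools), maxTools)
	}
	if a.MaxSteps < 1 || a.MaxSteps > maxSteps {
		return fmt.Errorf("agent %q has MaxSteps %d, want 1..%d", a.Name, a.MaxSteps, maxSteps)
	}
	return nil
}

func main() {
	bracket := Agent{
		Name:     "bracket",
		Tools:    []string{"createBracket", "seedTeams", "advanceWinner"},
		MaxSteps: 8,
	}
	monolith := Agent{Name: "tournament_agent", Tools: make([]string, 50), MaxSteps: 8}

	fmt.Println(checkLimits(bracket, 10, 10))
	fmt.Println(checkLimits(monolith, 10, 10))
}
```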
Factor 11: Trigger from Anywhere
"Design agents to activate from multiple entry points and channels."
The Pattern:
Same agent logic, multiple triggers: HTTP, webhook, queue, schedule, CLI.
Forge Implementation:
// internal/agent/triggers.go
// HTTP trigger
func (h *Handler) TriggerAgent(w http.ResponseWriter, r *http.Request) {
var req AgentRequest
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
http.Error(w, "invalid request body", http.StatusBadRequest)
return
}
runID := h.agent.Start(r.Context(), req)
json.NewEncoder(w).Encode(map[string]string{"run_id": runID})
}
// Webhook trigger (e.g., Stripe, GitHub)
func (h *Handler) WebhookTrigger(w http.ResponseWriter, r *http.Request) {
event := parseWebhookEvent(r)
req := AgentRequest{
Type: "webhook",
Source: event.Source,
Payload: event.Data,
}
h.agent.Start(r.Context(), req)
}
// Queue trigger (e.g., Redis, SQS)
func (w *Worker) ProcessQueue(ctx context.Context) {
for msg := range w.queue.Receive(ctx) {
req := AgentRequest{
Type: "queue",
Payload: msg.Data,
}
w.agent.Start(ctx, req)
msg.Ack()
}
}
// Scheduled trigger
func (s *Scheduler) RunScheduled(ctx context.Context) {
for _, job := range s.dueJobs() {
req := AgentRequest{
Type: "scheduled",
Schedule: job.Schedule,
Payload: job.Payload,
}
s.agent.Start(ctx, req)
}
}
Forge Opinion:
- Agent logic is trigger-agnostic
- Triggers adapt input to AgentRequest
- Same agent, different entry points
- Log trigger source for debugging
See also: BUILD-TIME-AGENTS.md (MCP Server Opportunity)
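The idea that triggers only adapt input can be sketched as a set of thin constructors feeding one entry point. Everything below is hypothetical (`newRequest`, `fromWebhook`, `fromQueue`, `start`), and the payload check is limited to "is it valid JSON".

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AgentRequest is a trimmed-down version of the request type above.
type AgentRequest struct {
	Type    string
	Source  string
	Payload json.RawMessage
}

// newRequest is the shared adapter: it validates the payload once and
// records which trigger and source produced it.
func newRequest(kind, source string, body []byte) (AgentRequest, error) {
	if !json.Valid(body) {
		return AgentRequest{}, fmt.Errorf("%s trigger from %q: payload is not valid JSON", kind, source)
	}
	return AgentRequest{Type: kind, Source: source, Payload: body}, nil
}

func fromWebhook(source string, body []byte) (AgentRequest, error) {
	return newRequest("webhook", source, body)
}

func fromQueue(queue string, body []byte) (AgentRequest, error) {
	return newRequest("queue", queue, body)
}

// start is the single entry point; it does not care which trigger
// built the request. Here it only returns a log line.
func start(req AgentRequest) string {
	return fmt.Sprintf("run started: trigger=%s source=%s", req.Type, req.Source)
}

func main() {
	req, err := fromWebhook("github", []byte(`{"action":"opened"}`))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(start(req))
}
```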
Factor 12: Stateless Reducer
"Design agents as pure functions that map (state + event) → new_state."
The Pattern:
Agents should be stateless: all state is external (database, context). This enables testing, scaling, and recovery.
Forge Implementation:
// internal/agent/reducer.go
// Agent is a pure function: (state, event) → (new_state, effects)
type AgentFunc func(state AgentState, event Event) (AgentState, []Effect)
// State is loaded from database, not stored in agent
type AgentState struct {
RunID string
CurrentStep int
Context map[string]any
History []Message
}
// Events trigger state transitions
type Event struct {
Type string // "user_input", "tool_result", "approval", "error"
Payload any
}
// Effects are side effects to execute
type Effect struct {
Type string // "extract_intent", "call_tool", "send_message", "request_approval"
Payload any
}
// Pure reducer function
func TournamentAgentReducer(state AgentState, event Event) (AgentState, []Effect) {
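newState := state
// Copy the map and slice so the input state is never mutated (keeps the reducer pure)
newState.Context = make(map[string]any, len(state.Context))
for k, v := range state.Context {
newState.Context[k] = v
}
newState.History = append([]Message(nil), state.History...)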
var effects []Effect
switch event.Type {
case "user_input":
// Add to history, prepare tool call
newState.History = append(newState.History, Message{Role: "user", Content: event.Payload.(string)})
effects = append(effects, Effect{Type: "extract_intent", Payload: event.Payload})
case "tool_result":
// Add result to context, decide next step
result := event.Payload.(ToolResult)
newState.Context[result.ToolName] = result.Data
newState.CurrentStep++
// Decide next action based on state
if shouldContinue(newState) {
effects = append(effects, Effect{Type: "call_tool", Payload: nextTool(newState)})
}
}
return newState, effects
}
// Runner executes effects and feeds each outcome back through the reducer
func RunAgent(ctx context.Context, reducer AgentFunc, initialState AgentState, event Event) error {
state := initialState
pending := []Event{event} // events waiting to be reduced
for len(pending) > 0 {
currentEvent := pending[0]
pending = pending[1:]
newState, effects := reducer(state, currentEvent)
// Persist state
if err := saveState(ctx, newState); err != nil {
return err
}
state = newState
// Execute effects; every outcome becomes a new event
for _, effect := range effects {
result, err := executeEffect(ctx, effect)
if err != nil {
pending = append(pending, Event{Type: "error", Payload: err})
continue
}
pending = append(pending, Event{Type: "tool_result", Payload: result})
}
}
return nil // No pending events left, agent is done
}
Forge Opinion:
- Agent = reducer function (pure)
- State = database (external)
- Effects = side effects (executed by runner)
- Enables: testing, replay, horizontal scaling, time-travel debugging
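Purity is what makes replay and testing cheap. The sketch below uses simplified, hypothetical types (`State`, `Event`, `reduce`, `replay`) to show a reducer that copies its input before changing it, so replaying the same events from the same state always gives the same result.

```go
package main

import "fmt"

// State and Event are simplified stand-ins for AgentState and Event.
type State struct {
	Step    int
	Context map[string]string
}

type Event struct {
	Type  string
	Key   string
	Value string
}

// cloneState deep-copies the map so the reducer never mutates its input.
func cloneState(s State) State {
	c := State{Step: s.Step, Context: make(map[string]string, len(s.Context))}
	for k, v := range s.Context {
		c.Context[k] = v
	}
	return c
}

// reduce is pure: same state and event in, same state out, no side effects.
func reduce(s State, e Event) State {
	next := cloneState(s)
	if e.Type == "tool_result" {
		next.Context[e.Key] = e.Value
		next.Step++
	}
	return next
}

// replay folds a recorded event log over an initial state.
func replay(initial State, events []Event) State {
	s := initial
	for _, e := range events {
		s = reduce(s, e)
	}
	return s
}

func main() {
	initial := State{Context: map[string]string{}}
	log := []Event{
		{Type: "tool_result", Key: "get_tournament", Value: "t1"},
		{Type: "tool_result", Key: "list_matches", Value: "3 matches"},
	}
	final := replay(initial, log)
	fmt.Println(final.Step, len(final.Context), len(initial.Context))
}
```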
Forge + 12-Factor Summary
| Factor | Forge Pattern | Document |
|---|---|---|
| 1. NL → Tool Calls | Structured extraction | This doc |
| 2. Own Your Prompts | Skills framework | BUILD-TIME-AGENTS |
| 3. Own Your Context | Repo Map | BUILD-TIME-AGENTS |
| 4. Tools = Structured Output | Output Validation | RUNTIME-AGENTS |
| 5. Unify State | Single transaction | This doc |
| 6. Launch/Pause/Resume | Checkpoint runner | This doc |
| 7. Contact Humans | Approval tool | This doc |
| 8. Own Control Flow | Direct SDK first | RUNTIME-AGENTS |
| 9. Compact Errors | Error categorization | This doc |
| 10. Small Agents | Agent Routing | BUILD-TIME-AGENTS |
| 11. Trigger Anywhere | MCP Server | BUILD-TIME-AGENTS |
| 12. Stateless Reducer | Reducer pattern | This doc |
Quick Start: Minimal 12-Factor Agent
// The simplest possible 12-factor compliant agent
func MinimalAgent(ctx context.Context, db *sql.DB, client *ai.Client, input string) error {
// Factor 3: Own your context
ctx = ai.WithContext(ctx, loadRepoMap())
// Factor 1: NL → Tool Call (client is named so it does not shadow package ai)
call, err := ai.ExtractToolCall(ctx, client, input, availableTools)
if err != nil {
return err
}
// Factor 4: Tools are just structured output
tool, ok := toolRegistry[call.Name]
if !ok {
return fmt.Errorf("unknown tool: %s", call.Name)
}
// Factor 8: Own your control flow
if tool.RequiresApproval {
// Factor 7: Contact humans
if err := requestApproval(ctx, call); err != nil {
return err
}
}
// Factor 5: Unify state (single transaction)
tx, err := db.BeginTx(ctx, nil)
if err != nil {
return err
}
defer tx.Rollback()
// The handler's writes are only atomic if it performs them through tx
if _, err := tool.Handler(ctx, call.Args); err != nil {
// Factor 9: Compact errors
return compactError(err)
}
return tx.Commit()
}
Related: Data Architecture Decisions
The 12 factors above focus on agent behavior. For data architecture decisions that enable multi-agent systems, see:
- Data Architecture for Multi-Agent Systems — Unified data layer, real-time over batch, durable memory, co-located security (RLS)
Key alignment:
| 12-Factor Agent | Data Architecture Decision |
|---|---|
| Factor 5: Unify State | Unified data layer (single Postgres) |
| Factor 6: Launch/Pause/Resume | Durable agent memory (checkpoints in DB) |
| Factor 12: Stateless Reducer | Agent state in database, not memory |