12-Factor Agents

12 principles for production-ready AI agents

Updated: December 2025
Source: 12-FACTOR-AGENTS.md

Status: Draft
Purpose: Principles for building production-ready AI agents, adapted for Forge
Source: HumanLayer 12-Factor Agents


Why 12 Factors?

The original 12-Factor App gave us a shared language for building reliable web services. The 12-Factor Agents framework does the same for AI agents.

Key insight from HumanLayer:

"Most AI agents that actually succeed in production aren't magical autonomous beings at all – they're mostly well-engineered traditional software, with LLM capabilities carefully sprinkled in at key points."

This aligns perfectly with Forge's thesis: primitives over frameworks, deterministic code with strategic AI.


The 12 Factors (Quick Reference)

┌─────────────────────────────────────────────────────────────────┐
│  THE 12 FACTORS FOR PRODUCTION AI AGENTS                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  INPUTS                                                          │
│  1. Natural Language → Tool Calls    (structured extraction)    │
│  2. Own Your Prompts                 (no framework magic)       │
│  3. Own Your Context Window          (active management)        │
│                                                                  │
│  EXECUTION                                                       │
│  4. Tools Are Just Structured Output (JSON + code, demystified) │
│  5. Unify Execution + Business State (single source of truth)   │
│  6. Launch/Pause/Resume              (checkpoints, recovery)    │
│                                                                  │
│  CONTROL                                                         │
│  7. Contact Humans with Tool Calls   (approval as a tool)       │
│  8. Own Your Control Flow            (you decide, not the LLM)  │
│  9. Compact Errors into Context      (learn from failures)      │
│                                                                  │
│  ARCHITECTURE                                                    │
│  10. Small, Focused Agents           (<100 tools, <20 steps)    │
│  11. Trigger from Anywhere           (webhooks, APIs, queues)   │
│  12. Stateless Reducer               ((state, event) → state)   │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Factor 1: Natural Language → Tool Calls

"The core LLM superpower is converting natural language to structured data."

The Pattern:

User intent (natural language) → Structured tool call (JSON) → Deterministic execution (code)

Forge Implementation:

// internal/ai/extraction.go
type ToolCall struct {
    Name   string          `json:"name"`
    Args   json.RawMessage `json:"args"`
}

func ExtractToolCall(ctx context.Context, client *ai.Client, userInput string, availableTools []Tool) (*ToolCall, error) {
    prompt := buildExtractionPrompt(userInput, availableTools)

    response, err := client.Complete(ctx, prompt, ai.Options{
        ResponseFormat: ai.JSONSchema(ToolCallSchema),
    })
    if err != nil {
        return nil, err
    }

    var call ToolCall
    if err := json.Unmarshal([]byte(response.Text), &call); err != nil {
        return nil, fmt.Errorf("invalid tool call: %w", err)
    }

    return &call, nil
}

Forge Opinion:

  • Use structured output (JSON mode) for all tool extraction
  • Validate against schema before execution
  • The LLM's job is classification, not execution
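
One cheap way to enforce the last rule is a guard between extraction and execution. A minimal sketch; `guardToolCall` and the pared-down types are illustrative, not part of the Forge codebase:

```go
import "fmt"

// Pared-down stand-ins for the document's Tool and ToolCall types.
type Tool struct{ Name string }
type ToolCall struct{ Name string }

// guardToolCall rejects an extracted call that names a tool the prompt
// never offered, before any handler runs.
func guardToolCall(call *ToolCall, available []Tool) error {
	for _, t := range available {
		if t.Name == call.Name {
			return nil
		}
	}
	return fmt.Errorf("extracted unknown tool %q", call.Name)
}
```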

See also: AI-INTEGRATION-LEVELS.md (Level 2: A2UI pattern)


Factor 2: Own Your Prompts

"Production quality requires hand-crafted prompts, not abstractions."

The Pattern:

Version prompts like code. Test them. Don't hide them behind framework abstractions.

Forge Implementation:

internal/ai/prompts/
├── extraction.go        # Tool call extraction prompts
├── summarization.go     # Content summarization prompts
├── validation.go        # Output validation prompts
└── prompts_test.go      # Prompt regression tests

// internal/ai/prompts/extraction.go
const ToolExtractionPrompt = `You are a tool router. Given the user's request and available tools,
output a JSON object with the tool name and arguments.

Available tools:
{{range .Tools}}
- {{.Name}}: {{.Description}}
  Args: {{.ArgsSchema}}
{{end}}

User request: {{.UserInput}}

Output format: {"name": "tool_name", "args": {...}}

Rules:
- Select exactly one tool
- If no tool matches, use "unknown"
- Never invent tools not in the list
`

Forge Opinion:

  • Prompts live in code, not config files
  • Version control all prompts
  • Test prompts with golden examples
  • No prompt abstraction layers (LangChain templates)
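
A sketch of what a golden-example check in prompts_test.go could look like: render the template and assert invariants that must hold for any tool set. The trimmed prompt and helper names are illustrative; a fuller test would compare against stored golden files:

```go
import (
	"bytes"
	"strings"
	"text/template"
)

// Trimmed copy of the extraction prompt, just enough to test rendering.
const toolExtractionPrompt = `You are a tool router.
Available tools:
{{range .Tools}}
- {{.Name}}: {{.Description}}
{{end}}
User request: {{.UserInput}}`

type promptTool struct{ Name, Description string }

// renderPrompt executes the template the same way production code would.
func renderPrompt(tools []promptTool, input string) (string, error) {
	t, err := template.New("extract").Parse(toolExtractionPrompt)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	err = t.Execute(&buf, struct {
		Tools     []promptTool
		UserInput string
	}{tools, input})
	return buf.String(), err
}

// promptContains is the assertion helper a golden test would use.
func promptContains(p, s string) bool { return strings.Contains(p, s) }
```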

See also: BUILD-TIME-AGENTS.md (Pattern 1: Skills Framework)


Factor 3: Own Your Context Window

"Don't blindly append; actively manage what the LLM sees."

The Pattern:

Context windows have a "dumb zone" (middle 40-60%) where recall degrades. Actively curate what goes in.

Forge Implementation:

// internal/ai/context.go
type ContextManager struct {
    maxTokens     int
    reservedStart int  // Keep first N tokens (system prompt)
    reservedEnd   int  // Keep last N tokens (recent messages)
}

func (cm *ContextManager) Build(systemPrompt string, history []Message, currentInput string) string {
    // 1. System prompt (always first, protected)
    ctx := systemPrompt

    // 2. Summarize old history if it would crowd the window
    if cm.tokenCount(history) > cm.maxTokens/2 {
        history = cm.summarizeOldMessages(history)
    }

    // 3. Recent history (protected, last N messages)
    recentCount := min(5, len(history))
    recent := history[len(history)-recentCount:]

    // 4. Recent messages + current input (always included)
    return ctx + formatMessages(recent) + currentInput
}

func (cm *ContextManager) summarizeOldMessages(msgs []Message) []Message {
    // Compress old messages into a single summary message,
    // keeping recent messages intact (implementation elided)
    return msgs
}

Forge Opinion:

  • Never exceed 40% of context window with history
  • System prompt: first 10%
  • Recent context: last 30%
  • Middle: summarized or omitted
  • Use repo maps to provide structure without bloat
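
The percentages above can be turned into explicit token budgets. A sketch using a crude word-count estimate; real code would use the model's tokenizer, and the names here are illustrative:

```go
import "strings"

// approxTokens is a rough estimate (~0.75 words per token is a common
// rule of thumb for English); good enough for budget checks, not billing.
func approxTokens(s string) int {
	return len(strings.Fields(s)) * 4 / 3
}

type budget struct{ system, recent, history int }

// splitBudget carves a context window into the protected regions the
// opinion above describes: 10% system, 30% recent, 40% cap for history.
func splitBudget(maxTokens int) budget {
	return budget{
		system:  maxTokens * 10 / 100,
		recent:  maxTokens * 30 / 100,
		history: maxTokens * 40 / 100,
	}
}
```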

See also: BUILD-TIME-AGENTS.md (Pattern 7: Repo Map)


Factor 4: Tools Are Just Structured Output

"Demystify tool use as simple routing."

The Pattern:

A "tool call" is just JSON extraction + a switch statement. No magic.

Forge Implementation:

// internal/ai/tools/router.go
type Tool struct {
    Name        string
    Description string
    ArgsSchema  json.RawMessage
    Handler     func(ctx context.Context, args json.RawMessage) (any, error)
}

var registry = map[string]Tool{
    "get_tournament": {
        Name:        "get_tournament",
        Description: "Retrieve tournament details by ID",
        ArgsSchema:  []byte(`{"type":"object","properties":{"id":{"type":"string"}}}`),
        Handler:     getTournamentHandler,
    },
    "list_matches": {
        Name:        "list_matches",
        Description: "List matches for a tournament",
        ArgsSchema:  []byte(`{"type":"object","properties":{"tournament_id":{"type":"string"}}}`),
        Handler:     listMatchesHandler,
    },
}

func ExecuteTool(ctx context.Context, call ToolCall) (any, error) {
    tool, ok := registry[call.Name]
    if !ok {
        return nil, fmt.Errorf("unknown tool: %s", call.Name)
    }

    // Validate args against schema
    if err := validateJSON(call.Args, tool.ArgsSchema); err != nil {
        return nil, fmt.Errorf("invalid args: %w", err)
    }

    return tool.Handler(ctx, call.Args)
}

Forge Opinion:

  • Tools are registered in code, not discovered magically
  • Schema validation before execution
  • Handlers are regular Go functions
  • No special "tool frameworks"
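
The `validateJSON` call above could be as small as this sketch, which checks only that every argument key appears in the schema's `properties`. A production version would use a full JSON Schema validator (required fields, types, nesting):

```go
import (
	"encoding/json"
	"fmt"
)

// validateJSON (minimal sketch): reject args that are not a JSON object
// or that carry keys the schema never declared.
func validateJSON(args, schema json.RawMessage) error {
	var s struct {
		Properties map[string]json.RawMessage `json:"properties"`
	}
	if err := json.Unmarshal(schema, &s); err != nil {
		return fmt.Errorf("bad schema: %w", err)
	}
	var got map[string]json.RawMessage
	if err := json.Unmarshal(args, &got); err != nil {
		return fmt.Errorf("args not an object: %w", err)
	}
	for k := range got {
		if _, ok := s.Properties[k]; !ok {
			return fmt.Errorf("unexpected arg %q", k)
		}
	}
	return nil
}
```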

See also: RUNTIME-AGENTS.md (Pattern 5: Output Validation)


Factor 5: Unify Execution + Business State

"Keep the agent's decision-making state synchronized with your application's state."

The Pattern:

Agent state and business state have different lifecycles but must stay consistent.

Forge Implementation:

// internal/agent/state.go
type AgentRun struct {
    // Execution state (agent lifecycle)
    RunID       string          `json:"run_id"`
    Status      RunStatus       `json:"status"`  // pending, running, paused, completed, failed
    StartedAt   time.Time       `json:"started_at"`
    Steps       []Step          `json:"steps"`
    CurrentStep int             `json:"current_step"`

    // Business state (domain lifecycle)
    TournamentID string         `json:"tournament_id,omitempty"`
    MatchID      string         `json:"match_id,omitempty"`
    Context      map[string]any `json:"context"`
}

// Persist both together
func (r *AgentRun) Save(ctx context.Context, db *sql.DB) error {
    // Single transaction for both states
    tx, err := db.BeginTx(ctx, nil)
    if err != nil {
        return err
    }
    defer tx.Rollback()

    // Save execution state
    if err := r.saveExecutionState(tx); err != nil {
        return err
    }

    // Save business state changes
    if err := r.saveBusinessState(tx); err != nil {
        return err
    }

    return tx.Commit()
}

Forge Opinion:

  • Single transaction for state updates
  • Execution state in agent_runs table
  • Business state in domain tables
  • Both updated atomically

Factor 6: Launch/Pause/Resume

"Design agents to start, pause, and resume execution cleanly."

The Pattern:

Agents should checkpoint their state and resume from any step.

Forge Implementation:

// internal/agent/runner.go
type Runner struct {
    db     *sql.DB
    ai     *ai.Client
    tools  map[string]Tool
}

func (r *Runner) Run(ctx context.Context, runID string) error {
    run, err := r.loadRun(ctx, runID)
    if err != nil {
        return err
    }

    for run.CurrentStep < len(run.Steps) {
        // Check for pause request
        if r.shouldPause(ctx, runID) {
            run.Status = StatusPaused
            return run.Save(ctx, r.db)
        }

        step := run.Steps[run.CurrentStep]

        // Execute step
        result, err := r.executeStep(ctx, run, step)
        if err != nil {
            run.Status = StatusFailed
            run.Steps[run.CurrentStep].Error = err.Error()
            return run.Save(ctx, r.db)
        }

        // Checkpoint after each step
        run.Steps[run.CurrentStep].Result = result
        run.CurrentStep++
        if err := run.Save(ctx, r.db); err != nil {
            return err
        }
    }

    run.Status = StatusCompleted
    return run.Save(ctx, r.db)
}

func (r *Runner) Resume(ctx context.Context, runID string) error {
    // Simply call Run - it picks up from CurrentStep
    return r.Run(ctx, runID)
}

Forge Opinion:

  • Checkpoint after every step
  • Resume = load state + continue loop
  • Pause = set flag + save state
  • Recovery = resume from last checkpoint
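
One possible shape for the pause signal that `shouldPause` checks. The document stores it in the database alongside the run; an in-memory flag store (illustrative names) keeps this sketch self-contained:

```go
import "sync"

// pauseFlags records pause requests by run ID.
type pauseFlags struct {
	mu    sync.Mutex
	flags map[string]bool
}

func newPauseFlags() *pauseFlags {
	return &pauseFlags{flags: map[string]bool{}}
}

// RequestPause is called by the API/operator side.
func (p *pauseFlags) RequestPause(runID string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.flags[runID] = true
}

// ShouldPause reports and clears the flag, so a resumed run is not
// immediately paused again by a stale request.
func (p *pauseFlags) ShouldPause(runID string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	paused := p.flags[runID]
	delete(p.flags, runID)
	return paused
}
```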

Factor 7: Contact Humans with Tool Calls

"Treat human approval as just another tool type."

The Pattern:

When the agent needs human input, it calls a "human" tool and waits.

Forge Implementation:

// internal/agent/tools/human.go
var HumanApprovalTool = Tool{
    Name:        "request_human_approval",
    Description: "Request human approval before proceeding with a sensitive action",
    ArgsSchema:  []byte(`{"type":"object","properties":{"action":{"type":"string"},"reason":{"type":"string"}}}`),
    Handler:     requestHumanApproval,
}

func requestHumanApproval(ctx context.Context, args json.RawMessage) (any, error) {
    var req struct {
        Action string `json:"action"`
        Reason string `json:"reason"`
    }
    if err := json.Unmarshal(args, &req); err != nil {
        return nil, fmt.Errorf("invalid args: %w", err)
    }

    // Create approval request
    approval := &ApprovalRequest{
        ID:        uuid.New().String(),
        RunID:     getRunID(ctx),
        Action:    req.Action,
        Reason:    req.Reason,
        Status:    "pending",
        CreatedAt: time.Now(),
    }

    // Save and notify
    if err := approval.Save(ctx); err != nil {
        return nil, err
    }

    // Return special response that pauses the run
    return PauseForApproval{ApprovalID: approval.ID}, nil
}

Forge Opinion:

  • Human approval is a first-class tool
  • Agent pauses when approval requested
  • Approval via API, Slack, email, etc.
  • Resume when approved/rejected
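
The approval side might look like this sketch: a store that records the human decision and tells the caller whether the paused run should resume. The in-memory store and names are illustrative; the document persists approvals and resumes via the Runner:

```go
import (
	"fmt"
	"sync"
)

// approvalStore tracks approval requests: pending, approved, rejected.
type approvalStore struct {
	mu     sync.Mutex
	status map[string]string
}

func newApprovalStore() *approvalStore {
	return &approvalStore{status: map[string]string{}}
}

// Create registers a pending request (called from the approval tool).
func (s *approvalStore) Create(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.status[id] = "pending"
}

// Resolve records the human decision and reports whether the paused run
// should resume (approved) or stay stopped (rejected).
func (s *approvalStore) Resolve(id string, approved bool) (resume bool, err error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.status[id] != "pending" {
		return false, fmt.Errorf("approval %s is not pending", id)
	}
	if approved {
		s.status[id] = "approved"
		return true, nil
	}
	s.status[id] = "rejected"
	return false, nil
}
```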

See also: BUILD-TIME-AGENTS.md (Pattern 10: Assumption Checkpoint)


Factor 8: Own Your Control Flow

"Implement your own orchestration logic rather than relying entirely on LLM decision-making."

The Pattern:

The agent loop is YOUR code. LLM is called at decision points, not in charge.

Forge Implementation:

// internal/agent/flow.go
func ProcessTournamentRequest(ctx context.Context, input string) error {
    // 1. Extract intent (LLM)
    intent, err := extractIntent(ctx, input)
    if err != nil {
        return err
    }

    // 2. Route based on intent (YOUR CODE)
    switch intent.Type {
    case "create_tournament":
        return handleCreateTournament(ctx, intent.Args)
    case "update_bracket":
        return handleUpdateBracket(ctx, intent.Args)
    case "generate_schedule":
        // This one uses more AI
        return handleGenerateSchedule(ctx, intent.Args)
    default:
        return handleUnknownIntent(ctx, input)
    }
}

func handleGenerateSchedule(ctx context.Context, args map[string]any) error {
    // Mix of deterministic and AI steps

    // 1. Load constraints (deterministic)
    constraints := loadSchedulingConstraints(ctx, args["tournament_id"])

    // 2. Generate candidate schedule (AI)
    schedule, err := generateScheduleWithAI(ctx, constraints)
    if err != nil {
        return err
    }

    // 3. Validate against rules (deterministic)
    if verr := validateSchedule(schedule, constraints); verr != nil {
        // Retry once, passing the validation error as feedback
        schedule, err = generateScheduleWithAI(ctx, constraints, verr.Error())
        if err != nil {
            return err
        }
    }

    // 4. Save (deterministic)
    return saveSchedule(ctx, schedule)
}

Forge Opinion:

  • You write the loop, not LangGraph
  • LLM for: intent extraction, content generation, complex reasoning
  • Code for: routing, validation, persistence, error handling
  • Clear boundaries between AI and deterministic logic

See also: RUNTIME-AGENTS.md (AI Tooling Decisions)


Factor 9: Compact Errors into Context

"Efficiently represent error information for the LLM to process."

The Pattern:

When something fails, give the LLM concise, actionable feedback.

Forge Implementation:

// internal/ai/errors.go
type CompactError struct {
    Step      string `json:"step"`
    Error     string `json:"error"`
    Hint      string `json:"hint"`
    Retryable bool   `json:"retryable"`
}

func CompactifyError(step string, err error) CompactError {
    // Convert verbose errors to concise feedback
    switch {
    case errors.Is(err, sql.ErrNoRows):
        return CompactError{
            Step:      step,
            Error:     "not_found",
            Hint:      "The requested resource does not exist",
            Retryable: false,
        }
    case errors.Is(err, context.DeadlineExceeded):
        return CompactError{
            Step:      step,
            Error:     "timeout",
            Hint:      "Operation took too long, try with smaller scope",
            Retryable: true,
        }
    default:
        return CompactError{
            Step:      step,
            Error:     "unknown",
            Hint:      truncate(err.Error(), 100),
            Retryable: true,
        }
    }
}

// Include in retry prompt
func BuildRetryPrompt(original string, compactErr CompactError) string {
    return fmt.Sprintf(`Previous attempt failed:
Step: %s
Error: %s
Hint: %s

Please try again with this feedback in mind.

Original request: %s`,
        compactErr.Step, compactErr.Error, compactErr.Hint, original)
}

Forge Opinion:

  • Never dump full stack traces to LLM
  • Categorize errors (not_found, timeout, validation, etc.)
  • Include actionable hints
  • Mark retryable vs terminal failures

Factor 10: Small, Focused Agents

"Build specialized agents for narrow tasks rather than monolithic general-purpose ones."

The Pattern:

<100 tools, <20 steps. Compose multiple focused agents for complex workflows.

Forge Implementation:

// internal/agent/registry.go

// BAD: One agent with 50 tools
var MonolithicAgent = Agent{
    Name:  "tournament_agent",
    Tools: allTournamentTools, // 50+ tools
}

// GOOD: Focused agents with clear responsibilities
var (
    SchedulingAgent = Agent{
        Name:        "scheduling",
        Description: "Generates and optimizes tournament schedules",
        Tools:       []Tool{generateSchedule, optimizeSchedule, checkConflicts},
        MaxSteps:    10,
    }

    BracketAgent = Agent{
        Name:        "bracket",
        Description: "Manages bracket creation and seeding",
        Tools:       []Tool{createBracket, seedTeams, advanceWinner},
        MaxSteps:    8,
    }

    NotificationAgent = Agent{
        Name:        "notification",
        Description: "Sends notifications to participants",
        Tools:       []Tool{notifyTeam, notifyAll, scheduleReminder},
        MaxSteps:    5,
    }
)

// Orchestrator composes focused agents
func ProcessComplexRequest(ctx context.Context, input string) error {
    // Route to appropriate agent
    agent := routeToAgent(input)

    // Or compose multiple agents
    if needsMultipleAgents(input) {
        return runAgentPipeline(ctx, input, []Agent{
            SchedulingAgent,
            NotificationAgent,
        })
    }

    return agent.Run(ctx, input)
}

Forge Opinion:

  • Each agent: <10 tools, <10 steps
  • Clear, single responsibility
  • Compose via orchestrator (your code)
  • Name agents by capability, not technology

See also: BUILD-TIME-AGENTS.md (Pattern 2: Agent Routing)


Factor 11: Trigger from Anywhere

"Design agents to activate from multiple entry points and channels."

The Pattern:

Same agent logic, multiple triggers: HTTP, webhook, queue, schedule, CLI.

Forge Implementation:

// internal/agent/triggers.go

// HTTP trigger
func (h *Handler) TriggerAgent(w http.ResponseWriter, r *http.Request) {
    var req AgentRequest
    if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }

    runID := h.agent.Start(r.Context(), req)
    json.NewEncoder(w).Encode(map[string]string{"run_id": runID})
}

// Webhook trigger (e.g., Stripe, GitHub)
func (h *Handler) WebhookTrigger(w http.ResponseWriter, r *http.Request) {
    event := parseWebhookEvent(r)

    req := AgentRequest{
        Type:    "webhook",
        Source:  event.Source,
        Payload: event.Data,
    }

    h.agent.Start(r.Context(), req)
}

// Queue trigger (e.g., Redis, SQS)
func (w *Worker) ProcessQueue(ctx context.Context) {
    for msg := range w.queue.Receive(ctx) {
        req := AgentRequest{
            Type:    "queue",
            Payload: msg.Data,
        }
        w.agent.Start(ctx, req)
        msg.Ack()
    }
}

// Scheduled trigger
func (s *Scheduler) RunScheduled(ctx context.Context) {
    for _, job := range s.dueJobs() {
        req := AgentRequest{
            Type:     "scheduled",
            Schedule: job.Schedule,
            Payload:  job.Payload,
        }
        s.agent.Start(ctx, req)
    }
}

Forge Opinion:

  • Agent logic is trigger-agnostic
  • Triggers adapt input to AgentRequest
  • Same agent, different entry points
  • Log trigger source for debugging

See also: BUILD-TIME-AGENTS.md (MCP Server Opportunity)


Factor 12: Stateless Reducer

"Design agents as pure functions that map (state + event) → new_state."

The Pattern:

Agents should be stateless. All state is external (database, context). Enables testing, scaling, recovery.

Forge Implementation:

// internal/agent/reducer.go

// Agent is a pure function: (state, event) → (new_state, effects)
type AgentFunc func(state AgentState, event Event) (AgentState, []Effect)

// State is loaded from database, not stored in agent
type AgentState struct {
    RunID       string
    CurrentStep int
    Context     map[string]any
    History     []Message
}

// Events trigger state transitions
type Event struct {
    Type    string // "user_input", "tool_result", "approval", "error"
    Payload any
}

// Effects are side effects to execute
type Effect struct {
    Type    string // "call_tool", "send_message", "request_approval"
    Payload any
}

// Pure reducer function
func TournamentAgentReducer(state AgentState, event Event) (AgentState, []Effect) {
    newState := state // Shallow copy; maps and slices are still shared with the input
    var effects []Effect

    switch event.Type {
    case "user_input":
        // Add to history, prepare tool call
        newState.History = append(newState.History, Message{Role: "user", Content: event.Payload.(string)})
        effects = append(effects, Effect{Type: "extract_intent", Payload: event.Payload})

    case "tool_result":
        // Add result to context, decide next step
        result := event.Payload.(ToolResult)
        newState.Context[result.ToolName] = result.Data
        newState.CurrentStep++

        // Decide next action based on state
        if shouldContinue(newState) {
            effects = append(effects, Effect{Type: "call_tool", Payload: nextTool(newState)})
        }
    }

    return newState, effects
}

// Runner executes effects and recurses
func RunAgent(ctx context.Context, reducer AgentFunc, initialState AgentState, event Event) error {
    state := initialState
    currentEvent := event

    for {
        newState, effects := reducer(state, currentEvent)

        // Persist state
        if err := saveState(ctx, newState); err != nil {
            return err
        }

        if len(effects) == 0 {
            break // No effects to execute, agent is done
        }

        // Execute effects; the last result (or error) becomes the next
        // event fed back into the reducer. (Simplified: a production
        // runner would enqueue every result as its own event.)
        for _, effect := range effects {
            result, err := executeEffect(ctx, effect)
            if err != nil {
                currentEvent = Event{Type: "error", Payload: err}
                break
            }
            currentEvent = Event{Type: "tool_result", Payload: result}
        }

        state = newState
    }

    return nil
}

Forge Opinion:

  • Agent = reducer function (pure)
  • State = database (external)
  • Effects = side effects (executed by runner)
  • Enables: testing, replay, horizontal scaling, time-travel debugging
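
The testing benefit falls out directly: a pure reducer can be checked for determinism and non-mutation with plain assertions. This uses a simplified stand-in for `TournamentAgentReducer`:

```go
// Simplified state/event/effect types for the purity check.
type State struct {
	Step    int
	History []string
}
type Event struct{ Input string }
type Effect struct{ Type string }

// reduce is a pure transition: it copies the history into a fresh slice
// so the caller's state is never aliased or mutated.
func reduce(s State, e Event) (State, []Effect) {
	next := s
	next.History = append(append([]string{}, s.History...), e.Input)
	next.Step++
	return next, []Effect{{Type: "extract_intent"}}
}
```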

Forge + 12-Factor Summary

Factor                         Forge Pattern           Document
1.  NL → Tool Calls            Structured extraction   This doc
2.  Own Your Prompts           Skills framework        BUILD-TIME-AGENTS
3.  Own Your Context           Repo Map                BUILD-TIME-AGENTS
4.  Tools = Structured Output  Output Validation       RUNTIME-AGENTS
5.  Unify State                Single transaction      This doc
6.  Launch/Pause/Resume        Checkpoint runner       This doc
7.  Contact Humans             Approval tool           This doc
8.  Own Control Flow           Direct SDK first        RUNTIME-AGENTS
9.  Compact Errors             Error categorization    This doc
10. Small Agents               Agent Routing           BUILD-TIME-AGENTS
11. Trigger Anywhere           MCP Server              BUILD-TIME-AGENTS
12. Stateless Reducer          Reducer pattern         This doc

Quick Start: Minimal 12-Factor Agent

// The simplest possible 12-factor compliant agent

func MinimalAgent(ctx context.Context, db *sql.DB, client *ai.Client, input string) error {
    // Factor 3: Own your context
    ctx = client.WithContext(ctx, loadRepoMap())

    // Factor 1: NL → Tool Call
    call, err := client.ExtractToolCall(ctx, input, availableTools)
    if err != nil {
        return err
    }

    // Factor 4: Tools are just structured output
    tool, ok := toolRegistry[call.Name]
    if !ok {
        return fmt.Errorf("unknown tool: %s", call.Name)
    }

    // Factor 8: Own your control flow
    if tool.RequiresApproval {
        // Factor 7: Contact humans
        if err := requestApproval(ctx, call); err != nil {
            return err
        }
    }

    // Factor 5: Unify state (single transaction)
    tx, err := db.BeginTx(ctx, nil)
    if err != nil {
        return err
    }
    defer tx.Rollback()

    if _, err := tool.Handler(ctx, call.Args); err != nil {
        // Factor 9: Compact errors
        return compactError(err)
    }

    return tx.Commit()
}

Related: Data Architecture Decisions

The 12 factors above focus on agent behavior. For data architecture decisions that enable multi-agent systems, see:

Key alignment:

12-Factor Agent                Data Architecture Decision
Factor 5: Unify State          Unified data layer (single Postgres)
Factor 6: Launch/Pause/Resume  Durable agent memory (checkpoints in DB)
Factor 12: Stateless Reducer   Agent state in database, not memory

References