Beta

Observability

Dark Factory captures structured logs, trace IDs, per-step artifacts, and historical analytics for every run. This page covers the full telemetry surface — from raw log files to the web dashboard — so you can debug failed issues, audit agent decisions, and track cost trends over time.

Logs — Structured JSON per run and issue
Traces — End-to-end trace IDs per issue
Analytics — SQLite database for cost and trends
Dashboard — TUI and web UI for live monitoring

Run Directories

Every run writes its output to a timestamped directory under ~/.godark/runs/. This is the canonical source for all per-run data.

~/.godark/runs/
  <owner>/
    <repo>/
      <YYYYMMDD-HHMMSS>/
        run.json                  # Run metadata, dependencies, rate-limit state
        debug.log                 # Run-level structured log
        waves/
          1.json, 2.json, ...     # Per-wave execution records
        issues/
          <N>/
            debug.log             # Issue-level log
            recon.json            # Step artifacts (see Artifacts Reference)
            planner.json
            implement.json
            dialogue.json
            failure-analysis.json
            ...

Note: there is currently no log rotation or retention policy. Runs accumulate without bound — see Known Gaps.
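Because run directory names use the YYYYMMDD-HHMMSS pattern, lexicographic order matches chronological order, so the newest run can be found with a plain string max. A minimal Python sketch, using a throwaway temp directory as a stand-in for ~/.godark/runs/<owner>/<repo>:

```python
import pathlib
import tempfile

# Stand-in for ~/.godark/runs/<owner>/<repo>; timestamped names sort
# chronologically, so max() by name picks the newest run.
repo_dir = pathlib.Path(tempfile.mkdtemp())
for name in ("20260414-091502", "20260415-143201", "20260301-080000"):
    (repo_dir / name).mkdir()

latest = max(p.name for p in repo_dir.iterdir() if p.is_dir())
print(latest)  # 20260415-143201
```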


Logs

Logs are written at two levels — one debug.log for the entire run, and one per issue. Both use structured JSON via Go's slog.JSONHandler, with colored text also streamed to stdout.

Log Locations

Run-level — ~/.godark/runs/<owner>/<repo>/<timestamp>/debug.log
Issue-level — .../issues/<N>/debug.log

Logger Variants

Format

Each log line is a JSON object with timestamp, level, message, and structured fields. Terminal output adds ANSI highlighting for verdicts and timeouts.

{"time":"2026-04-15T14:32:01Z","level":"INFO","msg":"step completed","step":"implement","issue":42,"cost_usd":0.18,"duration_s":34.2}
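A line-oriented JSON log like this is easy to post-process. The sketch below sums per-step cost from "step completed" events; the field names are taken from the sample line above, and the inlined strings stand in for reading debug.log:

```python
import json

# Two sample lines in the format shown above (normally read from debug.log).
lines = [
    '{"time":"2026-04-15T14:32:01Z","level":"INFO","msg":"step completed","step":"implement","issue":42,"cost_usd":0.18,"duration_s":34.2}',
    '{"time":"2026-04-15T14:33:10Z","level":"INFO","msg":"step completed","step":"quality-review","issue":42,"cost_usd":0.05,"duration_s":12.7}',
]

cost_by_step = {}
for line in lines:
    event = json.loads(line)
    if event.get("msg") == "step completed":
        cost_by_step[event["step"]] = cost_by_step.get(event["step"], 0.0) + event["cost_usd"]

print(cost_by_step)  # {'implement': 0.18, 'quality-review': 0.05}
```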

Tracing

Every issue gets a UUID trace ID generated at the start of the agent loop. That ID propagates through every step result, verification, and outcome record, making it possible to follow a single issue's entire decision path.

Propagation Path

Agent Loop → StepResult → VerifyStepResult → Outcome → SQLite

Querying Traces

The godark trace CLI supports three output modes:

# Text summary (default)
godark trace 42

# Full JSON output
godark trace 42 --json

# Detailed step-by-step view
godark trace 42 --detail

# Query by trace ID directly
godark trace abc12def-...

Surface Points


Analytics Database

The analytics database at ~/.godark/stats.db is a SQLite file created at orchestrator startup. It stores run metadata, per-issue outcomes, and per-step cost/duration records. Stats are written post-run and failures are non-fatal — a broken analytics write never corrupts a run.

runs

Run metadata and aggregate outcome counts

id, repo, milestone, started_at, finished_at, total_issues, implemented, failed

issue_outcomes

Per-issue final status, error messages, and tracing

run_id, issue_number, status, error, pr_number, trace_id, clone_sha

step_results

Per-step cost, duration, resource usage, and prompt capture

issue_number, step, cost_usd, duration_seconds, started_at, finished_at, peak_memory_bytes, cpu_nanoseconds, trace_id, prompt
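With those columns, per-step cost rollups are a single GROUP BY away. Here is a self-contained sketch using Python's sqlite3 module against an in-memory stand-in for stats.db; only the column names come from the schema above, while the types and sample rows are assumptions:

```python
import sqlite3

# Stand-in for ~/.godark/stats.db; column names follow the step_results
# schema listed above (the types are assumptions).
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE step_results (
    issue_number INTEGER, step TEXT, cost_usd REAL, duration_seconds REAL,
    started_at TEXT, finished_at TEXT, peak_memory_bytes INTEGER,
    cpu_nanoseconds INTEGER, trace_id TEXT, prompt TEXT)""")
db.executemany(
    "INSERT INTO step_results (issue_number, step, cost_usd, duration_seconds) VALUES (?, ?, ?, ?)",
    [(42, "implement", 0.18, 34.2), (42, "quality-review", 0.05, 12.7),
     (43, "implement", 0.22, 41.0)],
)

# Aggregate spend per step, the kind of rollup godark analyze reports.
rows = db.execute(
    "SELECT step, ROUND(SUM(cost_usd), 2), COUNT(*) FROM step_results "
    "GROUP BY step ORDER BY 2 DESC"
).fetchall()
for step, total, n in rows:
    print(f"{step}: ${total} across {n} steps")
```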

Cost & Duration Tracking

Outcome Values

implemented, ready-to-merge, needs-human-review, failed

CLI Commands

Four commands cover the main observability use cases. See CLI Reference for full flag documentation.

godark trace

Follow the decision flow for a single issue or trace ID. Supports text, JSON, and detail output modes.

godark trace 42 --detail

godark analyze

Aggregate report across runs — outcome distribution, flag frequencies, retry stats, cost breakdown, and prompt gap detection.

godark analyze --repo owner/repo --since 2026-01-01

godark report

Sprint summary with optional LLM-generated executive narrative. Output as terminal, markdown, or HTML.

godark report --since 2w --format markdown

godark status

Launch the web dashboard at localhost:8374. Browse runs, issue outcomes, agent logs, and cost data.

godark status --port 9090

TUI & Dashboard

Terminal UI (Bubble Tea)

The TUI renders a live issue table during godark run with spinner-based progress, stage transitions, trace IDs, and per-issue cost/retry counters. Disable with --no-tui for plain-text output.

Web Dashboard

godark status serves an HTML dashboard powered by HTMX for partial page updates. It covers runs, issue outcomes, agent logs, and cost data.

Progress Events

Both surfaces consume a shared event stream. Current event types:

RunStarted, IssueStarted, IssueStageChanged, IssueCompleted, WaveStarted, RollupCreated, RunFinished, JudgeIntervention, RateLimited, WorkersActive

Artifacts Reference

Each issue directory contains step-level JSON artifacts capturing the full agent output at every stage. Prompts are also stored in step_results.prompt in SQLite (capped at 32 KB).

Step Outputs

recon.json — Reconnaissance findings: codebase context gathered before planning
spec-generator.json — Generated specification from issue requirements
planner.json — Implementation plan with ordered steps and rationale
implement.json — Implementation output: files changed, decisions made
quality-review.json — Quality review verdict and requested fixes
functional-review.json — Functional review against acceptance criteria
merge_coordinator.json — Merge coordination: conflict resolution, rollup decisions
risk-assessment.json — Risk classification for auto-merge gating
verify-N.json — Verification step results (one per retry cycle)

Analysis Artifacts

dialogue.json — Agent dialogue timeline: role, round, body, outcome per entry
failure-analysis.json — Failure patterns: codes, counts, severity classification
judge-interventions.json — Per-rule judge intervention records
container-log.txt — Raw container stdout/stderr capture
punchlist.json — Verification steps, scenario cases, acceptance tests, changed files
spec-delta.json — Before/after specs: added, removed, and changed cases
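dialogue.json in particular rewards scripting: each entry carries role, round, body, and outcome, so reconstructing a conversation's end state takes only a few lines. The payload below is hypothetical (the real file's top-level shape may differ); only the four entry fields come from the description above:

```python
import json

# Hypothetical dialogue.json contents; entry fields (role, round, body,
# outcome) match the artifact description, the values are invented.
raw = json.dumps([
    {"role": "implementer", "round": 1, "body": "Proposed patch for #42", "outcome": "revised"},
    {"role": "reviewer", "round": 1, "body": "Missing test coverage", "outcome": "rejected"},
    {"role": "implementer", "round": 2, "body": "Added tests", "outcome": "accepted"},
])

timeline = json.loads(raw)
final = max(timeline, key=lambda e: e["round"])  # last round wins
print(f"{len(timeline)} entries, final outcome: {final['outcome']}")
```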

Wave Tracking

Wave execution records are stored in waves/<N>.json — one file per dependency-resolution wave, capturing which issues ran in parallel and their outcomes.


Failure Analysis

When issues fail, Dark Factory captures detailed failure data to help you understand what went wrong and where.

Failure Artifacts

failure-analysis.json — Pattern codes, occurrence counts, severity classification
Outcome.error — Final error message, searchable in SQLite via issue_outcomes
judge-interventions.json — Per-rule records of container health judge interventions
container-log.txt — Raw container stdout/stderr for build and test failures

Retries

Each retry cycle produces separate step records named retry-N and retry-N-quality-review. These are stored both as JSON files in the issue directory and as rows in step_results.
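Since the retry-N / retry-N-quality-review naming is mechanical, retry cycles can be regrouped from step names alone. A small sketch (the steps list here is illustrative sample data, not real output):

```python
import re

# Example step names as they might appear in step_results.step; the
# retry-N / retry-N-quality-review convention is described above.
steps = ["implement", "retry-1", "retry-1-quality-review", "retry-2",
         "retry-2-quality-review", "quality-review"]

# Group each retry cycle with its paired quality review.
cycles = {}
for name in steps:
    m = re.fullmatch(r"retry-(\d+)(-quality-review)?", name)
    if m:
        cycles.setdefault(int(m.group(1)), []).append(name)

print(cycles)  # {1: ['retry-1', 'retry-1-quality-review'], 2: ['retry-2', 'retry-2-quality-review']}
```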

Resilience

Analytics writes are non-fatal by design. If the stats database is unavailable or a write fails, the run continues normally — run data is never corrupted by an analytics failure.


Known Gaps

The observability surface is mature but has known limitations. These are tracked for future development.

High Priority

  • P0 No OpenTelemetry / OTLP export — can't plug into external APM tools
  • P0 No Prometheus /metrics endpoint
  • P0 No log rotation or retention policy — unbounded disk growth
  • P0 No real-time cost tracking during an active run
  • P0 No failure-pattern-aware retry logic

Medium Priority

  • P1 No error deduplication across runs
  • P1 No dashboard auto-refresh (requires manual reload)
  • P1 No per-model cost breakdown
  • P1 No anomaly detection for cost or duration spikes
  • P1 No artifact versioning or compression