Self-Improving Agent System for Production Deployment
Multi-agent pipeline with a feedback flywheel that continuously improves from deployment outcomes.
Forge is an open-source, self-improving agent system that deploys production software. It combines a multi-agent pipeline (Planner → Coder ⇄ Reviewer → Deployer → Verifier) with a Feedback Flywheel that continuously improves every component based on real deployment outcomes.
Inspired by Factory.ai but built with a fundamentally different architecture: Forge closes the feedback loop. Every deployment outcome feeds back to improve agent prompts, routing weights, and validation criteria — turning one-shot deployments into a compounding quality flywheel.
| Dimension | Factory.ai | Forge |
|---|---|---|
| Pipeline | Fixed linear pipeline | DAG-based with conditional routing |
| Learning | Static prompts, manual tuning | Automatic prompt/routing improvement from feedback |
| Validation | Pre-defined rules | Evolving validation criteria shaped by outcomes |
| Data store | Proprietary | SpacetimeDB (real-time, native Rust+TS SDKs) |
| Architecture | Monolithic | Turborepo monorepo, Rust orchestrator, TS runtime |
| Durability | Standard execution | Workflow SDK step/replay for crash recovery |
Forge is structured as a layered system with seven distinct layers, each responsible for a clear concern:
┌─────────────────────────────────────────────────────────┐
│ Web Dashboard (Next.js) │
│ Real-time status, feedback viz │
├─────────────────────────────────────────────────────────┤
│ Feedback Flywheel │
│ Outcome analysis → Prompt tuning → Weight adjustment │
├─────────────────────────────────────────────────────────┤
│ Pipeline Engine (DAG) │
│ Planner → Coder ⇄ Reviewer → Deployer → Verifier │
├─────────────────────────────────────────────────────────┤
│ Agent Pool │
│ Versioned prompts, tool sets, model bindings, profiles │
├─────────────────────────────────────────────────────────┤
│ Substrate (Runtime + Routing) │
│ Model Router │ Tool Executor │ SpacetimeDB │ gRPC │
├─────────────────────────────────────────────────────────┤
│ Orchestrator (Rust) │
│ Container lifecycle via gRPC │
├─────────────────────────────────────────────────────────┤
│ Deployment Targets (Plugins) │
│ rust-service │ python-api │ docker │ k8s │
└─────────────────────────────────────────────────────────┘
User Request
     │
     ▼
┌──────────┐
│ Planner  │ → Decomposes request into tasks, selects agents, builds execution plan
└────┬─────┘
     │
     ▼
┌──────────┐      ┌───────────┐
│  Coder   │  ←→  │ Reviewer  │   Loop up to 3× if review fails
└────┬─────┘      └─────┬─────┘
     │ pass             │ pass
     ▼                  │
┌──────────┐            │
│ Deployer │ ←──────────┘
└────┬─────┘
     │
     ▼
┌──────────┐
│ Verifier │ → Health checks, smoke tests, rollback on failure
└────┬─────┘
     │
     ▼
┌──────────┐
│ Feedback │ → All outcomes recorded, flywheel activated
│  Store   │
└──────────┘
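
The Coder ⇄ Reviewer loop in the flow above can be sketched as a bounded retry. This is a minimal illustration, not the `@forge/runtime` implementation: the `Review`, `CoderFn`, and `ReviewerFn` types and the `runReviewLoop` function are hypothetical names, and the agents are simplified to synchronous functions (real agents would presumably be asynchronous model calls).

```typescript
// Hypothetical types for illustration; the real runtime interfaces may differ.
type Review = { approved: boolean; comments: string[] };
type CoderFn = (task: string, feedback: string[]) => string;
type ReviewerFn = (code: string) => Review;

interface LoopResult {
  code: string;
  approved: boolean;
  rounds: number;
}

// Alternate Coder and Reviewer until the review passes or the
// round budget (3 in the diagram above) is exhausted.
function runReviewLoop(
  task: string,
  coder: CoderFn,
  reviewer: ReviewerFn,
  maxRounds = 3,
): LoopResult {
  let feedback: string[] = [];
  let code = "";
  for (let round = 1; round <= maxRounds; round++) {
    code = coder(task, feedback);
    const review = reviewer(code);
    if (review.approved) {
      return { code, approved: true, rounds: round };
    }
    // Reviewer comments become input to the next coding attempt.
    feedback = review.comments;
  }
  return { code, approved: false, rounds: maxRounds };
}

// Toy stand-ins: the coder only adds tests after being asked to.
const toyCoder: CoderFn = (task, feedback) =>
  feedback.indexOf("add tests") >= 0 ? `${task} + tests` : task;
const toyReviewer: ReviewerFn = (code) =>
  code.indexOf("+ tests") >= 0
    ? { approved: true, comments: [] }
    : { approved: false, comments: ["add tests"] };

const loopResult = runReviewLoop("rate limiter", toyCoder, toyReviewer);
console.log(loopResult);
```

With these toy agents the first round is rejected and the second is approved. A reviewer that never approves stops the loop after `maxRounds` with `approved: false`, at which point a pipeline would have to decide whether to abort or escalate.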
forge/
├── packages/
│ ├── runtime/ # TypeScript — Agent execution, pipeline DAG, model router, tools, feedback
│ ├── orchestrator/ # Rust — Container lifecycle management via gRPC
│ ├── cli/ # TypeScript — forge CLI (run, review, deploy, status, init)
│ ├── targets/ # TypeScript — Deploy target plugins (rust-service, python-api)
│ └── web/ # TypeScript — Next.js 16 dashboard
├── docs/
│ └── images/ # Screenshots and diagrams
├── Cargo.toml # Rust workspace root
├── forge.yaml.example # Project configuration template
├── turbo.json # Turborepo pipeline config
├── docker-compose.yaml # Local dev environment (SpacetimeDB, orchestrator, runtime)
├── package.json # Root package.json (workspace)
├── tsconfig.base.json # Shared TypeScript config
├── SCOPE.md # Detailed scope and design document
└── README.md # This file
The Forge web dashboard provides real-time visibility into every aspect of the agent system. Built with Next.js 16, Tailwind CSS v4, shadcn/ui, and Recharts — featuring a dark theme with emerald primary and amber accent colors.
The main dashboard shows KPI cards for total pipeline runs, success rate, active agents, and average latency. Below, a bar chart tracks daily pipeline run volume and an area chart shows latency trends over the past 7 days. Agent health progress bars and a recent runs table complete the view.
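
As a rough illustration of how the KPI cards could be derived from run records, the sketch below aggregates total runs, success rate, and average latency. The `RunRecord` shape, the `computeKpis` name, and the sample data are all assumptions made for this example, not the dashboard's actual data model.

```typescript
// Hypothetical record shape; the real feedback store schema may differ.
interface RunRecord {
  id: string;
  status: "success" | "failed";
  latencyMs: number;
}

interface Kpis {
  totalRuns: number;
  successRate: number; // fraction in [0, 1]
  avgLatencyMs: number;
}

// Aggregate a list of runs into the three headline numbers.
function computeKpis(runs: RunRecord[]): Kpis {
  if (runs.length === 0) {
    // Avoid dividing by zero when there is no history yet.
    return { totalRuns: 0, successRate: 0, avgLatencyMs: 0 };
  }
  const successes = runs.filter((r) => r.status === "success").length;
  const totalLatency = runs.reduce((sum, r) => sum + r.latencyMs, 0);
  return {
    totalRuns: runs.length,
    successRate: successes / runs.length,
    avgLatencyMs: totalLatency / runs.length,
  };
}

// Made-up sample data for demonstration only.
const sampleRuns: RunRecord[] = [
  { id: "run-1", status: "success", latencyMs: 1000 },
  { id: "run-2", status: "success", latencyMs: 2000 },
  { id: "run-3", status: "failed", latencyMs: 3000 },
  { id: "run-4", status: "success", latencyMs: 2000 },
];

const kpis = computeKpis(sampleRuns);
console.log(kpis);
```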
An interactive SVG DAG visualization shows the pipeline flow from Planner through the Coder ⇄ Reviewer loop to Deployer and Verifier, with animated flow lines and a dashed feedback loop path. A searchable run history table shows every pipeline execution with status, latency, and timing.
Individual agent cards display per-agent metrics including success rate progress bars, total runs, average latency, and token consumption (in/out). Each agent type has a distinct color: Planner (cyan), Coder (emerald), Reviewer (violet), Deployer (amber), Verifier (rose).
The feedback flywheel visualization includes a grouped bar chart comparing Anthropic vs OpenAI routing weights across task types, a success rate trend area chart, routing details table, and a recent feedback feed with color-coded outcome indicators.
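
The routing weights shown in that chart could be maintained along the following lines. This is a sketch under stated assumptions, not the actual `ModelRouter` algorithm: the exponential-moving-average update, the `rate` and `minWeight` parameters, and the task-type names are all hypothetical.

```typescript
type Provider = "anthropic" | "openai";
type Weights = Record<Provider, number>;

// Pick a provider with probability proportional to its weight.
// `rand` is injectable so the choice can be made deterministic.
function pickProvider(weights: Weights, rand: () => number = Math.random): Provider {
  const total = weights.anthropic + weights.openai;
  return rand() * total < weights.anthropic ? "anthropic" : "openai";
}

// Move the used provider's weight toward 1 on success and toward 0 on
// failure, keep it above a floor so no provider is starved, then
// renormalise so the two weights sum to 1.
function updateWeights(
  weights: Weights,
  used: Provider,
  success: boolean,
  rate = 0.1,
  minWeight = 0.05,
): Weights {
  const target = success ? 1 : 0;
  const next: Weights = { anthropic: weights.anthropic, openai: weights.openai };
  next[used] = Math.max(minWeight, weights[used] + rate * (target - weights[used]));
  const total = next.anthropic + next.openai;
  return { anthropic: next.anthropic / total, openai: next.openai / total };
}

// Per-task weights; the task-type names here are invented for the example.
const routingTable: Record<string, Weights> = {
  codegen: { anthropic: 0.5, openai: 0.5 },
  review: { anthropic: 0.5, openai: 0.5 },
};

// A successful Anthropic run on a codegen task shifts weight toward Anthropic.
routingTable.codegen = updateWeights(routingTable.codegen, "anthropic", true);
console.log(routingTable.codegen, pickProvider(routingTable.codegen));
```

The floor is a design choice worth noting: without it, a provider that fails a few times in a row could drop to a weight so low that it is never sampled again and so never gets a chance to recover.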
Three deployment target type cards (Rust Service, Python API, Docker) are shown with KPI summaries. A deployment history table tracks every deployment with status, commit SHA, and timing. A rollback alert card highlights recent rollbacks with reasons.
| Component | Technology | Purpose |
|---|---|---|
| Runtime | TypeScript 5.7, Node.js 22 | Agent execution, pipeline engine, model routing |
| Orchestrator | Rust (tokio, tonic 0.12, prost 0.13) | Container lifecycle via gRPC |
| CLI | TypeScript, Commander.js | User-facing command-line interface |
| Deploy Targets | TypeScript, plugin system | Pluggable deployment backends |
| Web Dashboard | Next.js 16, Tailwind CSS v4, shadcn/ui, Recharts | Real-time monitoring and control |
| State/Feedback | SpacetimeDB (planned) | Real-time persistence with Rust+TS SDKs |
| Container Orch | Docker SDK (planned) | Build, push, and manage containers |
| Pipeline | Custom DAG engine | Topological sort, conditional edges |
| Durability | Workflow SDK (planned) | Step/replay for crash recovery |
| Build System | Turborepo, tsup | Monorepo build orchestration |
| Config | YAML + Zod validation | Project configuration |
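
To make the "topological sort, conditional edges" row concrete, here is a small sketch of one way such an engine could order and gate nodes. It is not the Forge `PipelineEngine`: the `Edge` type and both function names are hypothetical, every edge is assumed to reference a listed node, and the Coder ⇄ Reviewer loop is collapsed into a single `build` node because a topological order only exists for acyclic graphs.

```typescript
interface Edge<C> {
  from: string;
  to: string;
  when?: (ctx: C) => boolean; // absent means "always follow"
}

// Kahn's algorithm: repeatedly emit nodes with no unprocessed predecessors.
function topoSort<C>(nodes: string[], edges: Edge<C>[]): string[] {
  const inDegree = new Map<string, number>();
  for (const n of nodes) inDegree.set(n, 0);
  for (const e of edges) inDegree.set(e.to, (inDegree.get(e.to) ?? 0) + 1);
  const ready = nodes.filter((n) => inDegree.get(n) === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const n = ready.shift()!;
    order.push(n);
    for (const e of edges) {
      if (e.from !== n) continue;
      const d = (inDegree.get(e.to) ?? 0) - 1;
      inDegree.set(e.to, d);
      if (d === 0) ready.push(e.to);
    }
  }
  if (order.length !== nodes.length) throw new Error("cycle detected");
  return order;
}

// Walk nodes in topological order. A node runs if it is a root, or if some
// incoming edge from an already-run node has a satisfied condition.
function planExecution<C>(nodes: string[], edges: Edge<C>[], ctx: C): string[] {
  const ran = new Set<string>();
  for (const n of topoSort(nodes, edges)) {
    const incoming = edges.filter((e) => e.to === n);
    const reachable =
      incoming.length === 0 ||
      incoming.some((e) => ran.has(e.from) && (e.when ? e.when(ctx) : true));
    if (reachable) ran.add(n);
  }
  return Array.from(ran);
}

type Ctx = { reviewPassed: boolean };
const dagNodes = ["planner", "build", "deployer", "verifier"];
const dagEdges: Edge<Ctx>[] = [
  { from: "planner", to: "build" },
  { from: "build", to: "deployer", when: (c) => c.reviewPassed },
  { from: "deployer", to: "verifier" },
];

console.log(planExecution(dagNodes, dagEdges, { reviewPassed: true }));
console.log(planExecution(dagNodes, dagEdges, { reviewPassed: false }));
```

When the review condition is false, the deployer is skipped and so is everything that depends only on it, which is the behaviour one would want from a conditional edge guarding deployment.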
The core engine. Contains the agent base class, all five agent implementations (Planner, Coder, Reviewer, Deployer, Verifier), the DAG-based pipeline engine with conditional routing, the model router with weighted selection and self-tuning, a plugin-based tool executor, and the in-memory feedback store.
Key exports:
- `PipelineEngine` — DAG-based pipeline execution with Coder ⇄ Reviewer loop
- `ModelRouter` — Multi-provider routing (Anthropic, OpenAI) with per-task weights
- `BaseAgent` → `PlannerAgent`, `CoderAgent`, `ReviewerAgent`, `DeployerAgent`, `VerifierAgent`
- `ToolExecutorImpl` — Built-in tools: `file_read`, `file_write`, `shell_exec`, `search`, `http_check`
- `FeedbackStore` — In-memory feedback collection (SpacetimeDB-ready interface)
- `DurablePipeline` — Workflow SDK wrapper for crash-resilient execution (scaffold)
- `loadForgeConfig` — YAML config loader with Zod validation
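
The step/replay idea behind `DurablePipeline` can be illustrated generically: each step's result is journaled under a name, so a re-run after a crash returns recorded results instead of re-executing completed steps. The sketch below does not use the Workflow SDK and does not reflect its API; `StepJournal` and `runDurable` are hypothetical, and the journal lives in memory where a real one would be persisted.

```typescript
// Hypothetical in-memory journal; a durable runtime would persist this.
class StepJournal {
  private results = new Map<string, unknown>();

  // Run `fn` at most once per step name; later calls replay the result.
  step<T>(name: string, fn: () => T): T {
    if (this.results.has(name)) {
      return this.results.get(name) as T;
    }
    const value = fn();
    this.results.set(name, value); // recorded only if fn did not throw
    return value;
  }
}

let planCalls = 0;
let deployAttempts = 0;

function runDurable(journal: StepJournal): string {
  const plan = journal.step("plan", () => {
    planCalls++;
    return "plan-v1";
  });
  const image = journal.step("build", () => `${plan}:image`);
  return journal.step("deploy", () => {
    deployAttempts++;
    // Simulate a crash on the first deploy attempt only.
    if (deployAttempts === 1) throw new Error("simulated crash");
    return `deployed ${image}`;
  });
}

const journal = new StepJournal();
let outcome = "";
try {
  outcome = runDurable(journal);
} catch (e) {
  // Second run: "plan" and "build" are replayed from the journal.
  outcome = runDurable(journal);
}
console.log(outcome, { planCalls, deployAttempts });
```

The replay only works if steps are deterministic and named stably, which is why side effects belong inside a step rather than between steps.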
The Rust orchestrator provides container lifecycle management through a gRPC service. It defines a ContainerManager trait for abstraction, ships with an in-memory mock implementation, and exposes 8 RPCs: CreateContainer, ExecCommand, StreamCommand, DestroyContainer, GetContainerStatus, BuildImage, PushImage, and GetResourceUsage.
Key components:
- `OrchestratorService<M: ContainerManager>` — Generic gRPC service implementation
- `ContainerManager` trait — Pluggable backend (InMemory, Docker planned)
- `proto/orchestrator.proto` — Full gRPC service definition with streaming support
- `error.rs` — Typed error enum with gRPC status code mapping
Command-line interface built with Commander.js. Provides forge init (project scaffolding — functional), forge run (pipeline execution — partially functional), and placeholder commands for review, deploy, and status.
Commands:
- `forge init` — Creates project structure with `forge.yaml` config ✅
- `forge run` — Executes the full pipeline with model routing ⚠️
- `forge review` — Standalone code review (placeholder)
- `forge deploy` — Deploy to target (placeholder)
- `forge status` — Show project status (placeholder)
Plugin-based deployment target system. Ships with two target plugins and a registry pattern for extensibility.
Included targets:
- `RustServiceTarget` — Cargo build → Dockerfile generation → Docker image build
- `PythonApiTarget` — pip install → pytest → Dockerfile generation → Docker image build
Both targets implement the DeployTarget interface with validate(), build(), deploy(), rollback(), and healthCheck() methods. Currently, deploy() only builds Docker images (registry push is stubbed), and rollback() returns a placeholder success.
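
A plugin and registry along these lines might look like the sketch below. The five method names come from the paragraph above; everything else is an assumption, including the `DeployResult` shape, the synchronous signatures (real targets would likely return promises), and the `TargetRegistry`, `runDeployment`, and `makeStubTarget` helpers. The rollback-on-failed-health-check step shows how the planned behaviour could work rather than what the current stubs do.

```typescript
interface DeployResult {
  ok: boolean;
  detail: string;
}

// Method names are from the README; the signatures are assumptions.
interface DeployTarget {
  name: string;
  validate(): DeployResult;
  build(): DeployResult;
  deploy(): DeployResult;
  rollback(): DeployResult;
  healthCheck(): DeployResult;
}

class TargetRegistry {
  private targets = new Map<string, DeployTarget>();

  register(target: DeployTarget): void {
    if (this.targets.has(target.name)) {
      throw new Error(`duplicate target: ${target.name}`);
    }
    this.targets.set(target.name, target);
  }

  get(name: string): DeployTarget {
    const target = this.targets.get(name);
    if (!target) throw new Error(`unknown target: ${name}`);
    return target;
  }
}

// Run the lifecycle in order, stopping at the first failed phase and
// rolling back when the post-deploy health check fails.
function runDeployment(target: DeployTarget): string[] {
  const log: string[] = [];
  for (const phase of ["validate", "build", "deploy"] as const) {
    const result = target[phase]();
    log.push(`${phase}:${result.ok ? "ok" : "failed"}`);
    if (!result.ok) return log;
  }
  const health = target.healthCheck();
  log.push(`healthCheck:${health.ok ? "ok" : "failed"}`);
  if (!health.ok) {
    log.push(`rollback:${target.rollback().ok ? "ok" : "failed"}`);
  }
  return log;
}

// Stub target whose health check outcome is configurable.
function makeStubTarget(name: string, healthy: boolean): DeployTarget {
  const ok = (detail: string): DeployResult => ({ ok: true, detail });
  return {
    name,
    validate: () => ok("config valid"),
    build: () => ok("image built"),
    deploy: () => ok("deployed"),
    rollback: () => ok("rolled back"),
    healthCheck: () => ({ ok: healthy, detail: healthy ? "healthy" : "unhealthy" }),
  };
}

const registry = new TargetRegistry();
registry.register(makeStubTarget("rust-service", true));
registry.register(makeStubTarget("python-api", false));
console.log(runDeployment(registry.get("python-api")));
```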
- Node.js ≥ 22.0.0
- Bun ≥ 1.2.0 (package manager)
- Rust ≥ 1.75 (for orchestrator)
- Docker (for deployment targets)
# Clone the repository
git clone https://github.com/icohangar-ops/forge.git
cd forge
# Install dependencies
bun install
# Build all packages
bun run build
# Build the Rust orchestrator
cargo build --release -p forge-orchestrator

# Initialize a new Forge project
bun run --filter @forge/cli dev -- init my-project
# Run the pipeline
cd my-project
bun run --filter @forge/cli dev -- run "Add rate limiting to the API gateway"

Create a forge.yaml in your project root (or use forge init to generate one):
name: my-project
language: rust
agents:
  planner:
    model: claude-sonnet-4-20250514
    max_tokens: 4096
    temperature: 0.2
  coder:
    model: claude-sonnet-4-20250514
    max_tokens: 8192
    temperature: 0.2
  reviewer:
    model: gpt-4o
    max_tokens: 4096
    temperature: 0.1
    max_review_rounds: 3
  deployer:
    model: claude-sonnet-4-20250514
    max_tokens: 4096
    temperature: 0.1
  verifier:
    model: claude-sonnet-4-20250514
    max_tokens: 4096
    temperature: 0.1
deploy:
  target: rust-service
  config:
    registry: ghcr.io
    image_prefix: myorg/
runtime:
  max_pipeline_duration_ms: 600000 # 10 minutes
  max_agent_tokens: 16384
  max_shell_commands: 50
  allowed_shell_commands:
    - cargo
    - rustc
    - docker
    - kubectl
    - npm
    - bun
    - python3
    - git

| Variable | Description | Default |
|---|---|---|
| `ANTHROPIC_API_KEY` | Anthropic API key for Claude models | — |
| `OPENAI_API_KEY` | OpenAI API key for GPT models | — |
| `FORGE_NO_DURABLE` | Set to 1 to skip Workflow SDK durability | — |
| `RUST_LOG` | Rust logging level for orchestrator | info |
- Type system — Comprehensive TypeScript types for the entire system
- Agent framework — BaseAgent with tool-use loop, all 5 agent implementations
- Pipeline engine — DAG execution with conditional routing and Coder ⇄ Reviewer loop
- Model router — Weighted multi-provider routing with self-tuning from feedback
- Config system — YAML + Zod validation with `forge init` scaffolding
- Tool executor — Plugin architecture with file, shell, search, and HTTP tools
- Feedback store — In-memory collection with aggregation (DB-swap ready interface)
- Rust orchestrator — gRPC service with in-memory mock container manager
- CLI — `forge init` and `forge run` commands
- Deploy targets — Rust service and Python API plugins (build phase)
- Web dashboard — 5-tab monitoring dashboard with charts, DAG, and tables
- SpacetimeDB integration — Not yet implemented; feedback store is in-memory only
- Docker container manager — Only in-memory mock exists; no real container orchestration
- Workflow SDK durability — DurablePipeline is a scaffold; the `workflow` package needs real implementation
- CLI commands — `review`, `deploy`, and `status` are placeholders
- Security — Shell command allowlist needs hardening against injection attacks
- No tests — Test infrastructure is configured but no test files exist yet
- No CI/CD — No GitHub Actions or automated workflows
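
As one possible direction for the allowlist hardening mentioned in the list above, the sketch below validates a command line before it would be executed. It is an illustration only, not Forge's current `shell_exec` behaviour: `checkCommand` and `CommandCheck` are hypothetical names, and the approach (reject shell metacharacters, split into an argument vector, match only bare program names) assumes the command is later spawned without a shell.

```typescript
// Characters a shell would interpret; rejected outright in this sketch.
const SHELL_METACHARS = /[;&|`$<>(){}\\\n'"]/;

interface CommandCheck {
  allowed: boolean;
  reason: string;
  argv: string[];
}

function checkCommand(line: string, allowlist: ReadonlySet<string>): CommandCheck {
  if (SHELL_METACHARS.test(line)) {
    return { allowed: false, reason: "shell metacharacter", argv: [] };
  }
  const argv = line.trim().split(/\s+/).filter((part) => part.length > 0);
  if (argv.length === 0) {
    return { allowed: false, reason: "empty command", argv };
  }
  const program = argv[0] ?? "";
  // Reject paths such as ./cargo or /tmp/git so only bare names can match.
  if (program.indexOf("/") >= 0) {
    return { allowed: false, reason: "path not allowed", argv };
  }
  if (!allowlist.has(program)) {
    return { allowed: false, reason: `not in allowlist: ${program}`, argv };
  }
  return { allowed: true, reason: "ok", argv };
}

// Mirrors part of allowed_shell_commands from the example forge.yaml.
const allowed = new Set(["cargo", "rustc", "docker", "git"]);

console.log(checkCommand("cargo build --release", allowed));
console.log(checkCommand("cargo build; rm -rf /", allowed));
console.log(checkCommand("./cargo build", allowed));
```

Two limitations are worth stating. Rejecting quotes means arguments containing spaces cannot be expressed in this simple form, so a real implementation would more likely accept an argument array directly. And checking the program name says nothing about its arguments: an allowlisted tool such as `docker` or `git` can still do damage with the wrong flags.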
- Architecture & scope document
- Agent base class + all 5 agents
- Model Router (multi-provider, per-task, self-tuning weights)
- Pipeline Engine (DAG execution, conditional edges, Coder ⇄ Reviewer loop)
- Tool executor (file_read, file_write, shell_exec, search, http_check)
- Feedback store (in-memory, SpacetimeDB-ready interface)
- Config loader (forge.yaml with Zod validation)
- Workflow SDK integration scaffold (durable pipeline with step/replay)
- CLI scaffold (forge init, run, review, deploy, status)
- Deploy target plugins (rust-service, python-api)
- Rust orchestrator (gRPC, container trait, in-memory impl)
- Web dashboard (Next.js 16, 5 tabs, SVG DAG, charts)
- Docker `ContainerManager` implementation for the orchestrator
- SpacetimeDB module with schema and reducers
- Real deploy + registry push in target plugins
- Automatic rollback on verification failure
- Functional `forge deploy` and `forge status` CLI commands
- SpacetimeDB reducer for routing weight auto-tuning
- Real-time dashboard subscriptions via SpacetimeDB
- Deployment history and outcome tracking
- Alerting on success rate degradation
- Prompt versioning and A/B testing
- Automatic prompt tuning from feedback patterns
- Agent prompt promotion/demotion based on outcomes
- Quality score dashboards
- Custom agent registration (user-defined agents)
- Parallel agent execution in pipeline
- Kubernetes-native deployment target
- SaaS multi-tenancy
- Team collaboration features
# Build all packages
bun run build
# Development mode (watch)
bun run dev
# Lint all packages
bun run lint
# Run tests
bun run test
# Clean all build artifacts
bun run clean

# Debug build
cargo build -p forge-orchestrator
# Release build
cargo build --release -p forge-orchestrator
# Run the gRPC server
cargo run -p forge-orchestrator --bin server
# Run with logging
RUST_LOG=debug cargo run -p forge-orchestrator --bin server

The turbo.json defines the build pipeline with persistent task caching. Each package has its own tsconfig.json extending the base configuration. The Rust workspace is configured in the root Cargo.toml.
- Fork the repository
- Create a feature branch (`git checkout -b feature/my-feature`)
- Commit your changes (`git commit -m 'Add my feature'`)
- Push to the branch (`git push origin feature/my-feature`)
- Open a Pull Request
Please read SCOPE.md for the full architectural vision and design decisions before contributing.
Forge's multi-agent pipeline architecture maps directly to the MAPS framework (Multi-Agent Pipeline Skills) for structured agent system development.
| MAPS Phase | Forge Component |
|---|---|
| M0 Foundation | forge.yaml project config — intent, runtime constraints, model bindings |
| M1 System Shape | Multi-Agent track — Planner → Coder ⇄ Reviewer → Deployer → Verifier |
| M2 Roster | 5 specialized agents defined in @forge/runtime |
| M3 Contracts | DAG pipeline edges, Coder ⇄ Reviewer loop contract (max 3 rounds) |
| M4 Coordination | DAG-based conditional routing via topological sort |
| M5 Agent Buildout | Each agent extends BaseAgent with versioned prompts, tools, model bindings |
| M6 Capabilities | Tool executor plugin system (file, shell, search, HTTP) |
| M7 Orchestration | PipelineEngine DAG execution with conditional edges |
| M8 Experience | Next.js 16 dashboard (5 tabs: Overview, Pipelines, Agents, Feedback, Deployments) |
| M9 Evaluate | Verifier agent health checks + smoke tests |
| M10 Deploy | Deploy target plugins (Rust Service, Python API, Docker) |
| M11 Improve | Feedback Flywheel — outcome analysis → prompt tuning → weight adjustment |
Each Forge agent follows the MAPS APS lifecycle:
A1 Define ─▶ A2 Design ─▶ A3 Build ─▶ A4 Equip ─▶ A5 Evaluate ─▶ A6 Deploy ─▶ A7 Observe ─▶ A8 Improve
- Define (A1): Agent brief — role, model binding, temperature, max tokens in `forge.yaml`
- Design (A2): Agent interface via `BaseAgent` — tool set, prompt template, routing profile
- Build (A3): Implementation in `@forge/runtime` with typed inputs/outputs
- Equip (A4): Tool executor assignment, model router weights, capability map
- Evaluate (A5): Reviewer agent with Coder ⇄ Reviewer loop (max 3 rounds), Verifier smoke tests
- Deploy (A6): Deploy target plugin execution (Docker image build, registry push)
- Observe (A7): Dashboard monitoring — per-agent success rate, latency, token consumption
- Improve (A8): Feedback Flywheel — routing weight auto-tuning, prompt versioning from outcomes
| Skill | Use Case |
|---|---|
| `/foundation` | Initialize Forge project with MAPS M0 preflight |
| `/shape` | Validate Multi-Agent track decision |
| `/define-agent` | Brief new custom agents for Phase 5 (user-defined agents) |
| `/build-agent++` | Incremental agent development with TDD for new agent types |
| `/equip-agent` | Capability mapping for tool permissions and model bindings |
| `/evaluate-agent++` | LangSmith/Phoenix tracing for agent eval suites |
| `/observe-agent` | Dashboard + trace integration for production monitoring |
| `/improve-agent` | Improvement backlog driven by flywheel outcomes |
MIT
Forge draws architectural inspiration from Factory.ai's pioneering work on AI-powered software deployment. We extend their vision by making the system self-improving through a closed feedback loop — every deployment makes the next one better.





