Architecture

OpenMontage is an agent-orchestrated video production platform. An LLM coding assistant (Claude Code, Cursor, Copilot, etc.) acts as the orchestrator. It reads pipeline manifests, follows skill instructions, calls Python tools, and checkpoints state. There is no runtime Python orchestrator; the agent is the control plane.

High-Level Flow

User gives topic/idea
        |
        v
Agent reads pipeline manifest (YAML)
        |
        v
For each stage:
   1. Agent reads stage-director skill (Markdown)
   2. Agent calls Python tools via tool registry
   3. Agent writes checkpoint (JSON) with artifacts
   4. Agent self-reviews using meta/reviewer skill
   5. Human approval gate (if configured)
        |
        v
Final video output

Agent-First Orchestration

There is no Python orchestrator. The LLM agent:

  • Reads the pipeline manifest to know the stage order
  • Reads each stage-director skill for detailed instructions
  • Calls tools, evaluates results, makes creative decisions
  • Writes checkpoints to persist state between stages

Python provides tools and persistence only. All intelligence lives in skill instructions (Markdown) and pipeline manifests (YAML).
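
The checkpoint idea above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the stage names, the one-file-per-stage layout, and the `write_checkpoint` and `next_stage` helpers are hypothetical and are not taken from the OpenMontage codebase. It only shows how JSON files on disk let a later session work out where to resume.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical stage order, standing in for what a pipeline manifest declares.
STAGES = ["research", "script", "assets", "compose"]


def write_checkpoint(root: Path, stage: str, artifacts: dict) -> Path:
    """Persist one stage's artifacts as a JSON checkpoint file."""
    path = root / f"{stage}.json"
    path.write_text(json.dumps({"stage": stage, "artifacts": artifacts}, indent=2))
    return path


def next_stage(root: Path, stages: list) -> Optional[str]:
    """Return the first stage that has no checkpoint, or None if all are done."""
    for stage in stages:
        if not (root / f"{stage}.json").exists():
            return stage
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Two stages completed in an earlier session.
    write_checkpoint(root, "research", {"brief": "notes.md"})
    write_checkpoint(root, "script", {"script": "script.json"})
    resume_at = next_stage(root, STAGES)

print(resume_at)  # assets
```

Because state lives in files rather than in process memory, any agent session can pick up the pipeline by inspecting the checkpoint directory.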

See Tool System for how tools are discovered and invoked. See Pipeline System for manifest structure and stage progression. See Checkpoint System for state persistence details.

Repository Layout

The top-level structure separates executable capabilities, declarative definitions, and agent instructions:

OpenMontage/
├── lib/                    # Core runtime infrastructure (Python)
├── tools/                  # 57+ Python tool implementations
├── pipeline_defs/          # YAML pipeline manifests
├── schemas/                # JSON Schema definitions for validation
├── skills/                 # Layer 2: OpenMontage-specific agent instructions
├── .agents/skills/         # Layer 3: external technology skills
├── styles/                 # Visual style playbooks (YAML)
├── remotion-composer/      # Node.js/React — Remotion video composition renderer
├── tests/                  # Contract tests, QA integration tests, eval harness
└── docs/                   # Best-practices guides, session handoffs, audits

See 3-Layer Knowledge for how skills, manifests, and tools relate.

Dual-Provider Support

Every capability must support both API providers (cloud, paid) and local or open-source alternatives (free, GPU-dependent). The selector pattern enforces this by routing each request to whichever provider is available at runtime.

Selectors (tts_selector, image_selector, video_selector) query the live registry at runtime. They rank providers by task fit, quality, control, reliability, cost, latency, and continuity, then adapt input schemas transparently. User preference is respected when explicitly provided.
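
As a rough sketch of the selector idea, the snippet below ranks available providers and honors an explicit preference. The `Provider` class, the equal-weight sum over the seven criteria, the provider names, and the rule that an unavailable preference falls through to the ranking are all assumptions for illustration. The real scoring mechanics are described in Provider Selection and may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# The seven ranking criteria named in the text; equal weighting is an assumption.
CRITERIA = ("task_fit", "quality", "control", "reliability",
            "cost", "latency", "continuity")


@dataclass
class Provider:
    name: str
    available: bool  # e.g. API key present, or local model installed
    scores: Dict[str, float] = field(default_factory=dict)

    def total(self) -> float:
        """Unweighted sum over the criteria; missing criteria count as 0."""
        return sum(self.scores.get(c, 0.0) for c in CRITERIA)


def select_provider(providers: List[Provider],
                    preferred: Optional[str] = None) -> Provider:
    """Pick the preferred provider if usable, else the best-scoring one."""
    usable = [p for p in providers if p.available]
    if not usable:
        raise RuntimeError("no provider available for this capability")
    if preferred is not None:
        for p in usable:
            if p.name == preferred:
                return p
    return max(usable, key=lambda p: p.total())


# Hypothetical registry contents.
registry = [
    Provider("cloud_tts", False, {"quality": 1.0, "reliability": 0.9, "cost": 0.3}),
    Provider("hosted_b", True, {"quality": 0.8, "cost": 0.4}),
    Provider("local_tts", True, {"quality": 0.6, "cost": 1.0}),
]

chosen = select_provider(registry)
print(chosen.name)  # local_tts
```

Here `cloud_tts` has the highest total but is unavailable, so it is never considered. Passing `preferred="hosted_b"` would return that provider instead of the top-ranked one.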

See Provider Selection for scoring mechanics and Configuring Providers for environment setup.

Core Design Decisions

  1. No runtime orchestrator: The LLM agent reads YAML and Markdown and drives every stage. This keeps the system debuggable (to see why the agent did something, read the skill) and model-agnostic.

  2. Checkpoint-based resumption: If a stage fails, the pipeline resumes from the last checkpoint, and completed stages are not re-run.

  3. Schema-validated artifacts: Every stage output is validated against a JSON Schema before the checkpoint is written, so malformed output cannot propagate to later stages.

  4. Budget as a first-class concept: Costs are estimated before execution, budget is reserved, and actual spend is reconciled afterward, so the agent cannot silently overspend. See Budget Governance.

  5. Selector pattern over hard-coded providers: Capabilities degrade gracefully. If an API key is missing, the selector falls through to the next provider or a local alternative.

  6. Skills over code for intelligence: Creative decisions, quality checklists, review criteria, and prompt templates live in Markdown skills, not Python, so the agent's behavior can be tuned by editing text files rather than code.
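
The budget decision (item 4) follows an estimate, reserve, reconcile cycle, which can be sketched as below. The `Budget` class, its method names, and the use of integer cents are hypothetical choices made for this example and are not the project's actual API. See Budget Governance for the real mechanism.

```python
class BudgetExceeded(Exception):
    """Raised when a reservation would exceed the remaining budget."""


class Budget:
    """Tracks reserved and spent amounts in integer cents."""

    def __init__(self, limit_cents: int) -> None:
        self.limit = limit_cents
        self.reserved = 0
        self.spent = 0

    def remaining(self) -> int:
        return self.limit - self.reserved - self.spent

    def reserve(self, estimate_cents: int) -> None:
        """Set aside the estimated cost before a tool call runs."""
        if estimate_cents > self.remaining():
            raise BudgetExceeded(
                f"need {estimate_cents}, only {self.remaining()} left"
            )
        self.reserved += estimate_cents

    def reconcile(self, estimate_cents: int, actual_cents: int) -> None:
        """Release the reservation and record what was actually spent."""
        self.reserved -= estimate_cents
        self.spent += actual_cents


budget = Budget(limit_cents=500)
budget.reserve(300)            # estimate made before execution
budget.reconcile(300, 250)     # the call turned out cheaper than estimated
print(budget.remaining())      # 250

try:
    budget.reserve(300)        # would overspend, so it is refused up front
except BudgetExceeded as exc:
    print("refused:", exc)     # refused: need 300, only 250 left
```

Reserving before execution is what makes overspending visible: the refusal happens before any paid call is made, not after the bill arrives.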

Composition runtimes (Remotion, HyperFrames, FFmpeg) are locked at proposal time and never swapped silently. See Composition Runtimes and Style Playbooks for related constraints.