Analysis and Corpus
OpenMontage builds and queries a local, CLIP-indexed corpus of video and image assets for reference-driven productions. The corpus draws from free and open archives such as Archive.org, NASA, and Wikimedia Commons, plus optional stock sources when API keys are present. This enables documentary-style workflows that rely on real footage instead of generated video.
Corpus Layout
Each production that uses analysis maintains its own corpus under the project workspace:
projects/<kebab-name>/corpus/
├── clips/ # downloaded video and image files
├── thumbnails/
│ └── <clip_id>/
│ └── frame_00.jpg # evenly spaced frames for preview
├── embeddings.npy # (N, 512) L2-normalised visual vectors
├── tag_embeddings.npy # (N, 512) L2-normalised text/tag vectors
└── index.jsonl # one metadata record per clip
index.jsonl and the two NumPy files stay aligned row by row: record i in index.jsonl corresponds to row i of embeddings.npy and of tag_embeddings.npy. The entire projects/ tree is gitignored.
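The row alignment described above is easy to break when clips are added or removed by hand, so a quick consistency check can be useful. The following is a minimal sketch, not part of OpenMontage: the helper names read_index and check_alignment are hypothetical, and the 512-wide shape is taken from the layout comments above.

```python
import json
from pathlib import Path


def read_index(path):
    """Read index.jsonl into a list of dicts, one per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def check_alignment(n_records, visual_shape, tag_shape, dim=512):
    """Raise ValueError unless both arrays have shape (n_records, dim).

    Hypothetical helper: the shapes are passed in so the check itself
    does not depend on how the arrays were loaded.
    """
    expected = (n_records, dim)
    for name, shape in (("embeddings.npy", visual_shape),
                        ("tag_embeddings.npy", tag_shape)):
        if tuple(shape) != expected:
            raise ValueError(f"{name}: expected {expected}, got {tuple(shape)}")
    return True
```

With NumPy installed, the shapes would typically come from np.load(corpus_dir / "embeddings.npy").shape and the equivalent call for tag_embeddings.npy, where corpus_dir is whatever path holds the corpus.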
Core Analysis Tools
Four tools perform the heavy lifting. All run locally and require no paid API keys for basic operation:
- video_analyzer accepts a URL or local file and returns a video_analysis_brief. Depths are transcript_only, standard, and deep. It chains metadata fetch, transcript extraction, scene detection, keyframe sampling, motion classification, and style profiling.
- scene_detect wraps PySceneDetect (content and threshold methods) plus FFmpeg to emit start/end timestamps for each scene.
- frame_sampler extracts JPEG frames by count, explicit timestamps, or scene boundaries.
- transcriber produces word-level segments using WhisperX. Speaker diarization is available when an HF_TOKEN is configured.
Supporting tools (video_downloader, transcript_fetcher, audio_probe) are invoked automatically by the analyzer.
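To make the frame_sampler modes concrete, the sketch below shows one way timestamps could be chosen for the "by count" and "by scene boundaries" modes. It is an illustration under assumptions, not the tool's actual code: both function names are hypothetical, and picking segment or scene midpoints is an assumed strategy that the real sampler may not share.

```python
def evenly_spaced_timestamps(duration_s, count):
    """Return `count` timestamps (seconds) at the midpoints of equal segments."""
    if count <= 0 or duration_s <= 0:
        return []
    step = duration_s / count
    return [round((i + 0.5) * step, 3) for i in range(count)]


def scene_midpoints(scenes):
    """Return one timestamp per scene, given (start_s, end_s) pairs.

    The pairs stand in for the start/end timestamps that scene detection
    emits; the exact output format of scene_detect is assumed here.
    """
    return [round((start + end) / 2.0, 3) for start, end in scenes]
```

Sampling at midpoints rather than at segment edges avoids landing on the first or last frame of a shot, which is often a transition.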
Reference-Driven Workflows
Pipelines that declare reference_input.supported: true (such as cinematic) activate analysis when a reference video is supplied at the start of a session. The agent:
- Runs the analysis tools listed in the pipeline manifest.
- Produces a video_analysis_brief containing source metadata, content analysis, structure (scenes and pacing_profile), style profile, narration transcript, keyframes, and replication guidance.
- Embeds the reference and user query through CLIP.
- Retrieves matching clips from the corpus using fused visual + tag similarity.
- Diversifies the selected set and builds edit_decisions that respect the reference's motion and timing characteristics.
Sub-stages (for example, a short sample preview) become active once the video_analysis_brief exists.
To get real-footage behavior, ask for it explicitly in the request, for example "use real footage only." The system then avoids paid video generation and falls back to stock or archive clips.
Retrieval and Diversity
The corpus supports:
- Ranking by text query (visual and tag channels blended).
- Nearest-neighbour search from a seed clip.
- Maximal Marginal Relevance diversification to avoid repetitive cuts.
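The blended text-query ranking in the list above can be sketched as follows. Because the stored vectors are L2-normalised, a plain dot product equals cosine similarity. This is a minimal illustration, not OpenMontage's implementation: the function name rank_by_query, the weighted-sum fusion, and the alpha weight are all assumptions, and the query vector is assumed to be an L2-normalised CLIP text embedding.

```python
def dot(a, b):
    """Dot product; equals cosine similarity for L2-normalised vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_by_query(query_vec, visual_vecs, tag_vecs, alpha=0.5, top_k=5):
    """Return [(row, score)] sorted by blended similarity, best first.

    alpha weights the visual channel and (1 - alpha) the tag channel.
    Rows index into index.jsonl, relying on the row-by-row alignment.
    """
    scores = []
    for row, (visual, tag) in enumerate(zip(visual_vecs, tag_vecs)):
        fused = alpha * dot(query_vec, visual) + (1.0 - alpha) * dot(query_vec, tag)
        scores.append((row, fused))
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]
```

Nearest-neighbour search from a seed clip fits the same shape: pass the seed clip's own visual vector as query_vec and drop the seed's row from the results.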
Motion score, duration, and shot-type filters can be applied during retrieval. Results feed directly into the edit stage.
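Maximal Marginal Relevance picks clips greedily, each time choosing the candidate that is most relevant to the query while least similar to what has already been selected. The sketch below shows the standard greedy form over L2-normalised vectors. The function name mmr_select and the lam trade-off parameter are assumptions for illustration; the corpus code may weight or filter differently.

```python
def dot(a, b):
    """Dot product; equals cosine similarity for L2-normalised vectors."""
    return sum(x * y for x, y in zip(a, b))


def mmr_select(relevance, vectors, k, lam=0.7):
    """Pick up to k row indices balancing relevance against redundancy.

    relevance[i] is the query score of row i; vectors[i] is its embedding.
    lam=1.0 reduces to plain relevance ranking; lower values favour variety.
    """
    remaining = list(range(len(vectors)))
    selected = []
    while remaining and len(selected) < k:
        def mmr_score(i):
            # Redundancy is the highest similarity to any already-chosen clip.
            redundancy = max(
                (dot(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * relevance[i] - (1.0 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Filters such as motion score, duration, or shot type would naturally be applied before this step, so that only eligible rows are passed in as candidates.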
Verification Before Use
Run preflight to confirm the local analysis stack and any configured stock providers are ready:
make preflight
Or inspect the live registry:
python -c "
from tools.tool_registry import registry
import json
registry.discover()
print(json.dumps(registry.provider_menu_summary(), indent=2))
"
See guides/project-workspace for the full directory contract and guides/running-pipelines for how to start a reference-aware session. For the list of pipelines that support reference input, see reference/available-pipelines.