Roadmap

One vertical slice at a time — fully working before the next begins

Implementation Phases

Phase 0 — Foundation
Added: llama.cpp server, N8N orchestration, Redis + PostgreSQL + Qdrant memory stack, audit log system, secondary safety review model.
Unlocks: All infrastructure in place. The plumbing is real — ready to receive the first reasoning loop and specialist.
Phase 1 — MVP: Base Jarvis
Added: Text + voice input, HomeAssistant specialist (ask + act), conversational memory buffer, trigger context store, basic automatic triggers.
Unlocks: End-to-end validation of the full architecture. Jarvis is live — limited intelligence, complete architecture. Every subsequent phase adds specialists on top of this validated base.
Phase 2 — Search & Knowledge
Added: Internet search specialist, NAS/archive retrieval specialist, Qdrant semantic memory layer, Mistral Embed for indexing.
Unlocks: Jarvis can answer questions from both the open internet and the user's personal knowledge base. It can also synthesize across sources, for example weather data with a HomeAssistant sensor reading, or documentation with infrastructure config.
Phase 3 — Media & Daily Services
Added: Jellyfin specialist (movies, TV), Navidrome specialist (music), audiobook reader specialist, self-developed custom tools.
Unlocks: Jarvis controls the full media ecosystem and personal tooling. Daily quality-of-life interactions become fast and natural.
Phase 4 — Productivity
Added: OnlyOffice specialist (document creation and editing), VSCode integration (local + web instance).
Unlocks: Jarvis assists directly with document work and code editing — reading, writing, and navigating files in the user's working environment.
Phase 5 — SysAdmin + Audio
Added: Homelab monitoring specialist, sandboxed code execution, VM-tested configuration management, continuous raw audio processing layer (tone, speaker identification, source separation).
Unlocks: Jarvis can assist with real homelab administration under the full safety stack. Continuous audio context enrichment begins — Jarvis understands the home's sound environment.
Phase 6 — Personal Life Planning
Added: Calendar specialist, grocery list specialist, personal finance tracking, weekly planning synthesis.
Unlocks: Jarvis has a full model of the user's schedule, routines, and preferences. Proactive planning suggestions — not just reactive answers.
Phase 7+ — Advanced Inputs
Added: Photo input (Pixtral vision model), gesture input (camera + skeleton layer), video input (camera + audio pipeline combined).
Unlocks: Multimodal interaction. Jarvis can reason about photos, respond to gestures, and understand the home environment through video context. Modalities are introduced one at a time.
Long Term — Self-Expansion
Added: Specialist-creator agent (Devstral) with full safety validation ladder in place.
Unlocks: Jarvis can propose and draft new specialists autonomously. Human validation always required before any new specialist goes live. The system becomes incrementally self-extending.
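The human-validation rule for self-expansion can be pictured as a small state machine. The following is only a minimal sketch under stated assumptions, not the project's implementation: the `SpecialistDraft` class, its three states, and the `approve` and `activate` methods are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DraftState(Enum):
    PROPOSED = "proposed"   # drafted by the specialist-creator agent
    APPROVED = "approved"   # signed off by a human reviewer
    LIVE = "live"           # active and callable


@dataclass
class SpecialistDraft:
    """Hypothetical record for a specialist drafted by the creator agent."""
    name: str
    state: DraftState = DraftState.PROPOSED
    approved_by: Optional[str] = None

    def approve(self, reviewer: str) -> None:
        # Approval must be attributable to a named human.
        if not reviewer.strip():
            raise ValueError("a named human reviewer is required")
        self.approved_by = reviewer
        self.state = DraftState.APPROVED

    def activate(self) -> None:
        # The only path to LIVE runs through APPROVED.
        if self.state is not DraftState.APPROVED:
            raise PermissionError(
                f"{self.name}: human validation required before going live"
            )
        self.state = DraftState.LIVE
```

In this sketch a draft that skips `approve` can never reach `LIVE`, which mirrors the rule that the agent may propose and draft but not activate.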

Phase Gate Criteria

A phase is complete when all conditions are met — not before

All specialists in the phase have passed the activation checklist — including deliberate break-testing of every ask and act call, confirmation that the secondary review model correctly gates all irreversible actions, and a full audit log review showing no unexpected side effects.

Each specialist in the phase has been used in real daily life for a sustained period without critical failures. Lab testing is necessary but not sufficient: real-world use surfaces edge cases that testing never reaches.

Documentation is up to date and the specialist registry reflects the current state. A phase is not complete if the documentation doesn't match what was actually built — the registry is the source of truth for the reasoning loop.
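The three criteria above can be sketched as a single gate check. This is an illustrative sketch rather than the project's actual tooling: the `SpecialistStatus` and `PhaseStatus` records and the `unmet_criteria` function are hypothetical, and the 30-day threshold is a placeholder assumption, since the text only requires "a sustained period".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpecialistStatus:
    name: str
    break_tested: bool           # every ask and act call deliberately break-tested
    review_gate_confirmed: bool  # secondary review model gates irreversible actions
    audit_log_clean: bool        # audit log review shows no unexpected side effects

    def passed_checklist(self) -> bool:
        return self.break_tested and self.review_gate_confirmed and self.audit_log_clean


@dataclass
class PhaseStatus:
    specialists: List[SpecialistStatus]
    daily_use_days: int      # days of real daily use so far
    critical_failures: int   # critical failures seen during that period
    docs_current: bool
    registry_current: bool


# Assumption: the source does not define "sustained"; 30 days is a placeholder.
MIN_DAILY_USE_DAYS = 30


def unmet_criteria(phase: PhaseStatus) -> List[str]:
    """Return the gate criteria still failing; an empty list means the phase is complete."""
    unmet = []
    failing = [s.name for s in phase.specialists if not s.passed_checklist()]
    if failing:
        unmet.append("activation checklist not passed: " + ", ".join(failing))
    if phase.daily_use_days < MIN_DAILY_USE_DAYS or phase.critical_failures > 0:
        unmet.append("sustained daily use without critical failures not yet shown")
    if not (phase.docs_current and phase.registry_current):
        unmet.append("documentation or specialist registry out of date")
    return unmet
```

Returning the list of unmet criteria, rather than a bare boolean, keeps the reason a phase is blocked visible.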

While a phase is in daily use, the following can run in parallel without violating the iterative principle: infrastructure improvements, manifest refinement, drafting (but not activating) the next phase's specialists, and security or monitoring improvements.
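The parallel-work rule amounts to an allowlist: drafting the next phase's specialists is permitted, activating them is not. A minimal sketch, assuming a hypothetical `Activity` enum and `may_run_in_parallel` helper that are not part of the original design:

```python
from enum import Enum


class Activity(Enum):
    INFRASTRUCTURE_IMPROVEMENT = "infrastructure improvement"
    MANIFEST_REFINEMENT = "manifest refinement"
    DRAFT_NEXT_PHASE_SPECIALIST = "draft next-phase specialist"
    ACTIVATE_NEXT_PHASE_SPECIALIST = "activate next-phase specialist"
    SECURITY_OR_MONITORING_IMPROVEMENT = "security or monitoring improvement"


# Everything listed in the text as safe while the current phase is in daily use.
ALLOWED_DURING_DAILY_USE = frozenset({
    Activity.INFRASTRUCTURE_IMPROVEMENT,
    Activity.MANIFEST_REFINEMENT,
    Activity.DRAFT_NEXT_PHASE_SPECIALIST,
    Activity.SECURITY_OR_MONITORING_IMPROVEMENT,
})


def may_run_in_parallel(activity: Activity) -> bool:
    """True if the activity does not violate the one-slice-at-a-time principle."""
    return activity in ALLOWED_DURING_DAILY_USE
```

Activation of a next-phase specialist is deliberately absent from the allowlist, so it stays blocked until the current phase passes its gate.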
