Jarvis — Hardware

Why Threadripper Pro

128 PCIe 4.0 lanes: four AI GPUs at x8 each use only 32 lanes, leaving ample bandwidth for NVMe SSDs, NIC, HBA, and future expansion.

Clean IOMMU topology: each PCIe slot gets its own IOMMU group, enabling per-GPU passthrough without ACS override hacks.

ECC RAM support: critical for a 24/7 system with persistent memory stores. Silent corruption of the wrong key causes subtle misbehaviour that is very hard to trace.

High core count: enough to run all LXC containers without CPU contention anywhere in the stack.
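The lane arithmetic can be checked with a short sketch. Only the 128-lane total and the four x8 GPUs come from the text; the NVMe, NIC, and HBA widths below are assumed placeholders, not a parts list.

```python
# Lane-budget sketch. The 128-lane total and four x8 GPUs are from the text;
# every other device and its lane width is an assumption for illustration.
TOTAL_LANES = 128

devices = {
    "gpu1": 8, "gpu2": 8, "gpu3": 8, "gpu4": 8,  # four AI GPUs at x8 each
    "nvme0": 4, "nvme1": 4,                      # assumed: two x4 NVMe SSDs
    "nic": 8,                                    # assumed: x8 network card
    "hba": 8,                                    # assumed: x8 storage HBA
}


def lane_budget(devices, total=TOTAL_LANES):
    """Return (lanes used, lanes free); raise if the budget is exceeded."""
    used = sum(devices.values())
    if used > total:
        raise ValueError(f"{used} lanes requested, only {total} available")
    return used, total - used


gpu_lanes = sum(w for name, w in devices.items() if name.startswith("gpu"))
used, free = lane_budget(devices)
print(f"GPU lanes: {gpu_lanes}, total used: {used}, free: {free}")
```

With these assumed widths the GPUs take 32 lanes and the whole build uses 56, so more than half the budget stays free for expansion.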

Power & Isolation

Dual 1000W rack PSUs run in active load-sharing, so a single PSU failure does not take down the system. Each unit runs at ~50% load, its most efficient and longest-lasting operating point.

All AI inference tasks run exclusively on GPUs, and each GPU is passed through to a dedicated LXC instance. Inference workloads are fully isolated from each other and from the orchestration layer.

PCIe 4.0 x8 provides ~16 GB/s per slot, more than sufficient for inference workloads where each model loads independently per card with no multi-GPU weight synchronisation.
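The power and bandwidth figures follow from simple arithmetic, sketched below. PCIe 4.0 signals at 16 GT/s per lane with 128b/130b encoding; the 24 GB model size and the 1000 W total system draw are assumptions chosen for illustration, and the function names are hypothetical.

```python
# Back-of-envelope checks for the power and bandwidth figures.
# The 24 GB model size and the 1000 W total draw are assumed values.

def pcie_bandwidth_gbs(gt_per_s: float, lanes: int, encoding: float = 128 / 130) -> float:
    """Usable bandwidth in GB/s, one direction, before protocol overhead."""
    return gt_per_s * encoding * lanes / 8  # divide by 8: bits to bytes


def psu_share(total_draw_w: float, rating_w: float = 1000.0, units: int = 2):
    """Return (load fraction per PSU, whether one PSU failure is survivable)."""
    per_unit_fraction = total_draw_w / units / rating_w
    survives_one_failure = total_draw_w <= rating_w * (units - 1)
    return per_unit_fraction, survives_one_failure


x8_gbs = pcie_bandwidth_gbs(16.0, lanes=8)
model_gb = 24.0  # assumed size of one model's weight file
print(f"PCIe 4.0 x8: {x8_gbs:.2f} GB/s")
print(f"Best-case transfer of {model_gb:.0f} GB: {model_gb / x8_gbs:.1f} s")
print(psu_share(1000.0))
```

The x8 link works out to about 15.75 GB/s, which the text rounds to ~16 GB/s. The transfer time is a lower bound, since storage read speed will usually be the real limit when loading weights.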

4U Rack Server — Proxmox LXC allocation

4U Rack Server — AMD Threadripper Pro (Proxmox)
|
+-- LXC: N8N (orchestration)                   [CPU]
+-- LXC: Python reasoning loop                  [CPU]
+-- LXC: Redis + PostgreSQL + Qdrant            [CPU + NVMe]
|
+-- LXC: llama.cpp main reasoning model         [GPU 1 - datacenter, high VRAM]
|         Magistral Small 1.2 / Mistral Large 3
+-- LXC: llama.cpp specialist LLM pool          [GPU 2 - datacenter, high VRAM]
|         Codestral, Devstral, Pixtral, Mistral Small 3.x
+-- LXC: vLLM audio processing                  [GPU 3 - pro-sumer AI]
|         Voxtral Realtime 4B + Voxtral Small 24B + SpeechBrain
+-- LXC: Vision sensor pipelines                [GPU 3 - shared]
|         YOLOv8, MediaPipe, InsightFace (2-5fps, 24/7)
+-- LXC: Safety review model                    [GPU 4 - pro-sumer AI]
|         Mistral Moderation always warm
|
+-- VM:  AI monitoring & audit system
          (isolated, encrypted NAS block, no network path to Jarvis)
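The allocation above can also be written as data, which makes it easy to check which GPUs are shared between containers. This is a minimal sketch: the container labels are hypothetical short names for the entries in the diagram, not actual Proxmox container IDs.

```python
from collections import defaultdict

# (container, device) pairs transcribed from the layout diagram.
# The container names are illustrative labels only.
ALLOCATION = [
    ("n8n", "cpu"),
    ("reasoning-loop", "cpu"),
    ("databases", "cpu"),
    ("main-reasoning-model", "gpu1"),
    ("specialist-pool", "gpu2"),
    ("audio-processing", "gpu3"),
    ("vision-pipelines", "gpu3"),
    ("safety-review", "gpu4"),
]


def shared_gpus(allocation):
    """Return {gpu: [containers]} for every GPU used by more than one container."""
    by_device = defaultdict(list)
    for container, device in allocation:
        if device.startswith("gpu"):
            by_device[device].append(container)
    return {gpu: names for gpu, names in by_device.items() if len(names) > 1}


print(shared_gpus(ALLOCATION))
```

For the layout shown, only GPU 3 is reported, because the audio and vision containers share it.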

Phase 0 — Base System

Added: AMD Ryzen (consumer), RTX 3070 8GB, consumer DDR4/DDR5, consumer NVMe SSD.

Unlocks: Full architecture validation — N8N, reasoning loop, HomeAssistant specialist, memory layers, audit system. Intelligence limited but entire plumbing is real and testable.

Phase 1 — 16GB GPU

Added: RTX 5060 Ti 16GB or RTX 4060 Ti 16GB as GPU 2.

Unlocks: First real parallelism. 13B-class models at Q5/Q8, or a small 32B-class model at Q4. Main model quality jumps meaningfully. Specialist responses no longer block the reasoning loop.
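Whether a given model and quantisation level fits a card can be estimated before buying hardware. The sketch below assumes the bits-per-weight of llama.cpp's Q4_0, Q5_0, and Q8_0 formats (4.5, 5.5, and 8.5); other quantisation variants differ. The 10% headroom reserved for KV cache and runtime buffers is also an assumption, and real usage depends on context length.

```python
# Rough VRAM estimate for quantised model weights.
# Bits-per-weight values assume the Q4_0 / Q5_0 / Q8_0 block formats.
BITS_PER_WEIGHT = {"Q4": 4.5, "Q5": 5.5, "Q8": 8.5}


def weight_gb(params_billion: float, quant: str) -> float:
    """Approximate size of the weights in GB (10^9 bytes)."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def fits(params_billion: float, quant: str, vram_gb: float, headroom: float = 0.10) -> bool:
    """True if the weights fit while leaving `headroom` of VRAM free (assumed 10%)."""
    return weight_gb(params_billion, quant) <= vram_gb * (1 - headroom)


for quant in ("Q4", "Q5", "Q8"):
    size = weight_gb(13, quant)
    print(f"13B at {quant}: {size:.1f} GB, fits 16 GB card: {fits(13, quant, 16)}")
```

The same two functions can be called with other parameter counts and VRAM sizes to sanity-check each later phase. Models that land close to the card's limit leave little room for context, so the headroom value is worth adjusting to the intended context length.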

Phase 2 — 32GB Pro-sumer GPU

Added: High-VRAM pro-sumer GPU (32GB, e.g. RTX 5000 Ada) as GPU 3.

Unlocks: 70B class models at Q4 or 32B at Q8. Three-way GPU split — reasoning / specialists / sensors. No GPU contention between layers. Significant intelligence improvement.

Phase 3 — Threadripper Migration

Added: AMD Threadripper Pro CPU + motherboard. Edge node hardware built and tested locally.

Unlocks: Full PCIe lane budget (128 lanes), clean IOMMU topology, rack-ready form factor. Edge nodes validated and ready to deploy — Phase 4 becomes a software integration task.
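The IOMMU topology can be verified on the new board before any GPU is moved over. The sketch below reads the standard Linux sysfs tree at /sys/kernel/iommu_groups; the function names are hypothetical, and the result is empty when the IOMMU is disabled in firmware or on the kernel command line.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group number: [PCI addresses]} read from the sysfs tree."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        return groups  # IOMMU disabled, or not a Linux host
    for group in base.iterdir():
        devices_dir = group / "devices"
        if group.name.isdigit() and devices_dir.is_dir():
            groups[int(group.name)] = sorted(d.name for d in devices_dir.iterdir())
    return groups


def shares_group(groups, pci_address):
    """Other devices in the same group as pci_address, or None if not found."""
    for members in groups.values():
        if pci_address in members:
            return [m for m in members if m != pci_address]
    return None


if __name__ == "__main__":
    for number, members in sorted(iommu_groups().items()):
        print(number, ", ".join(members))
```

A GPU normally shares its group with its own audio function (the same address ending in .1), which is expected. Unrelated devices appearing in a GPU's group would indicate the topology is not as clean as hoped.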

Phase 4 — Edge Nodes Deployed

Added: Raspberry Pi microphone nodes (per room) + camera nodes (per room) deployed.

Unlocks: Ambient intelligence — Jarvis gains real-time awareness of the home environment. Presence detection, voice recognition, speaker identification, gesture recognition (basic), sensor-based triggers.

Phase 5 — ECC RAM + DB Migration

Added: ECC registered DDR5 RAM. Redis, PostgreSQL, Qdrant move from NAS to local NVMe SSD.

Unlocks: Faster memory reads/writes — noticeably reduces latency on memory-heavy operations like context assembly and episodic retrieval. NAS freed from database duties.

Phase 6 — First Datacenter GPU

Added: Datacenter GPU (NVIDIA A40 48GB, A100 40/80GB, or L40S 48GB).

Unlocks: Magistral or Mistral Large at Q6/Q8 or near-unquantised. Reasoning quality approaches the target ceiling. Specialist pool now has two GPUs — multiple specialists loaded simultaneously.

Phase 7 — Pro-sumer AI GPU 2

Added: AMD Radeon AI Pro R9700/R9500 (or equivalent). RTX 3070 decommissioned.

Unlocks: Sensor pipelines (audio + video, 24/7) have dedicated hardware. Safety review model gets a clean resource slice. Stack simplified to 3 active GPUs before the final datacenter addition.

Phase 8 — Full Target Architecture

Added: Second datacenter GPU (same family as Phase 6).

Unlocks: All four GPU roles have dedicated hardware. Zero contention between layers. All primary models resident in VRAM simultaneously — no weight swapping in steady state. Maximum intelligence, maximum parallelism.

+-- GPU 1 -- Datacenter (high VRAM)       -> Main reasoning model
|            NVIDIA A100 / A40 / L40S         Magistral Small 1.2 / Mistral Large 3
|
+-- GPU 2 -- Datacenter (high VRAM)       -> Specialist LLM pool
|            Same family as GPU 1             Codestral, Devstral, Pixtral, Mistral Small 3.x
|                                             Multiple models loaded simultaneously
|
+-- GPU 3 -- Pro-sumer AI                 -> Sensor pipelines (24/7)
|            AMD Radeon AI Pro R9700/R9500    Vision (YOLOv8, MediaPipe, InsightFace)
|                                             Audio (Voxtral Realtime + Voxtral Small)
|
+-- GPU 4 -- Pro-sumer                    -> Safety review model + overflow
             RTX / equivalent                 Mistral Moderation always warm
