I built JARVIS — a fully local, GPU-accelerated AI voice assistant that runs entirely on my PC with zero cloud dependency. This 3-minute demo shows wake word detection, voice commands, live web research, document generation, and desktop control — all powered by a local LLM running on an AMD GPU.
The whole core pipeline runs on-device: speech recognition, the language model, and text-to-speech. No subscriptions, no API costs for core functionality, no data leaving your machine.
== TIMESTAMPS ==
0:01 - Wake word & basic voice commands
0:11 - Real-time web research
1:09 - Document generation (PPTX/DOCX/PDF)
1:42 - Desktop & application control
2:06 - Multi-skill demo
2:35 - News headline rundown & wrap-up
== TECH STACK ==
• LLM: Qwen3-VL-8B (Q5_K_M, local via llama.cpp + ROCm)
• STT: Whisper (CTranslate2), fine-tuned for a Southern US accent
• TTS: Kokoro 82M (CPU inference)
• GPU: AMD RX 7900 XT (20 GB VRAM) with ROCm
• Wake Word: Porcupine
• Intent Matching: Semantic similarity (sentence-transformers)
• 11 active skills
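For anyone curious how semantic intent matching works in principle, here is a minimal sketch. It is not the code from the repo: the skill names, example phrases, and threshold are hypothetical, and a toy bag-of-words embedder stands in for a sentence-transformers model so the snippet runs with the standard library alone. The idea is to embed the utterance, compare it against example phrases for each skill with cosine similarity, and pick the best match above a threshold.

```python
import math
from collections import Counter

# Hypothetical skills and example phrases (assumptions, not from the JARVIS repo).
INTENT_EXAMPLES = {
    "web_research": ["search the web for", "look up the latest news on"],
    "document_generation": ["create a presentation about", "write a report on"],
    "desktop_control": ["open the application", "close this window"],
}


def toy_embed(text):
    """Stand-in embedder: word counts. A real setup would return a dense
    vector from a sentence-transformers model instead."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_intent(utterance, embed=toy_embed, threshold=0.3):
    """Return (intent, score) for the closest example phrase,
    or (None, score) when nothing clears the threshold."""
    query = embed(utterance)
    best_intent, best_score = None, 0.0
    for intent, examples in INTENT_EXAMPLES.items():
        for example in examples:
            score = cosine(query, embed(example))
            if score > best_score:
                best_intent, best_score = intent, score
    if best_score < threshold:
        return None, best_score
    return best_intent, best_score


if __name__ == "__main__":
    print(match_intent("please open the application"))
    print(match_intent("what time is it"))
```

Swapping `toy_embed` for a real sentence embedding model is what lets paraphrases with no shared words still land on the right skill. The threshold keeps unrelated utterances from being forced onto a skill.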
== LINKS ==
GitHub: https://github.com/InterGenJLU/jarvis
Setup Guide: https://github.com/InterGenJLU/jarvis...
== WHAT MAKES THIS DIFFERENT ==
Most "AI assistants" are thin wrappers around cloud APIs. JARVIS runs a full 8B-parameter vision-language model locally on consumer AMD hardware. Speech recognition is fine-tuned for my accent. The entire pipeline — wake word, STT, LLM reasoning, tool use, TTS — executes on-device with sub-second latency.
Built for privacy-first personal productivity. No telemetry, no cloud accounts required.
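To illustrate the shape of that pipeline, here is a rough sketch of chaining the stages and timing each one. Everything in it is a placeholder assumption: the `Pipeline` class and the `fake_*` functions are hypothetical stubs, not the project's actual code, and the wake word gate is left out for brevity. In a real build the stubs would be replaced by the STT model, the local LLM, the skill dispatcher, and the TTS engine.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Pipeline:
    """Runs named stages in order and records how long each one takes."""
    stages: List[Tuple[str, Callable]] = field(default_factory=list)

    def add(self, name, fn):
        self.stages.append((name, fn))
        return self  # allow chained .add() calls

    def run(self, payload):
        timings = {}
        for name, fn in self.stages:
            start = time.perf_counter()
            payload = fn(payload)
            timings[name] = time.perf_counter() - start
        return payload, timings


# Hypothetical stand-ins for the real components.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio is already text


def fake_llm(text: str) -> dict:
    return {"tool": "echo", "argument": text}  # pretend the model chose a tool


def fake_tool(call: dict) -> str:
    return f"You said: {call['argument']}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")  # pretend this is synthesized audio


pipeline = (
    Pipeline()
    .add("stt", fake_stt)
    .add("llm", fake_llm)
    .add("tool", fake_tool)
    .add("tts", fake_tts)
)

if __name__ == "__main__":
    reply, timings = pipeline.run(b"open notepad")
    print(reply)
    for stage, seconds in timings.items():
        print(f"{stage}: {seconds * 1000:.3f} ms")
```

Per-stage timings like these are one way to see where end-to-end latency goes when tuning a local setup.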
#LocalAI #OpenSource #VoiceAssistant