A 284-billion-parameter DeepSeek model ran on a laptop. A year ago that took a rack of GPUs. Local LLM inference got that good, and this video shows how it happened and how to copy it.
We break down where quantization and faster runtimes changed the math, what a laptop can handle, and how to run capable models yourself with Ollama and LM Studio instead of paying per token. You'll also see where local AI still falls short, so you know when the cloud is the right call.
If you build with AI and want more of it running on your own hardware, this is the setup to steal.
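For a rough feel of why quantization changes the math, here is a back-of-the-envelope sketch. It assumes the footprint is just parameter count times bits per weight, and it ignores the KV cache, activations, and any layers kept at higher precision. The bit widths are illustrative, not the settings used in the video.

```python
def weight_footprint_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

PARAMS = 284e9  # parameter count quoted in this description

# Illustrative bit widths only; real quantization schemes mix precisions.
for bits in (16, 8, 4, 2):
    print(f"{bits:>2}-bit: {weight_footprint_gb(PARAMS, bits):.0f} GB")
```

At 16 bits the weights alone come to roughly 568 GB, which is rack territory. At around 2 bits they drop to roughly 71 GB, which is within reach of a high-memory laptop.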
#LocalLLM #DeepSeek #AI
Chapters:
0:00 Intro
0:19 What actually shipped
1:40 The 76 gigabyte trick
3:07 Why DS4 is deliberately narrow
4:28 The verdict that actually matters
5:23 Where the narrative breaks down
6:53 What this actually means and what to watch
Tools & resources mentioned:
DS4 (DwarfStar 4), antirez's inference engine: https://github.com/antirez/ds4
DeepSeek V4 Flash (model weights & info): https://github.com/antirez/ds4
antirez blog post on DS4: https://antirez.com/news/165
llama.cpp (Georgi Gerganov): https://github.com/ggerganov/llama.cpp
About The Stack
The Stack is a channel for people building with AI. Every video is a short, illustrated breakdown of the tools, models, and patterns that actually ship: Claude Code and AI coding tools, AI agents and orchestration, open-source AI tools and repos worth knowing, RAG and vector search, local and open-weight LLMs, and the model and cost calls that matter. Opinionated, evidence-led, no fluff.
New breakdowns regularly. Subscribe so you catch them: @the-stack-ai
#LocalAI #DeepSeek #LLMonLaptop #OpenSource #AIInference