Building My Own Jarvis
What AI Really Looks Like Beyond the Hype.
When most people think of AI, they think about AI-generated memes, fake selfies with world leaders and celebrities made with a certain “dwarf banana” image model, or “vibe coding”, where someone pastes a prompt and magically gets an entire web app they have no idea how to run.
That’s the surface. The deeper story is different.
LLMs today are nowhere close to AGI, but models like GPT-5, Qwen2.5, and Gemini 2.5 Pro already demonstrate sophisticated reasoning capabilities. They can review your LinkedIn profile and highlight gaps. They can refine your resume so it gets through an ATS filter. And, as I’ve discovered firsthand, they can even guide you step by step through building an offline Jarvis-like personal assistant on a Raspberry Pi, from a simple shopping list to fully working code.
From Shopping List to Jarvis
This wasn’t about cloning Tony Stark. It was about testing what’s possible today with budget hardware, patience, and step-by-step AI assistance. With ChatGPT as a guide, I went from shopping list to a functioning voice-based AI assistant in days.
ChatGPT built me a MicroCenter shopping list that stayed within my budget and constraints (no breadboards or soldering irons).
It walked me through installing the OS on an NVMe drive, expanding partitions, and setting up a headless boot.
It gave me the core pipeline:
🎙 Record voice →
📝 Transcribe with Whisper (STT) →
🧠 Local LLM via llama.cpp →
🔊 Convert back with Piper (TTS) →
📢 Play through the speaker

And when errors popped up, it helped debug step by step.
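The five stages above can be sketched as a simple chain of functions. This is an illustrative skeleton, not the project's actual code: each stage would shell out to `arecord`, Whisper, llama.cpp's server, Piper, and `aplay` respectively, but the wiring is the same.

```python
def run_pipeline(record, transcribe, think, synthesize, play):
    """Chain the five stages: mic -> STT -> LLM -> TTS -> speaker.

    Each argument is a callable for one stage. In the real assistant
    (hypothetical commands, adjust to your setup):
      record     -> arecord -D hw:2,0 -f S16_LE -c1 -r16000 in.wav
      transcribe -> Whisper on in.wav
      think      -> HTTP call to llama-server on port 8080
      synthesize -> Piper writing out.wav
      play       -> aplay out.wav
    """
    audio_in = record()                 # capture a voice clip
    text = transcribe(audio_in)         # speech -> text
    reply = think(text)                 # text -> LLM reply
    audio_out = synthesize(reply)       # reply -> speech file
    play(audio_out)                     # play it back
    return reply
```

Structuring it this way made each stage easy to test (and debug) in isolation before connecting them.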
Debugging With AI as my Copilot
One of the most striking things was how specific the guidance got when errors happened.
Example 1 — Mic setup
When my first attempt to record audio failed:
arecord -D plughw:1,0 -f cd test.wav
# → Error: No such file or directory

ChatGPT suggested:
“Run arecord -l to list your capture devices. It looks like your mic isn’t at card 1; try hw:2,0.”
Sure enough, arecord -D hw:2,0 -f S16_LE -c1 -r16000 test.wav worked.
Example 2 — Model merge
When downloading the split Qwen model files, I wasn’t sure how to combine them.
ChatGPT immediately pointed out the right tool:
/srv/llm/llama.cpp/build/bin/llama-gguf-split \
--merge qwen2.5-7b-instruct-q5_k_m-00001-of-00002.gguf \
qwen2.5-7b-instruct-q5_k_m.gguf
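One gotcha with split GGUF downloads is a missing shard. Since each part's filename encodes the total (`-00001-of-00002`), a quick check before merging is possible. A sketch, with a helper name of my own invention:

```python
import glob
import os
import re

def check_split_parts(directory, stem="qwen2.5-7b-instruct-q5_k_m"):
    """Return (sorted split files, True if every shard is present).

    Relies on the GGUF split naming scheme: <stem>-NNNNN-of-MMMMM.gguf
    """
    parts = sorted(glob.glob(os.path.join(directory, f"{stem}-*-of-*.gguf")))
    if not parts:
        return parts, False
    # The "-of-MMMMM" suffix tells us how many shards to expect.
    total = int(re.search(r"-of-(\d+)\.gguf$", parts[0]).group(1))
    return parts, len(parts) == total
```

Running this before `llama-gguf-split --merge` saves a failed merge on an incomplete download.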
Example 3 — Systemd service
I wanted my LLM server to start automatically. My first attempt failed with some cryptic stoi errors.
ChatGPT spotted that my ExecStart was passing placeholders ({MODEL}, {CTX}) instead of actual values. Rewriting with environment variables fixed it:
[Service]
Environment="MODEL=/srv/llm/models/qwen2.5-7b-instruct-q5_k_m.gguf"
Environment="THREADS=4"
ExecStart=/srv/llm/llama.cpp/build/bin/llama-server \
-m ${MODEL} -c 4096 -t ${THREADS} -ngl 0 --host 0.0.0.0 --port 8080
Suddenly, the service came up, and my local LLM was reachable at:
http://jarvispi.local:8080
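Once the service was reachable, any HTTP client could talk to it. Here's a minimal standard-library sketch assuming llama-server's OpenAI-compatible /v1/chat/completions route; the `build_chat_request` helper and the hostname are from my setup, not the llama.cpp project:

```python
import json
import urllib.request

SERVER_URL = "http://jarvispi.local:8080"  # hostname from my Pi's setup

def build_chat_request(prompt, url=SERVER_URL):
    """Build a POST request for llama-server's OpenAI-compatible chat endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return urllib.request.Request(
        f"{url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# Example (requires the service to be running):
# with urllib.request.urlopen(build_chat_request("Hello, Jarvis")) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```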
Example 4 — Clipped audio
At first, Piper’s speech output kept chopping off the first word.
Instead of shrugging, ChatGPT had me experiment systematically:
- Add prepended silence (400 ms)
- Prime the ALSA device with a short burst (0.3 s)
- Insert a tiny delay (0.2 s) before playback
# Python snippet
import subprocess
import time

# Prime the output device with 0.3 s of silence, then pause briefly
# before the real playback so the first word isn't clipped.
subprocess.run(["aplay", "-D", SPEAKER_DEV, "/dev/zero", "-d", "0.30"])
time.sleep(0.2)
The clipping vanished. Every word came through cleanly and clearly.
Before this project, I had no background with ALSA, Qwen, llama.cpp, Whisper, or Piper. What I did bring was basic Raspberry Pi experience and working knowledge of Python and shell scripting. With ChatGPT as my co-pilot, I invested a few hundred dollars in hardware and followed the process step by step.
The result: not only did the project work, but I now have a decent grasp of the fundamentals in a domain I had never touched before. More importantly, the experience left me curious to explore further, without the frustration that usually comes with tackling a completely new field.
Why This Matters for Business
For executives and entrepreneurs, the point isn’t “how to build Jarvis.” It’s this:
AI has evolved from answering trivia to hands-on reasoning: debugging configs, solving dependency errors, and optimizing processes.
A single person can now stitch together what used to take a small engineering team.
Tools like GPT-5 aren’t just idea generators; they’re project copilots, capable of helping you build, refine, and document an entire system.
Closing Thoughts
GPT-5, Gemini 2.5 Pro, and Qwen 2.5 aren’t AGI, but they are powerful enough that you can:
build your own Jarvis-like device,
launch a one-person enterprise, or
spend your time making celebrity memes.
The choice is yours.
Curious about the details? Check out the full code and documentation here:
GitHub – voicegpt-pi