Building an AI assistant from scratch
An assistant that listens and uses your webcam to see what's around.
I built an AI assistant that hears and sees.
I built it in Python, in 169 lines of code, using four main components:
Gemini Flash is the LLM I used. It’s much faster and cheaper than GPT-4o, and its answers are concise. For some reason, GPT-4o is very chatty, and no amount of prompting helped me fix that.
Whisper is the transcription model to turn aud…
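As a rough illustration of how components like these might be wired together, here is a minimal sketch. It is not the code from this post: `Assistant`, `transcribe`, `capture_frame`, and `ask_llm` are hypothetical names, and the three callables are stand-ins for a transcription model, a webcam reader, and a multimodal LLM client. Passing them in as plain functions keeps the flow visible without assuming any particular library API.

```python
import base64
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Assistant:
    """Hypothetical glue object: hear a question, look at the webcam, ask the LLM."""

    # Assumption: turns raw audio bytes into text (e.g. a wrapper around Whisper).
    transcribe: Callable[[bytes], str]
    # Assumption: returns one encoded image (e.g. JPEG bytes) from the webcam.
    capture_frame: Callable[[], bytes]
    # Assumption: takes (prompt, base64-encoded image) and returns the model's answer.
    ask_llm: Callable[[str, str], str]
    # Running list of (question, answer) pairs for the session.
    history: List[Tuple[str, str]] = field(default_factory=list)

    def handle(self, audio: bytes) -> str:
        """Process one spoken request and return the assistant's reply."""
        prompt = self.transcribe(audio).strip()
        if not prompt:
            # Nothing intelligible was said, so skip the camera and the LLM call.
            return ""
        # Grab a frame at the moment of the question and encode it as text.
        image_b64 = base64.b64encode(self.capture_frame()).decode("ascii")
        answer = self.ask_llm(prompt, image_b64)
        self.history.append((prompt, answer))
        return answer


# Example wiring with fake components, so the sketch runs without any hardware.
demo = Assistant(
    transcribe=lambda audio: "What am I holding?",
    capture_frame=lambda: b"fake-image-bytes",
    ask_llm=lambda prompt, image: f"(model reply to: {prompt})",
)
print(demo.handle(b"raw-audio"))
```

In a real build, each lambda would be replaced by a call into the corresponding component, while `handle` could stay the same.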