Bazure

Azure Live Computer Vision with AI voice feedback

  • Go
  • Node.js
  • Express
  • Azure Cognitive Services
  • Azure Speech
Screenshot of Bazure

Bazure captures your screen in a loop, sends the image to Azure's Computer Vision API, and speaks the analysis back to you through Azure's neural TTS voices. You can also talk to it and ask questions about what's on screen using speech transcription.

It runs three modules concurrently: barosa-screen-capture handles window screenshots using maim and xdotool on Linux, barosa-azure sends captures to Azure's vision API and parses the results (detected objects, OCR text, scene descriptions), and barosa-microphone manages speech input/output through Azure Speech Services.
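The three-module layout above can be sketched as a Go channel pipeline: one goroutine producing frames, one analyzing them, and a consumer that would hand results to TTS. This is a minimal illustration, not the project's actual code; the `Frame`/`Analysis` types, `runPipeline`, and the stub descriptions are all placeholders for the richer data (screenshot bytes, detected objects, OCR text) the real modules exchange.

```go
package main

import "fmt"

// Frame and Analysis are simplified stand-ins for the data the real
// modules exchange (raw screenshot bytes, detected objects, OCR text).
type Frame struct{ ID int }
type Analysis struct{ Description string }

// runPipeline wires three stages the way the three modules are wired:
// a capture goroutine (barosa-screen-capture), an analysis goroutine
// (barosa-azure), and a consumer loop standing in for the speech side
// (barosa-microphone). All names here are illustrative.
func runPipeline(n int) []Analysis {
	frames := make(chan Frame, 4) // buffered: capture can run ahead of analysis
	results := make(chan Analysis, 4)

	go func() { // capture stage
		for i := 0; i < n; i++ {
			frames <- Frame{ID: i}
		}
		close(frames)
	}()

	go func() { // analysis stage (the real one calls Azure's vision API)
		for f := range frames {
			results <- Analysis{Description: fmt.Sprintf("scene for frame %d", f.ID)}
		}
		close(results)
	}()

	var out []Analysis
	for a := range results { // the TTS stage would speak each result
		out = append(out, a)
	}
	return out
}

func main() {
	for _, a := range runPipeline(3) {
		fmt.Println("TTS:", a.Description)
	}
}
```

Closing each channel when its producer finishes lets the downstream `range` loops drain and exit cleanly, so the stages need no extra shutdown signaling.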

The backend is Go (good fit for the concurrent capture/API/audio loops running as goroutines) and the frontend is an Express.js dashboard for configuration and status.

Round-trip latency is about 1.5-2 seconds after optimizations. The system skips API calls when the screen hasn't changed much (pixel diff threshold) and captures asynchronously so it doesn't block on analysis.
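The change-detection gate can be sketched as a pixel diff over two frames. This is a reduced illustration, assuming grayscale byte buffers and an arbitrary 2% threshold; the real project diffs full screenshots, and `changedFraction`/`shouldAnalyze` are hypothetical names.

```go
package main

import "fmt"

// changedFraction reports the fraction of pixels that differ between
// two equally sized grayscale frames. Mismatched or empty frames are
// treated as fully changed so they always trigger an analysis.
func changedFraction(prev, cur []byte) float64 {
	if len(prev) != len(cur) || len(prev) == 0 {
		return 1.0
	}
	diff := 0
	for i := range prev {
		if prev[i] != cur[i] {
			diff++
		}
	}
	return float64(diff) / float64(len(prev))
}

// shouldAnalyze gates the Azure call: skip the API round trip when the
// screen is essentially unchanged since the last analyzed frame.
func shouldAnalyze(prev, cur []byte, threshold float64) bool {
	return changedFraction(prev, cur) > threshold
}

func main() {
	a := []byte{10, 10, 10, 10}
	b := []byte{10, 10, 10, 11}            // one of four pixels changed (25%)
	fmt.Println(shouldAnalyze(a, b, 0.02)) // true: enough change to analyze
	fmt.Println(shouldAnalyze(a, a, 0.02)) // false: identical frames, skip
}
```

Keeping the gate as a pure function over the previous and current frame makes it cheap to run on every capture, so the expensive vision call only fires when the fraction crosses the threshold.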