Specta - Voice Gaming Assistant

A voice-activated gaming assistant that captures screenshots and provides real-time AI-powered analysis using Gemini Vision.

Features

Voice Activation: Wake word detection using "Hey Specta"
Screenshot Analysis: Automatic screenshot capture and AI analysis
Real-time Conversation: Natural voice conversations with context retention
Gaming-Focused: Specialized prompts for gaming assistance
Professional Audio: High-quality speech-to-text and text-to-speech

Architecture

                           SPECTA VOICE GAMING ASSISTANT
                                System Architecture

┌─────────────┐    ┌─────────────┐    ┌──────────────────────────────┐
│ Microphone  │───▶│ Wake Word   │───▶│        PIPECAT PIPELINE      │
│   Audio     │    │ Detection   │    │                              │
└─────────────┘    │ (Picovoice) │    │ ┌─────────────────────────┐  │
                   └─────────────┘    │ │   LocalAudioTransport   │  │
                                      │ │    + Silero VAD         │  │
                                      │ └──────────┬──────────────┘  │
                                      │            │                 │
                                      │ ┌──────────▼──────────────┐  │
                                      │ │     STT Mute Filter     │  │
                                      │ └──────────┬──────────────┘  │
                                      │            │                 │
                                      │ ┌──────────▼──────────────┐  │
                                      │ │   Whisper STT Service   │  │
                                      │ └──────────┬──────────────┘  │
                                      │            │                 │
                                      │ ┌──────────▼──────────────┐  │
┌─────────────┐                       │ │   First Query Handler   │  │
│ Screenshot  │◀──────────────────────│ │  • Screenshot Capture   │  │
│  Storage    │                       │ │  • Gemini Vision API    │  │
└─────────────┘                       │ │  • Response Parsing     │  │
                                      │ └──────────┬──────────────┘  │
                                      │            │                 │
                                      │ ┌──────────▼──────────────┐  │
                                      │ │ Context Aggregator      │  │
                                      │ │     (User)              │  │
                                      │ └──────────┬──────────────┘  │
                                      │            │                 │
                                      │ ┌──────────▼──────────────┐  │
                                      │ │  Gemini LLM Service     │  │
                                      │ │  (Follow-up queries)    │  │
                                      │ └──────────┬──────────────┘  │
                                      │            │                 │
                                      │ ┌──────────▼──────────────┐  │
                                      │ │   Deepgram TTS          │  │
                                      │ └──────────┬──────────────┘  │
                                      │            │                 │
                                      │ ┌──────────▼──────────────┐  │
                                      │ │    Audio Output         │  │
┌─────────────┐                       │ └──────────┬──────────────┘  │
│  Speakers/  │◀──────────────────────│            │                 │
│ Headphones  │                       │ ┌──────────▼──────────────┐  │
└─────────────┘                       │ │ Context Aggregator      │  │
                                      │ │    (Assistant)          │  │
                                      │ └─────────────────────────┘  │
                                      └──────────────────────────────┘

Flow: Audio → "Hey Specta" → STT → Screenshot+Vision → Context → LLM → TTS → Audio

Key Components:

Wake word detection (Picovoice)
Speech-to-Text (Whisper)
Screenshot capture (PIL)
AI analysis (Gemini 2.5 Flash)
Text-to-Speech (Deepgram)
Context management (OpenAI-compatible)

Requirements

Python 3.8+
API Keys:
- GEMINI_API_KEY - Google Gemini AI
- DEEPGRAM_API_KEY - Deepgram speech services
- PICOVOICE_ACCESS_KEY - Picovoice wake word detection

Installation

Clone the repository:

git clone https://github.com/yourusername/Specta.git
cd Specta

Install dependencies:

pip install -r requirements.txt

Set up environment variables:

cp .env.example .env
# Edit .env with your API keys

Usage

Run the assistant:

python specta.py

Say "Hey Specta" to activate
Ask gaming-related questions
The assistant will capture screenshots and provide contextual help

Configuration

Wake Word: Customizable via hey_specta.ppn file
Screenshots: Saved to screenshots/ directory
Audio: 16kHz sample rate with VAD

Dependencies

pipecat-ai - Voice pipeline framework
google-generativeai - Gemini AI integration
pvporcupine - Wake word detection
deepgram-sdk - Speech services
whisper - Speech-to-text
PIL - Screenshot capture
sounddevice - Audio I/O

License

MIT License - see LICENSE file for details.

Contributing

Pull requests welcome! Please ensure:

Clean, professional code
No hardcoded secrets
Proper error handling
Documentation updates

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
screenshots		screenshots
.gitignore		.gitignore
README.md		README.md
hey_response.wav		hey_response.wav
hey_specta.ppn		hey_specta.ppn
requirements.txt		requirements.txt
specta.py		specta.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Specta - Voice Gaming Assistant

Features

Architecture

Requirements

Installation

Usage

Configuration

Dependencies

License

Contributing

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

DigvijayIngole55/Specta

Folders and files

Latest commit

History

Repository files navigation

Specta - Voice Gaming Assistant

Features

Architecture

Requirements

Installation

Usage

Configuration

Dependencies

License

Contributing

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages