Skip to content

DigvijayIngole55/Specta

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Specta - Voice Gaming Assistant

A voice-activated gaming assistant that captures screenshots and provides real-time AI-powered analysis using Gemini Vision.

Features

  • Voice Activation: Wake word detection using "Hey Specta"
  • Screenshot Analysis: Automatic screenshot capture and AI analysis
  • Real-time Conversation: Natural voice conversations with context retention
  • Gaming-Focused: Specialized prompts for gaming assistance
  • Professional Audio: High-quality speech-to-text and text-to-speech

Architecture

                           SPECTA VOICE GAMING ASSISTANT
                                System Architecture

┌─────────────┐    ┌─────────────┐    ┌──────────────────────────────┐
│ Microphone  │───▶│ Wake Word   │───▶│        PIPECAT PIPELINE      │
│   Audio     │    │ Detection   │    │                              │
└─────────────┘    │ (Picovoice) │    │ ┌─────────────────────────┐  │
                   └─────────────┘    │ │   LocalAudioTransport   │  │
                                      │ │    + Silero VAD         │  │
                                      │ └──────────┬──────────────┘  │
                                      │            │                 │
                                      │ ┌──────────▼──────────────┐  │
                                      │ │     STT Mute Filter     │  │
                                      │ └──────────┬──────────────┘  │
                                      │            │                 │
                                      │ ┌──────────▼──────────────┐  │
                                      │ │   Whisper STT Service   │  │
                                      │ └──────────┬──────────────┘  │
                                      │            │                 │
                                      │ ┌──────────▼──────────────┐  │
┌─────────────┐                       │ │   First Query Handler   │  │
│ Screenshot  │◀──────────────────────│ │  • Screenshot Capture   │  │
│  Storage    │                       │ │  • Gemini Vision API    │  │
└─────────────┘                       │ │  • Response Parsing     │  │
                                      │ └──────────┬──────────────┘  │
                                      │            │                 │
                                      │ ┌──────────▼──────────────┐  │
                                      │ │ Context Aggregator      │  │
                                      │ │     (User)              │  │
                                      │ └──────────┬──────────────┘  │
                                      │            │                 │
                                      │ ┌──────────▼──────────────┐  │
                                      │ │  Gemini LLM Service     │  │
                                      │ │  (Follow-up queries)    │  │
                                      │ └──────────┬──────────────┘  │
                                      │            │                 │
                                      │ ┌──────────▼──────────────┐  │
                                      │ │   Deepgram TTS          │  │
                                      │ └──────────┬──────────────┘  │
                                      │            │                 │
                                      │ ┌──────────▼──────────────┐  │
                                      │ │    Audio Output         │  │
┌─────────────┐                       │ └──────────┬──────────────┘  │
│  Speakers/  │◀──────────────────────│            │                 │
│ Headphones  │                       │ ┌──────────▼──────────────┐  │
└─────────────┘                       │ │ Context Aggregator      │  │
                                      │ │    (Assistant)          │  │
                                      │ └─────────────────────────┘  │
                                      └──────────────────────────────┘

Flow: Audio → "Hey Specta" → STT → Screenshot+Vision → Context → LLM → TTS → Audio

Key Components:

  • Wake word detection (Picovoice)
  • Speech-to-Text (Whisper)
  • Screenshot capture (PIL)
  • AI analysis (Gemini 2.5 Flash)
  • Text-to-Speech (Deepgram)
  • Context management (OpenAI-compatible)

Requirements

  • Python 3.8+
  • API Keys:
    • GEMINI_API_KEY - Google Gemini AI
    • DEEPGRAM_API_KEY - Deepgram speech services
    • PICOVOICE_ACCESS_KEY - Picovoice wake word detection

Installation

  1. Clone the repository:
git clone https://github.com/yourusername/Specta.git
cd Specta
  1. Install dependencies:
pip install -r requirements.txt
  1. Set up environment variables:
cp .env.example .env
# Edit .env with your API keys

Usage

  1. Run the assistant:
python specta.py
  1. Say "Hey Specta" to activate
  2. Ask gaming-related questions
  3. The assistant will capture screenshots and provide contextual help

Configuration

  • Wake Word: Customizable via hey_specta.ppn file
  • Screenshots: Saved to screenshots/ directory
  • Audio: 16kHz sample rate with VAD

Dependencies

  • pipecat-ai - Voice pipeline framework
  • google-generativeai - Gemini AI integration
  • pvporcupine - Wake word detection
  • deepgram-sdk - Speech services
  • whisper - Speech-to-text
  • PIL - Screenshot capture
  • sounddevice - Audio I/O

License

MIT License - see LICENSE file for details.

Contributing

Pull requests welcome! Please ensure:

  • Clean, professional code
  • No hardcoded secrets
  • Proper error handling
  • Documentation updates

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages