The system is straightforward:

- You describe your goal — "I want to set up two-factor authentication on my Google account" or "Help me configure my Git SSH keys"
- You share your screen — The app uses your browser's built-in screen sharing (the same tech used for video calls)
- AI analyzes what it sees — Vision language models look at your screen and figure out the current state
- You get one instruction at a time — No information overload. Just "Click the blue Settings button in the top right" or "Scroll down to find Security"
- Automatic progress detection — When you complete a step, Screen Vision notices the screen changed and automatically gives you the next instruction
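The flow above can be sketched as a loop. This is an illustrative sketch only — the function names are stand-ins for the real screen-capture and AI calls, not the project's actual code:

```python
def guidance_loop(goal, capture_frame, get_next_step, step_completed, poll=lambda: None):
    """One-instruction-at-a-time loop: show a step, wait for the screen
    to change, then fetch the next step.

    All callables here are hypothetical stand-ins for the real
    screen-capture and AI calls.
    """
    shown = []
    before = capture_frame()
    instruction = get_next_step(goal, before)
    while instruction is not None:
        shown.append(instruction)          # surface exactly one instruction
        poll()                             # wait for the user to act
        after = capture_frame()
        if step_completed(before, after):  # screen changed: advance
            before = after
            instruction = get_next_step(goal, before)
    return shown
```

The key design point this captures: the user never sees a full checklist, only the current step, and progress is inferred from screen changes rather than manual confirmation.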
| Model | Provider | Purpose |
|---|---|---|
| GPT-5.2 | OpenAI | Primary reasoning: generates step-by-step instructions and answers follow-up questions |
| Gemini 3 Flash | Google AI Studio | Step verification: compares before/after screenshots to confirm action completion |
| Qwen3-VL 30B | Fireworks AI | Coordinate detection: locates specific UI elements on screen |
Screen Vision is designed to process your data securely without retaining it.
- Zero Data Retention: No images or screen recordings are stored on the server. All processing happens in real-time, and data is discarded immediately after analysis.
- Secure AI Processing: Screenshots are sent to trusted LLM providers (OpenAI and Fireworks AI) solely for analysis. These providers adhere to strict data handling policies and do not store or use your data to train their models.
- Frontend: Next.js 13, React 18, Tailwind CSS, Zustand
- Backend: FastAPI, Python
- AI: OpenAI GPT models, Qwen-VL (via OpenRouter)
- UI: Radix primitives, Framer Motion, Lucide icons
Frontend (Next.js + React)
- Handles screen capture via the MediaDevices API
- Runs change detection by comparing scaled-down frames
- Manages the PiP window for always-on-top instructions
- Masks its own window from screenshots (so the AI doesn't see itself)
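The change-detection idea — compare scaled-down frames rather than full-resolution screenshots — is simple enough to sketch. The actual app does this in the browser; the Python below is for illustration only, and the function names are mine, not the project's:

```python
def downscale(pixels, width, height, factor):
    """Average `factor` x `factor` blocks of a grayscale frame
    (row-major list of pixel values) into a much smaller frame."""
    out = []
    for by in range(0, height - factor + 1, factor):
        for bx in range(0, width - factor + 1, factor):
            block = [pixels[(by + y) * width + (bx + x)]
                     for y in range(factor) for x in range(factor)]
            out.append(sum(block) / len(block))
    return out

def frames_differ(a, b, threshold=10.0):
    """Mean absolute difference between two downscaled frames;
    above the threshold counts as 'the screen changed'."""
    mad = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return mad > threshold
```

Downscaling first makes the comparison cheap enough to run on every captured frame, and averaging blocks also smooths out noise like cursor blinks.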
Backend (FastAPI + Python)
- `/api/step` — Given a goal and screenshot, returns the next single instruction
- `/api/check` — Compares before/after screenshots to verify if a step was completed
- `/api/help` — Answers follow-up questions about what's on screen
- `/api/coordinates` — Locates specific UI elements when needed
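The request/response shapes aren't documented here, but a `/api/step` exchange plausibly carries the goal plus the current screenshot and returns a single instruction. Every field name below is an assumption, not the documented contract:

```python
import json

# Hypothetical POST /api/step request body (field names are guesses).
step_request = {
    "goal": "Configure my Git SSH keys",
    "screenshot_b64": "<base64-encoded PNG of the shared screen>",
}

# Hypothetical response: exactly one instruction, never a full list.
step_response = {
    "instruction": "Open a terminal and run ssh-keygen",
    "done": False,
}

# The frontend would serialize the request and render only `instruction`.
payload = json.dumps(step_request)
```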
- Node.js 18+
- Python 3.10+
- pnpm (or npm/yarn)
Clone the repo and install dependencies:
```bash
git clone https://github.com/r-muresan/screen.vision.git
cd screen.vision

# Frontend
pnpm install

# Backend
pip install -r requirements.txt
```

Create a `.env.local` file in the root directory:

```
# Required - powers the main step-by-step logic
OPENAI_API_KEY=sk-...

# Required - used for verification and coordinate detection (Qwen models)
OPENROUTER_API_KEY=sk-or-...
```

The app uses OpenAI for primary reasoning and OpenRouter to access Qwen-VL models for specific tasks like step verification. You can swap these out by modifying `api/index.py` if you prefer different providers.
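Swapping providers presumably comes down to where the backend reads these variables and constructs its clients. A hedged sketch of that pattern — the variable names come from this README, but the helper itself is illustrative and not taken from `api/index.py`:

```python
import os

def get_provider_config():
    """Read the API keys the backend expects and fail fast if one is
    missing. Illustrative helper only, not the project's actual code."""
    config = {
        "openai_api_key": os.environ.get("OPENAI_API_KEY"),
        "openrouter_api_key": os.environ.get("OPENROUTER_API_KEY"),
    }
    missing = [name for name, value in config.items() if not value]
    if missing:
        raise RuntimeError(f"Missing required keys: {', '.join(missing)}")
    return config
```

Failing fast at startup is generally preferable to discovering a missing key on the first AI call mid-session.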
Start both the frontend and backend with a single command:
```bash
npm run dev
```

This runs:

- Next.js dev server on `http://localhost:3000`
- FastAPI server on `http://localhost:8000`
Open your browser to http://localhost:3000 and you're good to go.
For production deployments:
```bash
# Build the frontend
npm run build

# Start the frontend
npm run start

# Run the API separately
uvicorn api.index:app --host 0.0.0.0 --port 8000
```

Or use the included Procfile for platforms like Railway or Heroku.
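The Procfile's contents aren't shown in this README; for a Next.js + FastAPI split like this one, it would typically declare a process per service, along these lines (an illustrative guess, not the repo's actual file — check the Procfile in the repo for the real process definitions):

```
web: npm run start
api: uvicorn api.index:app --host 0.0.0.0 --port $PORT
```

Note that platforms differ in how they treat non-`web` process types, so the second entry may need adapting to your host.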
