MCP-LOCAL-Reader

AI-Ready Document Converter - Transform any local file into AI-optimized markdown format for seamless integration with Claude Desktop, Claude Code, and other MCP clients.

Intelligent Document Processing - High-performance local file content extraction with advanced parsing for PDF, Office documents, images, and more. Automatically converts complex documents into clean, structured markdown that AI models can easily understand and process.

Features

📄 AI-Optimized File Processing

PDF Documents: Advanced parsing with PyMuPDF4LLM → Clean markdown output
Office Suite: Word, Excel, PowerPoint → Structured tables and text
OpenDocument: ODT, ODS, ODP → Standardized markdown format
Text & Data: Markdown, JSON, CSV, EPUB → Enhanced AI readability
Images: OCR text recognition → Searchable markdown content
Archives: Smart extraction → Organized document collections

🚀 Intelligent Performance

Smart Caching: Remembers processed files for instant re-access
Lazy Loading: Only loads needed components - 80% faster startup
Concurrent Processing: Handles multiple files simultaneously
Resource Optimization: Prevents system overload with smart limits

🔒 Security & Control

Directory Permissions: Restrict access to specific directories
Path Validation: Secure file access with absolute path requirements
File Size Limits: Prevent DoS with configurable size restrictions
Local-First: No data leaves your machine - complete privacy

Quick Start

Prerequisites

Python 3.11+
uv package manager

Installation

Option 1: One-Command Setup (Recommended)

# Clone and auto-configure
git clone https://github.com/freefish1218/mcp-local-reader.git
cd mcp-local-reader
chmod +x install.sh && ./install.sh

The installer will guide you through three installation modes:

Minimal: PDF and basic text files only (smallest footprint)
Standard: Office documents support, no OCR (recommended)
Complete: All features including OCR and archive processing

Option 2: Manual Installation

# Install uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh

# Setup project
git clone https://github.com/freefish1218/mcp-local-reader.git
cd mcp-local-reader
uv sync

# Configure environment
cp env.example .env
# Edit .env with your settings

# Start server
./start_mcp.sh

Configuration for Claude Desktop

Automatic Configuration

chmod +x configure_claude.sh && ./configure_claude.sh

Manual Configuration

Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or equivalent:

{
  "mcpServers": {
    "local-reader": {
      "command": "/absolute/path/to/mcp-local-reader/start_mcp.sh",
      "args": [],
      "env": {
        "LOCAL_FILE_ALLOWED_DIRECTORIES": "/Users/username/Documents,/Users/username/Downloads"
      }
    }
  }
}

Configuration for Claude Code

Add to .claude/claude_config.json:

{
  "mcpServers": {
    "local-reader": {
      "command": "/absolute/path/to/mcp-local-reader/start_mcp.sh",
      "args": [],
      "env": {
        "LOCAL_FILE_ALLOWED_DIRECTORIES": "/Users/username/Documents,/Users/username/Downloads"
      }
    }
  }
}

Usage

After setup, use these features directly in conversations:

📄 Read & Convert to AI-Ready Markdown

Transform any file into AI-optimized markdown format:

Read the content from /Users/username/Documents/report.pdf
→ Converts to clean markdown with tables, headings, and structure

Parse /Users/username/data.xlsx and show me the data structure  
→ Extracts spreadsheet data as markdown tables

Extract text from /Users/username/presentation.pptx
→ Organizes slides into structured markdown sections

🔄 Save as Markdown Files

Convert and save documents as AI-ready markdown files:

Convert /Users/username/contract.pdf to markdown format
→ Creates contract.pdf.md with structured content

Save /Users/username/analysis.xlsx as markdown in /Users/username/output/
→ Saves formatted tables and data as markdown

Configuration

Essential Settings (.env)

# File access control (REQUIRED)
LOCAL_FILE_ALLOWED_DIRECTORIES=/Users/username/Documents,/Users/username/Downloads

# Performance optimization
TOTAL_CACHE_SIZE_MB=500          # Unified cache limit
CACHE_EXPIRE_DAYS=30             # Cache retention
FILE_READER_MAX_FILE_SIZE_MB=20  # File size limit

# Logging
LOG_LEVEL=INFO

Optional OCR Settings

For image text recognition:

# Vision model for OCR
LLM_VISION_BASE_URL=https://api.openai.com/v1
LLM_VISION_API_KEY=sk-your-api-key-here
LLM_VISION_MODEL=gpt-4o  # or qwen-vl-plus

Environment Variables

Variable	Required	Default	Description
`LOCAL_FILE_ALLOWED_DIRECTORIES`	✅	`current_dir`	Comma-separated allowed directories
`TOTAL_CACHE_SIZE_MB`	❌	`500`	Unified cache size limit
`FILE_READER_MAX_FILE_SIZE_MB`	❌	`20`	Maximum file size
`LOG_LEVEL`	❌	`INFO`	Logging level
`LLM_VISION_API_KEY`	❌	-	OCR vision model API key

MCP Tools

`read_local_file`

Extract content from local files and return as AI-optimized markdown.

Parameter	Type	Description
`file_path`	string	Absolute path to the file
`max_size`	number	File size limit in MB (optional)

`convert_local_file`

Convert files to AI-ready markdown and save to filesystem.

Parameter	Type	Description
`file_path`	string	Absolute path to input file
`output_path`	string	Output path (optional, defaults to input+.md)
`max_size`	number	File size limit in MB (optional)
`overwrite`	boolean	Overwrite existing files (default: false)

Supported File Types

Document Formats

PDF: .pdf
Microsoft Office: .doc, .docx, .ppt, .pptx, .xls, .xlsx
OpenDocument: .odt, .ods, .odp
Text: .txt, .md, .rtf, .csv, .json, .xml

Image Formats (with OCR)

Common: .png, .jpg, .jpeg, .gif, .bmp, .tiff
Advanced: .webp, .svg

Archive Formats

Compressed: .zip, .tar, .tar.gz, .7z
Office: .docx, .xlsx, .pptx (internally zip-based)

Special Formats

E-books: .epub
Data: .csv, .tsv, .json

Architecture

Core Components

FileReader (src/file_reader/core.py): Main orchestrator for file content extraction
MCP Server (src/mcp_server.py): FastMCP-based server providing MCP tools
Parser System (src/file_reader/parsers/): Specialized parsers for different file types
Cache Manager (src/file_reader/cache_manager.py): Unified caching system
Storage Layer (src/file_reader/storage/): Secure local file access

Performance Optimizations

Unified Caching: Single cache instance instead of multiple (reduced from ~6GB to 500MB default)
Lazy Loading: Parsers loaded on-demand, not at startup
Dependency Optimization: Optional dependencies for advanced features
Resource Limits: Configurable memory and file size limits

Development

Setup Development Environment

git clone https://github.com/freefish1218/mcp-local-reader.git
cd mcp-local-reader
uv sync
source .venv/bin/activate  # On Unix/macOS

Running Tests

# Run all tests
uv run python tests/run_tests.py

# Specific test categories
uv run python tests/run_tests.py --models     # Data models
uv run python tests/run_tests.py --parsers    # File parsers
uv run python tests/run_tests.py --core       # Core functionality
uv run python tests/run_tests.py --server     # MCP server

# With coverage
uv run python tests/run_tests.py -c

# Alternative pytest usage
PYTHONPATH=src uv run pytest tests/ -v

Adding New Parsers

Create parser in src/file_reader/parsers/
Inherit from BaseParser
Register in parser_loader.py
Add tests in tests/test_parsers.py

See CONTRIBUTING.md for detailed development guidelines.

Performance Characteristics

Smart Caching: Instantly access previously processed files without re-conversion
Efficient Memory Use: Optimized from 6GB+ to 500MB default cache size
Lightning Startup: 80% faster startup with on-demand component loading
Parallel Processing: Handle multiple document conversions simultaneously

System Requirements

Python: 3.11+
OS: macOS, Linux, Windows
Memory: 2GB+ recommended for large files
Optional: LibreOffice (legacy Office files), Pandoc (special conversions)

FAQ

Q: Files not reading correctly?
A: Ensure LOCAL_FILE_ALLOWED_DIRECTORIES includes your file's directory.

Q: OCR not working for images?
A: Configure LLM_VISION_API_KEY with a valid vision model API key (OpenAI GPT-4o or compatible).

Q: Want to improve processing speed?
A: The smart cache automatically remembers processed files. Clear cache directory if you want fresh processing of all files.

Q: Legacy Office files (.doc/.ppt) failing?
A: Install LibreOffice: brew install --cask libreoffice (macOS) or equivalent for your OS.

Q: What file formats are supported?
A: PDF, Word, Excel, PowerPoint, OpenDocument, images (with OCR), archives, text files, and more.

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines on how to contribute to this project.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Links

Issues: Report Issues
Documentation: CLAUDE.md for detailed development guide
Model Context Protocol: Official MCP Documentation

Acknowledgments

Built with FastMCP
PDF parsing powered by PyMuPDF4LLM
Caching system using DiskCache

Name		Name	Last commit message	Last commit date
Latest commit History 41 Commits
.github		.github
.playwright-mcp		.playwright-mcp
src		src
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.de.md		README.de.md
README.fr.md		README.fr.md
README.ja.md		README.ja.md
README.md		README.md
README.zh.md		README.zh.md
configure_claude.sh		configure_claude.sh
env.example		env.example
install.sh		install.sh
mcp_server.py		mcp_server.py
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini
requirements.txt		requirements.txt
start_mcp.sh		start_mcp.sh
uv.lock		uv.lock

License

freefish1218/mcp-local-reader

Folders and files

Latest commit

History

Repository files navigation

MCP-LOCAL-Reader

Features

📄 AI-Optimized File Processing

🚀 Intelligent Performance

🔒 Security & Control

Quick Start

Prerequisites

Installation

Option 1: One-Command Setup (Recommended)

Option 2: Manual Installation

Configuration for Claude Desktop

Automatic Configuration

Manual Configuration

Configuration for Claude Code

Usage

📄 Read & Convert to AI-Ready Markdown

🔄 Save as Markdown Files

Configuration

Essential Settings (.env)

Optional OCR Settings

Environment Variables

MCP Tools

read_local_file

convert_local_file

Supported File Types

Document Formats

Image Formats (with OCR)

Archive Formats

Special Formats

Architecture

Core Components

Performance Optimizations

Development

Setup Development Environment

Running Tests

Adding New Parsers

Performance Characteristics

System Requirements

FAQ

Contributing

License

Links

Acknowledgments

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Contributors 2

Uh oh!

Languages

`read_local_file`

`convert_local_file`

Packages