Intelligent CSV validation and correction for e-commerce.
ValidaHub processes large product catalogs asynchronously, applies marketplace-specific rules, and auto-corrects errors with full observability.
ValidaHub is a scalable platform for intelligent validation and correction of e-commerce CSV spreadsheets. It already integrates with Mercado Livre, Shopee, and Amazon, and provides asynchronous processing and full telemetry.
- Multi-Marketplace Validation: Mercado Livre, Shopee, Amazon
- Intelligent Auto-Correction with preview and selective application
- Asynchronous Processing via Celery + Redis for large files
- Complete Telemetry with structured events and metrics
- Repository Pattern for data layer abstraction
- Centralized Logging with correlation IDs
- Rate Limiting via Redis
- Flexible Authentication (JWT & API Keys)
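Rate limiting is listed as Redis-backed. One common way to build that is a fixed-window counter, usually implemented in Redis with `INCR` on a per-client, per-window key plus `EXPIRE`. The sketch below shows the same algorithm with an in-memory dictionary standing in for Redis so it runs standalone. The class name `FixedWindowRateLimiter`, the 60-second window, and keying by a client ID are assumptions for illustration, not ValidaHub's actual implementation.

```python
import time
from collections.abc import Callable


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client in each `window_seconds` window.

    Hypothetical sketch: the dict stands in for Redis, where the equivalent would be
    INCR on a key such as "ratelimit:{client}:{window}" followed by EXPIRE.
    """

    def __init__(self, limit: int, window_seconds: int = 60,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self._counters: dict[tuple[str, int], int] = {}

    def allow(self, client_id: str) -> bool:
        # The current window is the timestamp divided (floor) by the window size.
        window = int(self.clock() // self.window_seconds)
        key = (client_id, window)
        count = self._counters.get(key, 0) + 1
        self._counters[key] = count
        # Drop counters from earlier windows (Redis would expire them via a TTL).
        for stale in [k for k in self._counters if k[1] < window]:
            del self._counters[stale]
        return count <= self.limit


if __name__ == "__main__":
    now = [0.0]
    limiter = FixedWindowRateLimiter(limit=3, clock=lambda: now[0])
    print([limiter.allow("client-a") for _ in range(4)])  # [True, True, True, False]
    now[0] = 60.0  # move into the next window
    print(limiter.allow("client-a"))  # True
```

A fixed window is the simplest option; it can let up to twice the limit through around a window boundary, which sliding-window variants avoid at the cost of more bookkeeping.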
ValidaHub isn't just another CSV validator. It's specifically designed for the unique challenges of Brazilian and Latin American e-commerce:
- Marketplace-Specific Rules: Deep understanding of Mercado Livre, Shopee, and Amazon's specific requirements
- Intelligent Corrections: Goes beyond validation and automatically fixes common errors
- Enterprise-Grade Telemetry: Full observability for debugging and performance monitoring
- Scalable Architecture: Designed for files with millions of products, which are processed asynchronously by Celery workers
- Brazilian Market Focus: Built with Brazilian marketplace nuances in mind (tax codes, shipping rules, category mappings)
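To make the rules-plus-corrections idea concrete, here is a minimal sketch of per-marketplace rules that propose corrections as a preview and then apply only the ones the user selects. The rule IDs, the field names `title` and `price`, and the 60-character title limit are illustrative assumptions, not the real marketplace requirements or ValidaHub's rule engine.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    """One marketplace rule: `fix` returns a corrected value, or None if the value is valid."""
    rule_id: str
    field: str
    fix: Callable[[str], Optional[str]]


@dataclass(frozen=True)
class Correction:
    rule_id: str
    field: str
    old: str
    new: str


def collapse_whitespace(value: str) -> Optional[str]:
    fixed = " ".join(value.split())
    return fixed if fixed != value else None


def max_length(limit: int) -> Callable[[str], Optional[str]]:
    def fix(value: str) -> Optional[str]:
        return value[:limit] if len(value) > limit else None
    return fix


def decimal_comma(value: str) -> Optional[str]:
    # Converts a Brazilian-style price such as "19,90" to "19.90"; other formats are left alone.
    return value.replace(",", ".") if re.fullmatch(r"\d+,\d{1,2}", value) else None


# Hypothetical rule sets; real marketplace limits and field names will differ.
RULES: dict[str, list[Rule]] = {
    "MERCADO_LIVRE": [Rule("title_max_length", "title", max_length(60)),
                      Rule("price_decimal_comma", "price", decimal_comma)],
    "SHOPEE": [Rule("title_whitespace", "title", collapse_whitespace),
               Rule("price_decimal_comma", "price", decimal_comma)],
}


def preview_corrections(row: dict[str, str], marketplace: str) -> list[Correction]:
    """Return proposed fixes without modifying the row."""
    proposals = []
    for rule in RULES[marketplace]:
        value = row.get(rule.field, "")
        fixed = rule.fix(value)
        if fixed is not None:
            proposals.append(Correction(rule.rule_id, rule.field, value, fixed))
    return proposals


def apply_corrections(row: dict[str, str], proposals: list[Correction],
                      selected: set[str]) -> dict[str, str]:
    """Return a copy of the row with only the selected rule IDs applied."""
    fixed_row = dict(row)
    for c in proposals:
        if c.rule_id in selected:
            fixed_row[c.field] = c.new
    return fixed_row
```

For example, `preview_corrections({"title": "Fone  Bluetooth", "price": "19,90"}, "SHOPEE")` proposes two corrections, and passing only `{"price_decimal_comma"}` to `apply_corrections` fixes the price while leaving the title untouched. Keeping preview and application separate is what allows a user to review fixes before any data changes.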
- FastAPI (0.104+) - High-performance async web framework
- Celery (5.3+) - Distributed task queue
- Redis (7+) - Cache and message broker
- PostgreSQL (15+) - Primary database
- SQLAlchemy (2.0+) - Modern ORM with async support
- Pydantic (2.0+) - Data validation and settings
- Pandas (2.0+) - Efficient CSV manipulation
- Next.js 14 - React framework with App Router
- TypeScript (5.0+) - Type safety
- Tailwind CSS (3.3+) - Utility-first styling
- shadcn/ui - Accessible component library
- TanStack Query (5.0+) - Powerful data synchronization
- Docker & Docker Compose - Containerization
- GitHub Actions - CI/CD pipeline
- pytest (7.4+) - Backend testing with fixtures
- Vitest (1.0+) - Blazing fast frontend testing
- codecov - Code coverage tracking
```text
# Required versions
Node.js 20+
Python 3.11+
Docker & Docker Compose
Redis 7+
PostgreSQL 15+
```

- Clone the repository

```bash
git clone https://github.com/drapala/validahub-new.git
cd validahub-new
```

- Set up environment

```bash
# Copy and adjust environment variables
cp apps/api/.env.example apps/api/.env
cp apps/web/.env.example apps/web/.env
```

- Start infrastructure services

```bash
# Start PostgreSQL, Redis, and pgAdmin
docker-compose up -d
# Verify they're running
docker-compose ps
```

- Set up Backend

```bash
cd apps/api
# Create virtual environment
python3 -m venv venv
source venv/bin/activate   # Linux/Mac
# or
venv\Scripts\activate      # Windows
# Install dependencies
pip install -r requirements.txt
# Apply migrations
alembic upgrade head
# Start the server
uvicorn src.main:app --reload --port 8000
```

- Set up Frontend

```bash
# In another terminal
cd apps/web
npm install
npm run dev
```

- Start Celery Worker (for async processing)

```bash
# In another terminal, with the backend virtual environment activated
cd apps/api
celery -A src.workers.celery_app worker --loglevel=info
```

- 🌐 Frontend: http://localhost:3001
- 🔧 Backend API: http://localhost:8000
- 📚 API Docs: http://localhost:8000/docs
- 🗄️ pgAdmin: http://localhost:5050
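If the backend fails to start because the database or broker is not ready yet, a small readiness check can help. This is a standalone sketch using only the standard library; it checks TCP reachability only (not credentials, databases, or migrations). It assumes PostgreSQL and Redis are published on their default ports, 5432 and 6379, on localhost; the function names are hypothetical.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def check_services(services: dict[str, tuple[str, int]], timeout: float = 10.0) -> list[str]:
    """Return the names of services that did not accept a connection within `timeout`."""
    return [name for name, (host, port) in services.items()
            if not wait_for_port(host, port, timeout=timeout)]


# Default ports for PostgreSQL and Redis; adjust if docker-compose maps them elsewhere.
DEFAULT_SERVICES = {"PostgreSQL": ("localhost", 5432), "Redis": ("localhost", 6379)}
```

Calling `check_services(DEFAULT_SERVICES)` after `docker-compose up -d` should return an empty list once both containers are listening; any names it returns point to the service to inspect with `docker-compose ps`.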
```mermaid
graph TB
    subgraph "Frontend Layer"
        FE[Next.js Application]
        FE --> |REST API| API
    end
    subgraph "API Layer"
        API[FastAPI Gateway]
        API --> UC[Use Cases]
    end
    subgraph "Business Layer"
        UC --> SVC[Services]
        SVC --> |Validation| RE[Rule Engine]
        SVC --> |Storage| ST[Storage Service]
        SVC --> |Jobs| JS[Job Service]
        SVC --> |Events| TEL[Telemetry]
    end
    subgraph "Infrastructure Layer"
        RE --> REPO[Repositories]
        JS --> CEL[Celery Queue]
        ST --> S3[S3 Storage]
        REPO --> PG[(PostgreSQL)]
        CEL --> REDIS[(Redis)]
        TEL --> KAFKA[Event Stream]
    end
    style FE fill:#61dafb
    style API fill:#009485
    style PG fill:#336791
    style REDIS fill:#dc382d
```
```text
┌───────────────────────────────────────────────────────────────────┐
│ Frontend (Next.js)                                                │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐   │
│  │   Upload   │  │    Jobs    │  │  Results   │  │  Settings  │   │
│  └────────────┘  └────────────┘  └────────────┘  └────────────┘   │
└───────────────────────────────────────────────────────────────────┘
                                  │
                              REST API
                                  │
┌───────────────────────────────────────────────────────────────────┐
│ Backend (FastAPI)                                                 │
├───────────────────────────────────────────────────────────────────┤
│ API Layer                                                         │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │ /validate_csv   /correct_csv   /jobs   /validate_row        │  │
│  └─────────────────────────────────────────────────────────────┘  │
├───────────────────────────────────────────────────────────────────┤
│ Use Cases Layer                                                   │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │ ValidateCsvUseCase  CorrectCsvUseCase  ValidateRowUseCase   │  │
│  └─────────────────────────────────────────────────────────────┘  │
├───────────────────────────────────────────────────────────────────┤
│ Services Layer                                                    │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │ JobService  RuleEngineService  StorageService  Telemetry    │  │
│  └─────────────────────────────────────────────────────────────┘  │
├───────────────────────────────────────────────────────────────────┤
│ Infrastructure Layer                                              │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │ Repositories  Queue(Celery)  Cache(Redis)  Storage(S3)      │  │
│  └─────────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────────┘
```
```http
POST /api/v1/validate_csv
Content-Type: multipart/form-data

file: file.csv
marketplace: MERCADO_LIVRE | SHOPEE | AMAZON
category: ELETRONICOS | MODA | CASA
```

```http
POST /api/v1/correct_csv
Content-Type: multipart/form-data

file: file.csv
marketplace: string
category: string
auto_fix: boolean
```

```http
# Create job
POST /api/v1/jobs
{
  "type": "validate_csv",
  "params": {...}
}

# Check status
GET /api/v1/jobs/{job_id}

# List jobs
GET /api/v1/jobs?status=pending&limit=10
```

```http
POST /api/v1/validate_row
{
  "row_data": {...},
  "marketplace": "MERCADO_LIVRE",
  "row_number": 1
}
```

```bash
# Backend tests (from the repository root)
cd apps/api
pytest                     # Run all tests
pytest tests/unit          # Unit tests only
pytest tests/integration   # Integration tests only
pytest --cov               # With coverage report (requires pytest-cov)
```

```bash
# Frontend tests (from the repository root)
cd apps/web
npm test                   # Run all tests
npm run test:watch         # Watch mode
npm run test:coverage      # With coverage report
```

The system emits structured events for complete observability:
- Validation Events: validation.started, validation.completed, validation.failed
- Job Events: job.created, job.started, job.completed, job.failed
- Performance Metrics: Latency, throughput, error rate
- System Events: Health checks, rate limiting, authentication
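A minimal sketch of emitting these events as one JSON line each, with a correlation ID shared by every event in the same request or task. The envelope fields (`event`, `timestamp`, `correlation_id`, `data`) mirror the `validation.completed` example that follows; the `emit_event` and `new_correlation_id` helpers and the use of `contextvars` for propagation are assumptions, not ValidaHub's telemetry module.

```python
import contextvars
import json
import sys
import uuid
from datetime import datetime, timezone
from typing import Any, TextIO

# Correlation ID for the current request/task; context variables are isolated per asyncio task.
correlation_id: contextvars.ContextVar[str] = contextvars.ContextVar("correlation_id", default="")


def new_correlation_id() -> str:
    """Generate a correlation ID and bind it to the current context."""
    cid = str(uuid.uuid4())
    correlation_id.set(cid)
    return cid


def emit_event(event: str, data: dict[str, Any], stream: TextIO = sys.stdout) -> dict[str, Any]:
    """Write one event as a single JSON line and return the payload."""
    payload = {
        "event": event,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "correlation_id": correlation_id.get(),
        "data": data,
    }
    stream.write(json.dumps(payload) + "\n")
    return payload
```

In a request handler, `new_correlation_id()` would be called once on entry; every later `emit_event("validation.started", ...)` or `emit_event("validation.completed", ...)` in that context then carries the same ID, which is what makes logs from one upload traceable end to end.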
```json
{
  "event": "validation.completed",
  "timestamp": "2024-01-15T10:30:00Z",
  "correlation_id": "abc-123",
  "data": {
    "marketplace": "MERCADO_LIVRE",
    "rows_processed": 10000,
    "errors_found": 42,
    "processing_time_ms": 1250
  }
}
```

```bash
# Database
DATABASE_URL=postgresql://user:pass@localhost:5432/validahub
# Redis
REDIS_URL=redis://localhost:6379/0
CELERY_BROKER_URL=redis://localhost:6379/1
# Security
JWT_SECRET_KEY=your-secret-key
API_KEY_SALT=your-api-salt
# Storage
S3_BUCKET_NAME=validahub-files
AWS_ACCESS_KEY_ID=your-key
AWS_SECRET_ACCESS_KEY=your-secret
# Telemetry
TELEMETRY_ENABLED=true
TELEMETRY_KAFKA_ENABLED=false
TELEMETRY_WEBHOOK_URL=https://your-webhook.com
# Rate Limiting
RATE_LIMIT_ENABLED=true
RATE_LIMIT_PER_MINUTE=100
```

- Split JobService into specialized components (SOLID)
- Decouple from Celery (queue abstraction layer)
- Complete dependency injection implementation
- StorageAdapter for multiple backends (S3, GCS, Azure)
- WebSocket support for real-time updates
- Metrics and analytics dashboard
- GraphQL API alongside REST
- Webhook notifications system
- Machine Learning for predictive corrections
- Support for more marketplaces (Magalu, Americanas, B2W)
- Multi-tenant architecture
- API SDK for Python and Node.js
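For the planned decoupling from Celery, one possible shape for the queue abstraction layer is a small interface that job code depends on, with Celery as one adapter among others. The sketch below defines a hypothetical `TaskQueue` protocol, an in-memory adapter for tests, and a Celery adapter that calls Celery's real `app.send_task(name, kwargs=...)` method. `TaskQueue`, `JobSubmitter`, and the `jobs.<type>` task-naming scheme are assumptions, not the project's design.

```python
import uuid
from typing import Any, Protocol


class TaskQueue(Protocol):
    def enqueue(self, task_name: str, payload: dict[str, Any]) -> str:
        """Schedule a task and return its ID."""
        ...


class InMemoryTaskQueue:
    """Adapter for tests: records tasks instead of sending them to a broker."""

    def __init__(self) -> None:
        self.tasks: list[tuple[str, str, dict[str, Any]]] = []

    def enqueue(self, task_name: str, payload: dict[str, Any]) -> str:
        task_id = str(uuid.uuid4())
        self.tasks.append((task_id, task_name, payload))
        return task_id


class CeleryTaskQueue:
    """Adapter around a Celery app; only this class knows that Celery exists."""

    def __init__(self, celery_app: Any) -> None:
        self.app = celery_app

    def enqueue(self, task_name: str, payload: dict[str, Any]) -> str:
        # send_task dispatches by name and returns an AsyncResult whose .id is the task ID.
        return self.app.send_task(task_name, kwargs=payload).id


class JobSubmitter:
    """Hypothetical job-side code that depends only on the TaskQueue interface."""

    def __init__(self, queue: TaskQueue) -> None:
        self.queue = queue

    def submit(self, job_type: str, params: dict[str, Any]) -> str:
        return self.queue.enqueue(f"jobs.{job_type}", params)
```

With this split, tests construct `JobSubmitter(InMemoryTaskQueue())`, production wires in `CeleryTaskQueue(celery_app)`, and a future backend only needs another `enqueue` implementation.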
We love contributions! Please see our Contributing Guide for details.
- Fork the project
- Create a feature branch
```bash
git checkout -b feat/amazing-feature
```
- Make your changes
- Follow our code style (PEP 8 for Python, ESLint for TypeScript)
- Write tests for new features
- Update documentation as needed
- Run tests and linting
```bash
# Backend (from apps/api)
pytest && ruff check
# Frontend (from apps/web)
npm test && npm run lint
```
- Commit your changes
```bash
git commit -m 'feat: add amazing feature'
```
- Push to your fork
```bash
git push origin feat/amazing-feature
```
- Open a Pull Request
Check out our "good first issue" label for beginner-friendly tasks!
- Commits: Follow Conventional Commits
- Branch Naming: feat/, fix/, docs/, refactor/, test/
- Python: PEP 8 + type hints
- TypeScript: ESLint + Prettier
- Tests: Required for all new features
- Architecture Decision Records
- Job System Documentation
- Adapter Pattern Guide
- API Reference
- Technical Debt Tracker
Built with ❤️ by the ValidaHub team.
Special thanks to all contributors.
Proprietary - All rights reserved © 2024 ValidaHub
Made in Brazil 🇧🇷