
AI News Aggregator

A modular TypeScript-based news aggregator that collects, enriches, and analyzes AI-related content from multiple sources.

Features

  • Modular Plugin System
    Easily extendable with plugins for data sources, AI processing, content enrichment, summary generation, and storage.

  • Diverse Data Sources
    Pre-built plugins for:

    • Discord (raw messages, user details, AI-summarized conversations, media download)
    • GitHub (repository statistics, contributor activity)
    • Cryptocurrency Analytics (Solana via DexScreener, general tokens via Codex API, market data via CoinGecko)
    • Generic APIs (configurable for various REST endpoints)
  • AI-Powered Processing

    • Automated content summarization (e.g., daily reports, Discord channel activity) using configurable AI providers (OpenAI, OpenRouter).
    • Token limit resilience with automatic fallback models for large content processing.
    • Optional content enrichment (e.g., topic extraction, image generation, meme generation via Imgflip API).
  • Flexible Storage & Output

    • SQLite for persistent storage of fetched content and generated summaries.
    • Customizable data export (e.g., raw daily Discord data as JSON).
    • Generation of summaries in JSON and Markdown formats.
  • Historical Data Processing
    Dedicated script (historical.ts) for fetching and processing data from past dates or ranges.

  • Configuration Driven
    Behavior controlled by JSON configuration files and environment variables.

Prerequisites

  • Node.js ≥ 18 (v23 recommended)
  • TypeScript
  • SQLite3 (command-line tool required for integrity checks)
  • npm

Installation

git clone https://github.com/M3-org/ai-news.git
cd ai-news
npm install
cp example.env .env

Configuration

Use JSON files in the config/ directory and a .env file for secrets.
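
A config file wires sources together with the other plugin types. As a rough sketch only (the top-level keys and the SQLiteStorage type here are assumptions for illustration; the source entry format is documented under Adding New Sources below):

{
  "sources": [
    {
      "type": "DiscordRawDataSource",
      "name": "elizaos-discord",
      "interval": 300,
      "params": {}
    }
  ],
  "enrichers": [],
  "generators": [],
  "storage": { "type": "SQLiteStorage", "params": { "dbPath": "data/elizaos.sqlite" } }
}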

Example .env File

# OpenAI/OpenRouter Configuration
OPENAI_API_KEY=your_openai_or_openrouter_api_key
OPENAI_DIRECT_KEY=your_direct_openai_key_for_images
USE_OPENROUTER=true
SITE_URL=https://your-domain.com/ai-news/
SITE_NAME=AI_News

# Discord Bot Configuration
DISCORD_APP_ID=your_discord_app_id
DISCORD_TOKEN=your_discord_bot_token
DISCORD_GUILD_ID=your_main_discord_guild_id

# Optional: Additional Discord servers
HYPERFY_DISCORD_APP_ID=your_hyperfy_discord_app_id
HYPERFY_DISCORD_TOKEN=your_hyperfy_discord_token
HYPERFY_DISCORD_GUILD_ID=your_hyperfy_guild_id

# API Keys for Data Sources
CODEX_API_KEY=your_codex_api_key
BIRDEYE_API_KEY=your_birdeye_api_key

# CDN Configuration (optional)
BUNNY_STORAGE_ZONE=your_bunny_storage_zone
BUNNY_STORAGE_PASSWORD=your_bunny_api_password
BUNNY_CDN_URL=https://your-custom-cdn.com

See example.env for all available options.

GitHub Actions Secrets

Create these repository secrets in GitHub:

  1. ENV_SECRETS — JSON object with credentials:
{
  "OPENAI_API_KEY": "sk-...",
  "OPENAI_DIRECT_KEY": "sk-...",
  "USE_OPENROUTER": "true",
  "SITE_URL": "https://your-domain.com/ai-news/",
  "SITE_NAME": "AI_News",
  "DISCORD_APP_ID": "your_discord_app_id",
  "DISCORD_TOKEN": "your_discord_bot_token",
  "DISCORD_GUILD_ID": "your_discord_guild_id",
  "HYPERFY_DISCORD_APP_ID": "your_hyperfy_app_id",
  "HYPERFY_DISCORD_TOKEN": "your_hyperfy_token",
  "HYPERFY_DISCORD_GUILD_ID": "your_hyperfy_guild_id",
  "CODEX_API_KEY": "your_codex_key",
  "BIRDEYE_API_KEY": "your_birdeye_key",
  "IMGFLIP_USERNAME": "your_imgflip_username",
  "IMGFLIP_PASSWORD": "your_imgflip_password",
  "BUNNY_STORAGE_ZONE": "your_bunny_storage_zone",
  "BUNNY_STORAGE_PASSWORD": "your_bunny_api_password",
  "BUNNY_CDN_URL": "https://your-custom-cdn.com"
}

Notes:

  • OPENAI_DIRECT_KEY is required for image generation when using OpenRouter.
  • IMGFLIP_* credentials are required for meme generation (sign up at imgflip.com).
  • HYPERFY_*, BIRDEYE_*, and BUNNY_* keys are optional, depending on your configuration.

  2. SQLITE_ENCRYPTION_KEY — strong password to encrypt the database.

For Webhook Server Integration (deploy-media-collection.yml)

  1. COLLECT_WEBHOOK_URL — Your webhook server endpoint:
https://your-server.com/run-collect
  2. COLLECT_WEBHOOK_SECRET — HMAC signing secret (generate with openssl rand -hex 32):
a1b2c3d4e5f6...

For CDN Media Upload (media-cdn-upload.yml)

  1. BUNNY_STORAGE_ZONE — Storage zone name from Bunny.net dashboard (Storage → your zone name)

  2. BUNNY_STORAGE_PASSWORD — FTP & API Access password from Bunny.net (Storage → FTP & API Access → Password)

Optional: Set BUNNY_CDN_URL in ENV_SECRETS if using a custom hostname (default: https://{zone}.b-cdn.net)
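
For reference, a Bunny Storage upload is a single authenticated HTTP PUT. A minimal TypeScript sketch follows; the storage.bunnycdn.com endpoint and AccessKey header follow Bunny's Edge Storage API, but region-specific zones use a different hostname, so verify against your dashboard:

import { readFile } from "fs/promises";

// Sketch: upload one local file to Bunny Edge Storage.
// Assumes BUNNY_STORAGE_ZONE and BUNNY_STORAGE_PASSWORD are set as in .env above.
async function uploadToBunny(localPath: string, remotePath: string): Promise<void> {
  const zone = process.env.BUNNY_STORAGE_ZONE!;
  const key = process.env.BUNNY_STORAGE_PASSWORD!;
  const body = await readFile(localPath);

  const res = await fetch(`https://storage.bunnycdn.com/${zone}/${remotePath}`, {
    method: "PUT",
    headers: { AccessKey: key, "Content-Type": "application/octet-stream" },
    body,
  });
  if (!res.ok) throw new Error(`Upload failed: ${res.status} ${res.statusText}`);
  // Once uploaded, the file is reachable under BUNNY_CDN_URL + "/" + remotePath.
}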

Running the Application

npm run build                         # Compile the TypeScript sources
npm start                             # Run the aggregator with the default config
npm start -- --source=elizaos.json    # Run with a specific config file

Historical Data Fetching

npm run historical -- --source=elizaos.json --output=./output
npm run historical -- --source=hyperfy-discord.json --after=2024-01-10 --before=2024-01-16 --output=./output/hyperfy
npm run historical -- --source=discord-raw.json --after=2024-01-15 --output=./output/discord
npm run historical -- --source=discord-raw.json --before=2024-01-10 --output=./output/discord

# With media download
npm run historical -- --source=elizaos.json --download-media --date=2024-01-15

Channel Management

Unified CLI for Discord channel discovery, tracking, and configuration:

# Discover channels (Discord API, or falls back to raw data if no token)
npm run channels -- discover

# Analyze channels with LLM (TRACK/MAYBE/SKIP recommendations)
npm run channels -- analyze

# Generate PR markdown with config changes
npm run channels -- propose

# Query and manage channels
npm run channels -- list [--tracked|--active|--muted]
npm run channels -- show <channelId>
npm run channels -- stats
npm run channels -- track|untrack|mute|unmute <channelId>

Media Download

npm run download-media                                    # Today's media
npm run download-media -- --date=2024-01-15             # Specific date
npm run download-media -- --start=2024-01-10 --end=2024-01-15  # Date range

CDN Upload

Upload media files to Bunny CDN for permanent hosting (Discord URLs expire after ~24 hours):

# Upload single file
npm run upload-cdn -- --file ./media/image.png --remote elizaos-media/

# Upload directory
npm run upload-cdn -- --dir ./media/ --remote elizaos-media/

# Upload and update manifest with CDN URLs
npm run upload-cdn -- --manifest ./media/manifest.json --update-manifest

# Swap Discord URLs for CDN URLs in summary JSON
npm run upload-cdn -- --swap-urls ./output/elizaos/json/2024-01-15.json \
  --manifest ./media/manifest.json \
  --output ./output/elizaos/json-cdn/2024-01-15.json

# Preview without uploading
npm run upload-cdn -- --dir ./media/ --remote elizaos-media/ --dry-run

Automated: The media-cdn-upload.yml workflow runs daily at 7:30 AM UTC to upload media and create CDN-enriched JSON files.

Channel Analysis Workflow

npm run channels -- discover   # Fetch channels
npm run channels -- analyze    # Run LLM analysis
npm run channels -- propose    # Generate PR markdown

GitHub Actions: Monthly workflow analyzes channels and creates draft PRs with recommendations.

Server Deployment

For running data collection on a server instead of GitHub Actions (recommended for media downloads due to file size limits):

Webhook Server Setup

  1. Clone repository to server: git clone <repo> ~/ai-news
  2. Install dependencies: cd ~/ai-news && npm install && npm run build
  3. Copy example.env to .env and configure with your API keys
  4. Generate webhook secret: openssl rand -hex 32
  5. Start webhook server:
    export COLLECT_WEBHOOK_SECRET="your-generated-secret"
    npm run webhook
  6. Setup reverse proxy (Nginx/Caddy) with HTTPS for production

Usage

Webhook Server:

# Start server (listens on localhost:3000)
export COLLECT_WEBHOOK_SECRET="your-secret"
npm run webhook

# Test webhook locally
./scripts/test-webhook.sh elizaos.json 2025-01-15

Manual Collection (Alternative):

# Direct script execution
./scripts/collect-daily.sh elizaos.json
./scripts/collect-daily.sh hyperfy-discord.json 2025-01-15

GitHub Actions Integration:

  • Configure COLLECT_WEBHOOK_URL and COLLECT_WEBHOOK_SECRET in GitHub Secrets
  • GitHub Actions sends HMAC-signed webhook requests daily at 6 AM UTC (verification sketched below)
  • View/trigger manual runs at Actions > Daily Media Collection
  • No SSH keys or server access needed
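
To illustrate the signing scheme, here is a minimal sketch of server-side HMAC verification. The x-signature header name and hex encoding are assumptions for illustration; the actual request format is defined by the webhook server script:

import crypto from "crypto";
import http from "http";

// Sketch: verify an HMAC-SHA256 signature on incoming webhook requests.
const secret = process.env.COLLECT_WEBHOOK_SECRET!;

http.createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    const expected = crypto.createHmac("sha256", secret).update(body).digest("hex");
    const received = String(req.headers["x-signature"] ?? "");
    const ok =
      received.length === expected.length &&
      crypto.timingSafeEqual(Buffer.from(received), Buffer.from(expected));
    res.writeHead(ok ? 200 : 401).end(ok ? "ok" : "invalid signature");
    // On success, the real server would run the collection script (collect-daily.sh).
  });
}).listen(3000, "127.0.0.1"); // matches the localhost:3000 default noted above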

Benefits of Webhook Approach:

  • No SSH complexity or key management
  • Secure HMAC signature verification
  • No GitHub file size limits for media downloads
  • GitHub Actions provides scheduling and monitoring
  • Simple HTTP-based integration

Media Download (VPS)

Discord media files (images, videos, attachments) can be downloaded to a VPS using a manifest-based approach.

How It Works

  1. GitHub Actions generates a media-manifest.json with URLs during daily runs
  2. Manifest is deployed to gh-pages branch
  3. VPS script fetches manifest and downloads files

Generate Manifest Locally

No API calls are needed; the manifest generator reads directly from the existing database:

# Single date
npm run generate-manifest -- --db data/elizaos.sqlite --date 2024-12-14 --source elizaos

# Date range (for backfill)
npm run generate-manifest -- --db data/elizaos.sqlite --start 2024-12-01 --end 2024-12-14 --source elizaos

# Custom output path
npm run generate-manifest -- --db data/elizaos.sqlite --date 2024-12-14 --source elizaos --manifest-output ./my-manifest.json

# View manifest contents
cat ./output/elizaos/media-manifest.json | jq '.stats'

Manifest Contents

Each manifest entry includes full Discord metadata for querying:

# Filter by user
cat manifest.json | jq '.files[] | select(.user_id == "123456789")'

# Only direct attachments (no embeds)
cat manifest.json | jq '.files[] | select(.media_type == "attachment")'

# Files with reactions
cat manifest.json | jq '.files[] | select(.reactions != null)'

# Count per user
cat manifest.json | jq '[.files[].user_id] | group_by(.) | map({user: .[0], count: length}) | sort_by(-.count)'

Fields: url, filename, user_id, guild_id, channel_id, message_id, message_content, reactions, media_type, content_type, width, height, size, proxy_url
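
Taken together, a manifest plausibly has the following TypeScript shape. This is a sketch inferred from the documented fields and the jq queries above, not a schema exported by the project:

// Inferred sketch of the manifest shape; field types are best guesses.
interface MediaManifestEntry {
  url: string;               // Discord CDN URL (expires after ~24 hours)
  filename: string;
  user_id: string;
  guild_id: string;
  channel_id: string;
  message_id: string;
  message_content: string;
  reactions: unknown;        // null when the message has no reactions
  media_type: string;        // e.g. "attachment" for direct uploads
  content_type: string;      // MIME type
  width?: number;
  height?: number;
  size?: number;             // bytes
  proxy_url: string;
}

interface MediaManifest {
  stats: Record<string, unknown>;  // summary counts, viewable via jq '.stats'
  files: MediaManifestEntry[];
}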

VPS Setup

# Clone and setup
git clone https://github.com/M3-org/ai-news.git ~/ai-news-media
python3 ~/ai-news-media/scripts/media-sync.py setup

# Download media (from gh-pages manifests)
python3 ~/ai-news-media/scripts/media-sync.py sync --dry-run  # Preview
python3 ~/ai-news-media/scripts/media-sync.py sync            # Download
python3 ~/ai-news-media/scripts/media-sync.py sync --min-free 1000  # Stop if <1GB free

# Check status (disk usage and media sizes)
python3 ~/ai-news-media/scripts/media-sync.py status

The setup command installs a systemd timer that runs daily at 01:30 UTC.
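
The installed unit presumably resembles a standard systemd timer along these lines (names and options are illustrative; inspect what media-sync.py setup actually writes):

# Illustrative only: a daily timer firing at 01:30 UTC,
# paired with a matching .service that runs the sync command.
[Unit]
Description=Daily ai-news media sync

[Timer]
OnCalendar=*-*-* 01:30:00 UTC
Persistent=true

[Install]
WantedBy=timers.target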

Download with Fresh URLs

Discord CDN URLs expire after ~24 hours. Use refresh to fetch fresh URLs and download:

export DISCORD_TOKEN=your_discord_bot_token   # Bot token required

# Download all files for a specific user
python3 scripts/media-sync.py refresh manifest.json --user USER_ID -o ./user_media

# Only attachments (no embeds/thumbnails)
python3 scripts/media-sync.py refresh manifest.json --user USER_ID --type attachment

# Preview without downloading
python3 scripts/media-sync.py refresh manifest.json --user USER_ID --dry-run

Manifest Location

After GitHub Actions runs, manifests are available at:

  • https://raw.githubusercontent.com/M3-org/ai-news/gh-pages/elizaos/media-manifest.json
  • https://raw.githubusercontent.com/M3-org/ai-news/gh-pages/hyperfy/media-manifest.json

Project Structure

.github/              GitHub Actions workflows
config/               Configuration files
data/                 Encrypted SQLite databases
docs/                 Docusaurus documentation
src/                  Source code
  aggregator/         Aggregators (ContentAggregator, HistoricalAggregator)
  plugins/            Plugins (sources, enrichers, generators, storage)
  helpers/            Utility functions
  types.ts            Type definitions
  index.ts            Main entry point
  historical.ts       Historical data runner
example.env           Template for .env
README.md             This file

Adding New Sources

  1. Create a new class in src/plugins/sources/ implementing the ContentSource interface:
import { ContentItem } from "../../types";

export interface ContentSource {
  name: string;                                            // Unique name for this source instance
  fetchItems(): Promise<ContentItem[]>;                    // Fetch current content
  fetchHistorical?(date: string): Promise<ContentItem[]>;  // Optional: fetch content for a past date
}
  2. Add the new source to the desired config file:
{
  "type": "YourNewSource",
  "name": "descriptive-name",
  "interval": 300,
  "params": {}
}
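
A minimal hypothetical implementation might look like the following; the class name, endpoint, and response mapping are invented for illustration:

import { ContentItem } from "../../types";
import { ContentSource } from "./ContentSource"; // assumed path to the interface

// Hypothetical source that pulls items from a JSON endpoint.
export class YourNewSource implements ContentSource {
  public name = "your-new-source";

  constructor(private endpoint: string) {}

  async fetchItems(): Promise<ContentItem[]> {
    const res = await fetch(this.endpoint);
    const posts: Array<{ id: string; body: string; ts: number }> = await res.json();
    return posts.map((p) => ({
      cid: `${this.name}-${p.id}`, // unique content ID
      type: "post",
      source: this.name,
      text: p.body,
      date: p.ts,
    }));
  }
}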

Contributing

git checkout -b feature/YourFeature    # Create a feature branch
git commit -m "Add YourFeature"        # Commit your changes
git push origin feature/YourFeature    # Push the branch and open a pull request

License

MIT

Core Data Structures

ContentItem

interface ContentItem {
  cid: string;                        // Unique content ID
  type: string;                       // Content type
  source: string;                     // Name of the source that produced it
  text?: string;
  date?: number;                      // Unix timestamp
  metadata?: { [key: string]: any };  // Source-specific extras
}

SummaryItem

interface SummaryItem {
  type: string;
  title?: string;
  categories?: string;
  markdown?: string;    // Rendered Markdown body
  date?: number;        // Unix timestamp
}

Supported Source Types

  • Discord: DiscordRawDataSource, DiscordChannelSource, DiscordAnnouncementSource
  • GitHub: GitHubStatsDataSource, GitHubDataSource
  • Crypto: CodexAnalyticsSource, CoinGeckoAnalyticsSource, SolanaAnalyticsSource
  • Generic: ApiSource

Scheduled Tasks

GitHub Actions workflows in .github/workflows/ automate scheduled processing for data sources and summary generation.
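
As an illustration, a schedule trigger in one of these workflows looks roughly like this; the cron below matches the 7:30 AM UTC media upload mentioned earlier, and the real definitions live in .github/workflows/:

# Illustrative schedule block for a workflow such as media-cdn-upload.yml
on:
  schedule:
    - cron: "30 7 * * *"    # daily at 07:30 UTC
  workflow_dispatch:        # allows manual runs from the Actions tab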
