Claude Code with NVIDIA Inference API

Run Claude Code using NVIDIA's hosted inference API instead of direct Anthropic API access. This is useful for teams with NVIDIA API access who want to use Claude Code without individual Anthropic API keys.

How It Works

┌─────────────┐     ┌─────────────┐     ┌──────────────────────┐
│ Claude Code │────▶│   LiteLLM   │────▶│ NVIDIA Inference API │
│ (Anthropic  │     │   Proxy     │     │ (Sonnet / Opus)      │
│  format)    │     │ (localhost) │     │                      │
└─────────────┘     └─────────────┘     └──────────────────────┘
  • Claude Code expects Anthropic's API format (/v1/messages)
  • NVIDIA's API uses OpenAI-compatible format (/chat/completions)
  • LiteLLM translates between the two formats
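The translation step can be illustrated with a minimal sketch. The field names come from the public Anthropic Messages API and OpenAI Chat Completions formats; the function below is illustrative only, not LiteLLM's actual implementation:

```python
# Illustrative only: the request shape Claude Code sends (Anthropic
# Messages API) vs. what an OpenAI-compatible endpoint expects.
# LiteLLM performs this translation internally; this sketch just
# highlights the field-level differences.

def anthropic_to_openai(req: dict) -> dict:
    """Map an Anthropic /v1/messages body to an OpenAI /chat/completions body."""
    messages = list(req["messages"])
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI expects it as the first message in the list.
    if "system" in req:
        messages.insert(0, {"role": "system", "content": req["system"]})
    return {
        "model": req["model"],
        "messages": messages,
        "max_tokens": req["max_tokens"],
    }

anthropic_body = {
    "model": "claude-sonnet-4-5",
    "system": "You are a coding assistant.",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello"}],
}

openai_body = anthropic_to_openai(anthropic_body)
print(openai_body["messages"][0]["role"])  # → system
```

Real requests also differ in streaming and tool-use fields, which is why a purpose-built proxy like LiteLLM is used rather than a hand-rolled shim.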

Quick Start

Prerequisites

Setup

git clone https://github.com/dburkhardt/claude-code-nvidia-inference.git
cd claude-code-nvidia-inference
source scripts/setup_env.sh

The setup script will:

  1. Install uv (fast Python package manager)
  2. Install Claude Code CLI
  3. Prompt for your NVIDIA API key (saved for future sessions)
  4. Install and start the LiteLLM proxy
  5. Configure Claude Code to use the proxy

Run Claude Code

claude

That's it! Claude Code will route through the NVIDIA inference API.

Available Models

NVIDIA Model           Claude Code Usage
Sonnet 4.5 (default)   claude
Opus 4.5               claude --model claude-opus-4-5-20250929

Subsequent Sessions

After initial setup, just run:

cd claude-code-nvidia-inference
source scripts/setup_env.sh   # Loads saved API key, starts proxy if needed
claude

The script is idempotent - it detects if the proxy is already running.
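The "already running" check can be sketched as a simple port probe. This is a hypothetical illustration of the idea, not the setup script's actual code:

```python
# Hypothetical sketch of an idempotent startup check: before launching
# a new proxy, test whether something is already listening on its port.
import socket

def proxy_running(host: str = "127.0.0.1", port: int = 4000) -> bool:
    """Return True if something is listening on the proxy port."""
    try:
        with socket.create_connection((host, port), timeout=1):
            return True
    except OSError:
        return False
```

If the probe succeeds, the script can skip straight to exporting the environment variables; otherwise it starts a fresh proxy process.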

Configuration Reference

LiteLLM Proxy (litellm_config.yaml)

Maps Claude Code model requests to NVIDIA's hosted Claude models:

  • claude-sonnet-4-5-* → aws/anthropic/bedrock-claude-sonnet-4-5-v1
  • claude-opus-4-5-* → aws/anthropic/claude-opus-4-5
  • Haiku requests → Sonnet (Haiku not available on NVIDIA)
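A litellm_config.yaml implementing this mapping might look like the following. The model names and aws/anthropic/ targets are taken from the bullets above; the surrounding structure and api_key wiring are illustrative:

```yaml
model_list:
  - model_name: claude-sonnet-4-5-*          # wildcard matches dated variants
    litellm_params:
      model: aws/anthropic/bedrock-claude-sonnet-4-5-v1
      api_key: os.environ/NVIDIA_API_KEY
  - model_name: claude-opus-4-5-*
    litellm_params:
      model: aws/anthropic/claude-opus-4-5
      api_key: os.environ/NVIDIA_API_KEY
  - model_name: claude-haiku-*               # Haiku unavailable; route to Sonnet
    litellm_params:
      model: aws/anthropic/bedrock-claude-sonnet-4-5-v1
      api_key: os.environ/NVIDIA_API_KEY
```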

Claude Code Config (~/.claude/settings.json)

{
  "env": {
    "ANTHROPIC_API_KEY": "sk-litellm-local-dev",
    "ANTHROPIC_BASE_URL": "http://localhost:4000"
  }
}

Known Limitations

Context Window

NVIDIA's Bedrock-hosted Claude models have a smaller context window than the direct Anthropic API (~100K vs 200K tokens). The litellm_config.yaml is configured with max_input_tokens: 100000 to enable pre-call validation, allowing Claude Code to trigger context compaction before hitting the API limit.

This limit was determined empirically - the actual NVIDIA limit is approximately 111K tokens, so 100K provides a safety margin.
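In config terms, the cap is metadata on the model entry. A hedged fragment, assuming LiteLLM's model_info convention for token limits:

```yaml
  - model_name: claude-sonnet-4-5-*
    litellm_params:
      model: aws/anthropic/bedrock-claude-sonnet-4-5-v1
    model_info:
      max_input_tokens: 100000   # ~11K below the observed ~111K hard limit
```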

Other Limitations

  • Programmatic mode recommended - Use -p flag for non-interactive usage
  • Some features may vary - Tool use, streaming, and advanced features route through LiteLLM
  • Additional latency - Extra hop through LiteLLM proxy adds ~50-100ms

Troubleshooting

"Auth conflict" Warning

Auth conflict: Both a token (ANTHROPIC_AUTH_TOKEN) and an API key (ANTHROPIC_API_KEY) are set.

Fix: Run claude /logout and unset ANTHROPIC_AUTH_TOKEN

401 Unauthorized Errors

Possible causes:

  1. NVIDIA_API_KEY isn't set correctly - check with echo $NVIDIA_API_KEY
  2. LiteLLM proxy isn't running - check with curl http://localhost:4000/health

Claude Code Hangs

Ensure:

  1. LiteLLM proxy is running (curl http://localhost:4000/health)
  2. You've logged out of any existing Claude authentication (claude /logout)

Proxy Won't Start

# Check what's using port 4000
lsof -i :4000

# Kill it if needed
lsof -ti :4000 | xargs kill -9

# Restart
source scripts/setup_env.sh --restart

Testing

Run the test script to verify everything works:

./scripts/test_nvidia_endpoint.sh

License

MIT License
