Run Claude Code using NVIDIA's hosted inference API instead of direct Anthropic API access. This is useful for teams with NVIDIA API access who want to use Claude Code without individual Anthropic API keys.
```
┌─────────────┐      ┌─────────────┐      ┌──────────────────────┐
│ Claude Code │─────▶│   LiteLLM   │─────▶│ NVIDIA Inference API │
│ (Anthropic  │      │    Proxy    │      │   (Sonnet / Opus)    │
│   format)   │      │ (localhost) │      │                      │
└─────────────┘      └─────────────┘      └──────────────────────┘
```
- Claude Code expects Anthropic's API format (`/v1/messages`)
- NVIDIA's API uses an OpenAI-compatible format (`/chat/completions`)
- LiteLLM translates between the two formats
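The translation LiteLLM performs can be sketched roughly as follows. This is a simplified illustration, not LiteLLM's actual code; the real translation also handles streaming, tool use, and error shapes:

```python
# Simplified sketch of the Anthropic → OpenAI request translation that
# LiteLLM performs between Claude Code and NVIDIA's endpoint.

def anthropic_to_openai(payload: dict) -> dict:
    """Map an Anthropic /v1/messages body to an OpenAI /chat/completions body."""
    messages = list(payload.get("messages", []))
    # Anthropic carries the system prompt as a top-level field;
    # the OpenAI format expects it as the first chat message.
    if "system" in payload:
        messages = [{"role": "system", "content": payload["system"]}] + messages
    return {
        "model": payload["model"],
        "messages": messages,
        "max_tokens": payload.get("max_tokens", 1024),
    }

anthropic_request = {
    "model": "claude-sonnet-4-5",
    "system": "You are terse.",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello"}],
}
openai_request = anthropic_to_openai(anthropic_request)
```

Claude Code only ever speaks the Anthropic format; the proxy makes that invisible to the NVIDIA backend.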
- NVIDIA API key from inference.nvidia.com/key-management
- Python 3.10+
- Basic tools: `curl`, `python3` (pre-installed on most systems)
```bash
git clone https://github.com/dburkhardt/claude-code-nvidia-inference.git
cd claude-code-nvidia-inference
source scripts/setup_env.sh
```

The setup script will:
- Install `uv` (fast Python package manager)
- Install Claude Code CLI
- Prompt for your NVIDIA API key (saved for future sessions)
- Install and start the LiteLLM proxy
- Configure Claude Code to use the proxy
```bash
claude
```

That's it! Claude Code will route through the NVIDIA inference API.
| NVIDIA Model | Claude Code Usage |
|---|---|
| Sonnet 4.5 (default) | `claude` |
| Opus 4.5 | `claude --model claude-opus-4-5-20250929` |
After initial setup, just run:
```bash
cd claude-code-nvidia-inference
source scripts/setup_env.sh  # Loads saved API key, starts proxy if needed
claude
```

The script is idempotent: it detects if the proxy is already running.
Maps Claude Code model requests to NVIDIA's hosted Claude models:
- `claude-sonnet-4-5-*` → `aws/anthropic/bedrock-claude-sonnet-4-5-v1`
- `claude-opus-4-5-*` → `aws/anthropic/claude-opus-4-5`
- Haiku requests → Sonnet (Haiku is not available on NVIDIA)
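The mapping above behaves like a small pattern-matched routing table. The sketch below is an illustration only: the real mapping is declared in `litellm_config.yaml`, and the Haiku pattern here is an assumption:

```python
from fnmatch import fnmatch

# Illustrative routing table; the actual mapping lives in litellm_config.yaml.
# The "*haiku*" pattern is hypothetical — the repo may match Haiku differently.
ROUTES = [
    ("claude-sonnet-4-5-*", "aws/anthropic/bedrock-claude-sonnet-4-5-v1"),
    ("claude-opus-4-5-*", "aws/anthropic/claude-opus-4-5"),
    ("*haiku*", "aws/anthropic/bedrock-claude-sonnet-4-5-v1"),  # Haiku → Sonnet
]

def route(requested_model: str) -> str:
    """Return the NVIDIA-hosted model ID for a Claude Code model request."""
    for pattern, target in ROUTES:
        if fnmatch(requested_model, pattern):
            return target
    raise KeyError(f"no route for {requested_model}")
```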
```json
{
  "env": {
    "ANTHROPIC_API_KEY": "sk-litellm-local-dev",
    "ANTHROPIC_BASE_URL": "http://localhost:4000"
  }
}
```

NVIDIA's Bedrock-hosted Claude models have a smaller context window than the direct Anthropic API (~100K vs. 200K tokens). The `litellm_config.yaml` is configured with `max_input_tokens: 100000` to enable pre-call validation, allowing Claude Code to trigger context compaction before hitting the API limit.
This limit was determined empirically - the actual NVIDIA limit is approximately 111K tokens, so 100K provides a safety margin.
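For reference, the relevant part of `litellm_config.yaml` might look like the fragment below. This is a sketch, not the repo's actual file: the field names follow LiteLLM's config schema, and the exact model entry is assumed:

```yaml
# Sketch only — check the repo's litellm_config.yaml for the real entry.
model_list:
  - model_name: claude-sonnet-4-5
    litellm_params:
      model: aws/anthropic/bedrock-claude-sonnet-4-5-v1
    model_info:
      # Cap below NVIDIA's ~111K limit so pre-call validation fires first
      max_input_tokens: 100000
```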
- Programmatic mode recommended: use the `-p` flag for non-interactive usage
- Some features may vary: tool use, streaming, and advanced features route through LiteLLM
- Additional latency: the extra hop through the LiteLLM proxy adds ~50-100ms
Auth conflict: both a token (`ANTHROPIC_AUTH_TOKEN`) and an API key (`ANTHROPIC_API_KEY`) are set.
Fix: run `claude /logout` and `unset ANTHROPIC_AUTH_TOKEN`.
Possible causes:
- `NVIDIA_API_KEY` isn't set correctly: check with `echo $NVIDIA_API_KEY`
- LiteLLM proxy isn't running: check with `curl http://localhost:4000/health`
Ensure:
- The LiteLLM proxy is running (`curl http://localhost:4000/health`)
- You've logged out of any existing Claude authentication (`claude /logout`)
```bash
# Check what's using port 4000
lsof -i :4000

# Kill it if needed
lsof -ti :4000 | xargs kill -9

# Restart
source scripts/setup_env.sh --restart
```

Run the test script to verify everything works:

```bash
./scripts/test_nvidia_endpoint.sh
```

MIT License