allthingsllm/gptjs
MicroLLM

A minimal GPT-style language model for character-level next-token prediction. Trains on a corpus of names (or any text) and learns to generate similar sequences. Pure JavaScript, no dependencies. Ported from Karpathy's micrograd-based makemore.

What It Does

  1. Loads input.txt (one item per line, e.g. names)
  2. Trains a small transformer on next-character prediction
  3. Evaluates on held-out data (loss, perplexity, accuracy)
  4. Generates 20 sample sequences autoregressively

Requirements

  • Node.js
  • input.txt in the project directory (lines of text to train on)

Usage

node microllm.js

Code Structure

Utilities (lines 8–39)

  • createRandom(seed) – LCG-based seeded PRNG (reproducible runs).
  • gauss(mean, std) – Box-Muller transform for Gaussian samples (weight init).
  • shuffle(arr) – Fisher-Yates shuffle.
  • weightedChoice(weights) – Samples an index from a discrete distribution (inference sampling).
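The utilities are small and dependency-free. A minimal sketch of the seeded PRNG and the sampling helper (the LCG constants and exact signatures are assumptions for illustration, not necessarily the ones in microllm.js):

```javascript
// Seeded PRNG via a linear congruential generator (illustrative constants).
function createRandom(seed) {
  let state = seed >>> 0;
  return function random() {
    // Numerical Recipes LCG step; returns a float in [0, 1).
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// Sample an index proportional to its weight (used for token sampling).
function weightedChoice(weights, random) {
  const total = weights.reduce((a, b) => a + b, 0);
  let r = random() * total;
  for (let i = 0; i < weights.length; i++) {
    r -= weights[i];
    if (r < 0) return i;
  }
  return weights.length - 1; // guard against floating-point rounding
}
```

Seeding the PRNG is what makes runs reproducible: the same seed replays the same shuffle, weight init, and samples.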

Data (lines 41–48)

Loads input.txt, splits into 90% train / 10% validation, shuffles. Docs are lines of text (e.g. names).

Tokenizer (lines 50–56)

Character-level tokenizer. chars = ['<BOS>', 'a', 'b', ...] (sorted unique chars + BOS). stoi / itos map between chars and integer ids. Sequences are wrapped with BOS at start and end.
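A sketch of what such a character-level tokenizer looks like (the helper name and the assumption that BOS gets id 0 are illustrative):

```javascript
// Build a character-level vocabulary from the training docs.
function buildTokenizer(docs) {
  const uniq = [...new Set(docs.join(''))].sort();
  const chars = ['<BOS>', ...uniq];                 // BOS assumed to get id 0
  const stoi = new Map(chars.map((c, i) => [c, i]));
  const itos = new Map(chars.map((c, i) => [i, c]));
  // Wrap each sequence with BOS on both ends, as described above.
  const encode = (doc) => [0, ...[...doc].map((c) => stoi.get(c)), 0];
  const decode = (ids) =>
    ids.filter((i) => i !== 0).map((i) => itos.get(i)).join('');
  return { chars, stoi, itos, encode, decode };
}
```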

Value (Autograd) (lines 58–145)

Scalar autograd engine. Each Value holds data and grad and tracks its inputs (_prev) and backward rule (_backward). Operations (add, mul, pow, log, exp, relu) build a DAG; backward() does reverse-mode differentiation via topological sort. Used for all model parameters and activations so gradients flow through the full graph.
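The idea can be sketched with a stripped-down Value supporting just add and mul (the real class also covers pow, log, exp, and relu):

```javascript
// Minimal scalar autograd node in the micrograd style.
class Value {
  constructor(data, prev = []) {
    this.data = data;
    this.grad = 0;
    this._prev = prev;          // input nodes in the DAG
    this._backward = () => {};  // local chain-rule step
  }
  add(other) {
    const out = new Value(this.data + other.data, [this, other]);
    out._backward = () => { this.grad += out.grad; other.grad += out.grad; };
    return out;
  }
  mul(other) {
    const out = new Value(this.data * other.data, [this, other]);
    out._backward = () => {
      this.grad += other.data * out.grad;
      other.grad += this.data * out.grad;
    };
    return out;
  }
  backward() {
    // Topological sort, then apply the chain rule in reverse order.
    const topo = [], seen = new Set();
    const build = (v) => {
      if (seen.has(v)) return;
      seen.add(v);
      v._prev.forEach(build);
      topo.push(v);
    };
    build(this);
    this.grad = 1;
    for (let i = topo.length - 1; i >= 0; i--) topo[i]._backward();
  }
}
```

Because every parameter and activation is a Value, one `loss.backward()` call fills in gradients for the entire model.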

Parameters (lines 147–182)

  • nEmbd (16) – Embedding dimension.
  • nHead (4) – Attention heads.
  • nLayer (1) – Transformer blocks.
  • blockSize (8) – Max context length.
  • matrix(nout, nin, std) – Initializes nout × nin matrix of Value objects with Gaussian(0, std).

State dict:

  • wte – Token embeddings (vocab × nEmbd).
  • wpe – Position embeddings (blockSize × nEmbd).
  • layer{i}.attn_wq/wk/wv/wo – Attention Q, K, V, O projections.
  • layer{i}.mlp_fc1/fc2 – MLP (4× expand, ReLU² activation).
  • lm_head – Output projection to vocab logits.

Heads and MLP output layers use std=0 to start from identity-like behavior.
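A sketch of the Gaussian initializer (shown with an injectable random source; the real code draws from the seeded PRNG), illustrating why std=0 gives identity-like behavior:

```javascript
// Box-Muller transform: sample from N(mean, std).
function gauss(mean, std, random = Math.random) {
  const u1 = 1 - random(); // shift to (0, 1] to avoid log(0)
  const u2 = random();
  return mean + std * Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// nout × nin matrix of weights drawn from N(0, std).
// With std = 0 every entry is zero, so a zero-initialized output
// projection contributes nothing and the residual passes through unchanged.
function matrix(nout, nin, std) {
  return Array.from({ length: nout }, () =>
    Array.from({ length: nin }, () => gauss(0, std)));
}
```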

Model (lines 184–242)

  • linear(x, w) – Multiplies vector x by matrix w; returns w @ x.
  • softmax(logits) – Numerically stable softmax (subtract max before exp).
  • rmsnorm(x) – RMS normalization: scale by (mean(x²) + ε)^(-0.5).
  • gpt(tokenId, posId, keys, values) – Forward pass:
    1. Embed: tok_emb + pos_emb, then RMSNorm.
    2. Per layer:
      • Attention: RMSNorm → Q,K,V projections → per-head scaled dot-product attention over cached K,V → O projection → residual.
      • MLP: RMSNorm → fc1 → ReLU² → fc2 → residual.
    3. Final projection to vocab logits.

Uses causal attention: at position t, only positions 0..t can attend (via keys/values built step-wise).
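The two numeric helpers can be sketched on plain numbers (the real versions operate on Value objects so gradients flow through them):

```javascript
// Numerically stable softmax: subtracting the max leaves the result
// unchanged but keeps exp() from overflowing.
function softmax(logits) {
  const m = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// RMSNorm: scale the vector by (mean(x^2) + eps)^(-0.5).
function rmsnorm(x, eps = 1e-5) {
  const ms = x.reduce((a, v) => a + v * v, 0) / x.length;
  const scale = 1 / Math.sqrt(ms + eps);
  return x.map((v) => v * scale);
}
```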

Adam (lines 244–250)

First and second moment buffers (m, v). Bias correction with β1^(step+1) and β2^(step+1). Learning rate decays linearly to 0 over training.
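A sketch of one Adam update for a single parameter (the beta/epsilon values are common defaults, not necessarily those in microllm.js; bias correction follows the β^(step+1) form above, with 0-based steps):

```javascript
// One Adam step for a parameter p carrying { data, grad, m, v }.
function adamStep(p, step, lr, beta1 = 0.9, beta2 = 0.95, eps = 1e-8) {
  p.m = beta1 * p.m + (1 - beta1) * p.grad;       // first moment (mean)
  p.v = beta2 * p.v + (1 - beta2) * p.grad ** 2;  // second moment (variance)
  const mHat = p.m / (1 - beta1 ** (step + 1));   // bias correction
  const vHat = p.v / (1 - beta2 ** (step + 1));
  p.data -= lr * mHat / (Math.sqrt(vHat) + eps);
}
```

With linear decay, the lr passed in would be something like baseLr * (1 - step / numSteps).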

Training Loop (lines 252–289)

Each step: pick a doc, tokenize as [BOS, ...chars, BOS], take up to blockSize positions. For each position, run GPT forward, get logits → softmax → cross-entropy loss on target. Average loss over positions, call backward(), then Adam update. Zero grads after each step.
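The per-position loss is standard cross-entropy on the softmax output, averaged over the doc; sketched here on plain numbers (the real code keeps everything in Value objects so backward() works):

```javascript
// Cross-entropy for one position: -log(probability of the target token).
function crossEntropy(logits, target) {
  const m = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return -Math.log(exps[target] / sum);
}

// Average loss over a tokenized doc [BOS, ...chars, BOS]:
// position t predicts token t+1. logitsAt(t) stands in for the GPT forward pass.
function docLoss(tokens, logitsAt) {
  let total = 0;
  const n = tokens.length - 1;
  for (let t = 0; t < n; t++) total += crossEntropy(logitsAt(t), tokens[t + 1]);
  return total / n;
}
```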

Evaluation (lines 291–314)

On up to 500 validation docs: forward only (no backward), compute cross-entropy, argmax accuracy, and perplexity exp(loss).
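The three metrics can be sketched from per-position records of logits and targets (the record shape is an assumption for illustration):

```javascript
// Aggregate loss, perplexity, and argmax accuracy over evaluation positions.
function evaluate(records) {
  let lossSum = 0, correct = 0;
  for (const { logits, target } of records) {
    const m = Math.max(...logits);
    const exps = logits.map((x) => Math.exp(x - m));
    const sum = exps.reduce((a, b) => a + b, 0);
    lossSum += -Math.log(exps[target] / sum);          // cross-entropy
    if (logits.indexOf(m) === target) correct++;       // argmax accuracy
  }
  const loss = lossSum / records.length;
  return { loss, perplexity: Math.exp(loss), accuracy: correct / records.length };
}
```

Perplexity exp(loss) reads as the effective branching factor: a perplexity of 4 means the model is as uncertain as a uniform choice over 4 characters.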

Inference (lines 316–334)

For each sample: start with BOS, autoregressively run GPT, apply temperature scaling to the logits before softmax, sample the next token via weightedChoice, and decode and print until BOS is emitted or blockSize is reached. Temperature 0.6 sharpens the distribution, biasing samples toward higher-probability tokens for more coherent (less random) outputs.
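Temperature scaling divides the logits before softmax. A sketch:

```javascript
// Convert logits to sampling weights at a given temperature.
// temperature < 1 sharpens the distribution (more conservative samples);
// temperature > 1 flattens it (more varied samples).
function temperatureWeights(logits, temperature) {
  const scaled = logits.map((x) => x / temperature);
  const m = Math.max(...scaled);
  const exps = scaled.map((x) => Math.exp(x - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}
```

These weights would then be fed to weightedChoice to pick the next token id.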
