A minimal GPT-style language model for character-level next-token prediction. Trains on a corpus of names (or any text) and learns to generate similar sequences. Pure JavaScript, no dependencies. Ported from Karpathy's micrograd-based makemore.
- Loads `input.txt` (one item per line, e.g. names)
- Trains a small transformer on next-character prediction
- Evaluates on held-out data (loss, perplexity, accuracy)
- Generates 20 sample sequences autoregressively
- Node.js
- `input.txt` in the project directory (lines of text to train on)
- `node microllm.js`

- `createRandom(seed)` – LCG-based seeded PRNG (reproducible runs).
- `gauss(mean, std)` – Box-Muller transform for Gaussian samples (weight init).
- `shuffle(arr)` – Fisher-Yates shuffle.
- `weightedChoice(weights)` – Samples an index from a discrete distribution (inference sampling).
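A minimal sketch of what these helpers might look like. The function names match the ones above, but the LCG constants and the explicit `rand` parameter are assumptions (the real script may use a module-level PRNG):

```javascript
// LCG-based seeded PRNG: same seed → same run.
// (Constants are an assumption; these are the common Numerical Recipes values.)
function createRandom(seed) {
  let state = seed >>> 0;
  return function () {
    state = (Math.imul(1664525, state) + 1013904223) >>> 0;
    return state / 4294967296; // uniform in [0, 1)
  };
}

// Box-Muller transform: two uniforms → one Gaussian sample.
function gauss(mean, std, rand) {
  const u1 = Math.max(rand(), 1e-12); // avoid log(0)
  const u2 = rand();
  const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
  return mean + std * z;
}

// Sample index i with probability weights[i] / sum(weights).
function weightedChoice(weights, rand) {
  const total = weights.reduce((a, b) => a + b, 0);
  let r = rand() * total;
  for (let i = 0; i < weights.length; i++) {
    r -= weights[i];
    if (r <= 0) return i;
  }
  return weights.length - 1; // numerical fallback
}
```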
Loads `input.txt`, shuffles the docs, and splits them 90% train / 10% validation. Docs are lines of text (e.g. names).
Character-level tokenizer. `chars = ['<BOS>', 'a', 'b', ...]` (sorted unique chars plus BOS). `stoi` / `itos` map between characters and integer ids. Sequences are wrapped with BOS at both start and end.
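As a sketch, the tokenizer might be built like this. The `buildTokenizer` name and the BOS-gets-id-0 convention are assumptions; only `chars`, `stoi`, `itos`, and the BOS wrapping come from the description above:

```javascript
// Build a character-level vocab from the training docs.
// '<BOS>' is given id 0 (an assumption); the rest are sorted unique chars.
function buildTokenizer(docs) {
  const uniq = [...new Set(docs.join(''))].sort();
  const chars = ['<BOS>', ...uniq];
  const stoi = new Map(chars.map((c, i) => [c, i]));
  const itos = new Map(chars.map((c, i) => [i, c]));
  // Wrap each sequence with BOS at start and end, as described above.
  const encode = (doc) => [0, ...[...doc].map((c) => stoi.get(c)), 0];
  const decode = (ids) =>
    ids.filter((i) => i !== 0).map((i) => itos.get(i)).join('');
  return { chars, stoi, itos, encode, decode };
}
```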
Scalar autograd engine. Each `Value` holds `data` and `grad` and tracks its inputs (`_prev`) and backward rule (`_backward`). Operations (`add`, `mul`, `pow`, `log`, `exp`, `relu`) build a DAG; `backward()` does reverse-mode differentiation via topological sort. Used for all model parameters and activations so gradients flow through the full graph.
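The core of such an engine fits in a few lines. This sketch shows just `add` and `mul` plus `backward()` (the real engine also has `pow`, `log`, `exp`, `relu`):

```javascript
// Minimal scalar autograd Value in the style described above.
class Value {
  constructor(data, prev = []) {
    this.data = data;
    this.grad = 0;
    this._prev = prev;          // inputs in the DAG
    this._backward = () => {};  // local gradient rule, set by each op
  }
  add(other) {
    const out = new Value(this.data + other.data, [this, other]);
    out._backward = () => {
      this.grad += out.grad;
      other.grad += out.grad;
    };
    return out;
  }
  mul(other) {
    const out = new Value(this.data * other.data, [this, other]);
    out._backward = () => {
      this.grad += other.data * out.grad;
      other.grad += this.data * out.grad;
    };
    return out;
  }
  backward() {
    // Topological sort, then apply the chain rule in reverse order.
    const topo = [];
    const seen = new Set();
    const build = (v) => {
      if (seen.has(v)) return;
      seen.add(v);
      v._prev.forEach(build);
      topo.push(v);
    };
    build(this);
    this.grad = 1;
    for (let i = topo.length - 1; i >= 0; i--) topo[i]._backward();
  }
}
```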
- `nEmbd` (16) – Embedding dimension.
- `nHead` (4) – Attention heads.
- `nLayer` (1) – Transformer blocks.
- `blockSize` (8) – Max context length.
- `matrix(nout, nin, std)` – Initializes an `nout × nin` matrix of `Value` objects with Gaussian(0, std).
State dict:
- `wte` – Token embeddings (vocab × nEmbd).
- `wpe` – Position embeddings (blockSize × nEmbd).
- `layer{i}.attn_wq/wk/wv/wo` – Attention Q, K, V, O projections.
- `layer{i}.mlp_fc1/fc2` – MLP (4× expand, ReLU² activation).
- `lm_head` – Output projection to vocab logits.
Heads and MLP output layers use `std=0` to start from identity-like behavior.
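A sketch of `matrix` under these conventions. The `randGauss` parameter and the stub `Value` class are illustrative stand-ins:

```javascript
// Minimal stand-in for the autograd Value described earlier.
class Value {
  constructor(data) { this.data = data; this.grad = 0; }
}

// nout × nin matrix of Value objects drawn from Gaussian(0, std).
// With std = 0 every weight starts at exactly 0, so the layer
// contributes nothing to the residual stream at initialization
// (identity-like start).
function matrix(nout, nin, std, randGauss) {
  return Array.from({ length: nout }, () =>
    Array.from({ length: nin }, () => new Value(randGauss() * std))
  );
}
```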
- `linear(x, w)` – Vector `x` times matrix `w`; returns `w @ x`.
- `softmax(logits)` – Numerically stable softmax (subtract max before exp).
- `rmsnorm(x)` – RMS normalization: scale by `(mean(x²) + ε)^(-0.5)`.
- `gpt(tokenId, posId, keys, values)` – Forward pass:
  - Embed: `tok_emb + pos_emb`, then RMSNorm.
  - Per layer:
    - Attention: RMSNorm → Q,K,V projections → per-head scaled dot-product attention over cached K,V → O projection → residual.
    - MLP: RMSNorm → fc1 → ReLU² → fc2 → residual.
  - Final projection to vocab logits.
Uses causal attention: at position t, only positions 0..t can attend (via keys/values built step-wise).
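Shown on plain numbers for brevity (the real versions operate element-wise on `Value` objects so gradients flow), `softmax` and `rmsnorm` might look like:

```javascript
// Numerically stable softmax: subtract the max before exponentiating
// so exp() never overflows.
function softmax(logits) {
  const m = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// RMSNorm: scale x by (mean(x²) + ε)^(-1/2); no mean-centering, no bias.
function rmsnorm(x, eps = 1e-5) {
  const ms = x.reduce((a, v) => a + v * v, 0) / x.length;
  const scale = 1 / Math.sqrt(ms + eps);
  return x.map((v) => v * scale);
}
```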
Adam optimizer: first- and second-moment buffers (`m`, `v`), bias correction with β1^(step+1) and β2^(step+1), and a learning rate that decays linearly to 0 over training.
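A sketch of one such update over a flat parameter list. The `adamStep` name and the default hyperparameters are assumptions; the bias correction and linear decay match the description above:

```javascript
// One Adam step for params of shape {data, grad}, with bias correction
// and a learning rate that decays linearly to 0 over totalSteps.
function adamStep(params, m, v, step, totalSteps, baseLr = 1e-2,
                  beta1 = 0.9, beta2 = 0.999, eps = 1e-8) {
  const lr = baseLr * (1 - step / totalSteps); // linear decay to 0
  params.forEach((p, i) => {
    m[i] = beta1 * m[i] + (1 - beta1) * p.grad;          // first moment
    v[i] = beta2 * v[i] + (1 - beta2) * p.grad * p.grad; // second moment
    const mHat = m[i] / (1 - Math.pow(beta1, step + 1)); // bias correction
    const vHat = v[i] / (1 - Math.pow(beta2, step + 1));
    p.data -= lr * mHat / (Math.sqrt(vHat) + eps);
  });
}
```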
Each step: pick a doc, tokenize as `[BOS, ...chars, BOS]`, take up to `blockSize` positions. For each position, run the GPT forward pass, get logits → softmax → cross-entropy loss on the target. Average the loss over positions, call `backward()`, then do an Adam update. Zero grads after each step.
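The per-position loss described above reduces to −log p(target). On plain numbers (the real code composes it from `Value` ops so it is differentiable):

```javascript
// Cross-entropy for one position: -log softmax(logits)[target],
// computed with the same max-subtraction trick for stability.
function crossEntropy(logits, target) {
  const mx = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - mx));
  const sum = exps.reduce((a, b) => a + b, 0);
  return -Math.log(exps[target] / sum);
}
```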
On up to 500 validation docs: forward only (no backward), compute cross-entropy, argmax accuracy, and perplexity exp(loss).
For each sample: start with BOS, autoregressively run the GPT, apply temperature scaling before the softmax, sample the next token via `weightedChoice`, decode and print until BOS or `blockSize`. Temperature 0.6 sharpens the distribution, favoring high-probability characters for more coherent outputs.
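The temperature step can be sketched as follows. `sampleWithTemperature` is a hypothetical name combining the scaling, the softmax, and the `weightedChoice`-style draw into one function:

```javascript
// Divide logits by T before softmax: T < 1 sharpens the distribution
// (favors likely tokens), T > 1 flattens it (more varied, riskier).
function sampleWithTemperature(logits, temperature, rand) {
  const scaled = logits.map((x) => x / temperature);
  const mx = Math.max(...scaled);
  const exps = scaled.map((x) => Math.exp(x - mx));
  const sum = exps.reduce((a, b) => a + b, 0);
  const probs = exps.map((e) => e / sum);
  // Inline weighted choice over the probability distribution.
  let r = rand();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r <= 0) return i;
  }
  return probs.length - 1; // numerical fallback
}
```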