[Interactive visualization: panels for Token IDs, Pair Frequencies, and the Merge Table; status readouts for Phase, Merges Done, Token Count, and Compression; legend entries for Best Pair, Merged Token, and Pair Highlight.]

Python Code

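# --- TRAIN ---
merges = {}  # (left_id, right_id) -> new token id
ids = list(text.encode("utf-8"))  # start from raw bytes (ids 0..255)
for i in range(n_merges):
    stats = get_pair_stats(ids)  # counts of adjacent id pairs
    if not stats: break  # fewer than two tokens left
    best = max(stats, key=stats.get)  # most frequent pair
    idx = 256 + i  # next unused token id
    ids = merge_pair(ids, best, idx)
    merges[best] = idx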
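# --- ENCODE ---
ids = list(text.encode("utf-8"))
while len(ids) >= 2:
    stats = get_pair_stats(ids)
    # choose the pair learned earliest (lowest merge id); unknown pairs rank last
    pair = min(stats, key=lambda p: merges.get(p, float("inf")))
    if pair not in merges: break  # no learned merge applies
    ids = merge_pair(ids, pair, merges[pair])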
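
The listing above calls get_pair_stats and merge_pair without defining them. The following is a minimal, self-contained sketch of how they might look. It assumes that get_pair_stats counts adjacent id pairs and that merge_pair replaces non-overlapping occurrences from left to right. The train and encode wrappers, and the sample string, are hypothetical additions for illustration and not necessarily what the visualization runs.

```python
from collections import Counter


def get_pair_stats(ids):
    """Count how often each adjacent pair of token ids occurs (assumed behavior)."""
    return Counter(zip(ids, ids[1:]))


def merge_pair(ids, pair, idx):
    """Replace each non-overlapping occurrence of `pair` with `idx`, left to right."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(idx)
            i += 2  # skip both members of the merged pair
        else:
            out.append(ids[i])
            i += 1
    return out


def train(text, n_merges):
    """Hypothetical wrapper around the TRAIN loop; returns (ids, merges)."""
    merges = {}
    ids = list(text.encode("utf-8"))
    for i in range(n_merges):
        stats = get_pair_stats(ids)
        if not stats:  # fewer than two tokens left
            break
        best = max(stats, key=stats.get)  # ties go to the pair seen first
        idx = 256 + i
        ids = merge_pair(ids, best, idx)
        merges[best] = idx
    return ids, merges


def encode(text, merges):
    """Hypothetical wrapper around the ENCODE loop."""
    ids = list(text.encode("utf-8"))
    while len(ids) >= 2:
        stats = get_pair_stats(ids)
        # Pairs without a learned merge rank as infinity, so they sort last.
        pair = min(stats, key=lambda p: merges.get(p, float("inf")))
        if pair not in merges:
            break
        ids = merge_pair(ids, pair, merges[pair])
    return ids


train_ids, merges = train("aaabdaaabac", 3)
print(train_ids)               # [258, 100, 258, 97, 99]
print(merges)                  # {(97, 97): 256, (256, 97): 257, (257, 98): 258}
print(encode("aaab", merges))  # [258]
```

Encoding applies merges in the order they were learned, which is why the loop picks the pair with the lowest merge id rather than the most frequent one. With this ordering, encoding the training text reproduces the token sequence that training ended with.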