Browse By Category
Jump to a topic before you scan pages.
The catalog works better when categories do the heavy lifting. Start here, pick a concept area, then drill into the visualization list for that section.
AI Accelerators & CPU Architectures
Deep dive into hardware design for machine learning
- Instruction Set Architectures Comparing CPU instruction architectures and parallelism philosophies
- TPU v1 Architecture Interactive block diagram of the Google Tensor Processing Unit
- TPU v7 Deep Dive Interactive architecture overview of a speculative future TPU generation
- Systolic Array Visualization Visualize how data flows through a rhythmic matrix multiply unit
- Flynn's Taxonomy Interactive guide to SISD, SIMD, MISD, and MIMD machine models
- MIPS Pipeline Explorer Step through pipelined CPU execution and compare stage overlap
- Array vs. Vector vs. MIPS Pipeline Compare scalar, array, and vector execution models in one architecture view
- Array vs. Vector Processors Space-time mapping of data parallelism across different hardware architectures
- Scalar vs. SIMD Step-Through Follow how scalar and SIMD instructions use registers and lanes over time
- Vector Instruction Execution See how vector operations fan out across multiple hardware units
- GPU Multi-Core Scheduling Visualize fine-grained multithreading across GPU cores and warp issue slots
- Vector ILP & Pipeline Throughput Visualizing accumulation of vector chunks and instruction-level parallelism
- GPU Warp Basics (SIMT) How SPMD code maps to a warp of hardware threads sharing a Program Counter
- GPU Branch Divergence Simulator to measure throughput and utilization losses in divergent kernels
- CUDA Branch Divergence Hardware Masking Track active masks, program counters, and SIMT efficiency during divergence
- Dynamic Warp Formation Compare baseline SIMT scheduling with post-divergence warp regrouping
- Warp-Level Scheduling (FGMT) Explore warp issue order, latency hiding, and fine-grained multithreading
- Simultaneous Multithreading (SMT) Compare SMT pipeline sharing against warp-style multithreading
- GPU Scheduling Strategies Contrast scheduling abstractions and the hardware tradeoffs behind them
- NVIDIA SM Multithreading Visual overview of host memory, warp scheduling, and streaming multiprocessor structure
- CUDA Execution Architecture Explore the logical grid, block, thread, and shared-memory hierarchy
- CUDA Hardware & Memory Architecture Connect the CUDA programming model to SMs, registers, caches, and device memory
- CUDA Execution Flow Step through host-to-device copies, kernel launch, synchronization, and result handling
- Shared Memory Bank Conflicts Visualize stride patterns, broadcasts, and padding tricks for shared memory
- CUDA Memory Coalescing Simulate how thread access patterns collapse into memory transactions
- CUDA 2D Kernel Memory Mapping Map 2D thread indices onto matrix coordinates, guards, and memory order
- AoS vs. SoA Compare memory layouts and their impact on access locality and vectorization
- NVIDIA Tensor Cores Understand mixed-precision matrix math on dedicated tensor units
- INT8 Quantization & Calibration Visualizer Animate FP32 activations becoming integer bins while calibration range changes error and hardware efficiency
- CUDA Streams Visualizer See how asynchronous transfers and kernels overlap across multiple streams
- TPU v1 Programming Model Visualization of the CISC-like instruction set used to program AI accelerators
- Vector Unit Architecture On-chip vector processing for activation and pooling operations
- PCIe Interface Details How accelerators communicate with the host CPU at high bandwidth
- CPU vs GPU vs TPU Architecture Visualizer Compare architecture fit for branchy code, dense matmul, CNNs, and recommender workloads
- CUDA Parallel Gradient Reduction Visualize thread blocks, reductions, and memory access patterns for gradient aggregation
Classical ML & Clustering
Traditional machine learning algorithms, metrics, and clustering methods
- Classification Metrics Explorer Explore accuracy, precision, recall, F1-score with interactive threshold adjustment
- Decision Tree Visualizer Visualize decision tree splitting and classification
- K-Nearest Neighbors (KNN) Interactive KNN algorithm visualization
- Logistic Regression Understand logistic regression and decision boundaries
- SVM Classification Support Vector Machine decision boundaries and margins
- Neural Network Classification Visualize neural network learning and decision boundaries
- K-Means Clustering Interactive K-means clustering algorithm
- Hierarchical Clustering Dendrograms and hierarchical clustering methods
- DBSCAN Clustering Visualizer Explore density-based clustering, core points, border points, noise, epsilon, and minPts
- Naive Bayes Classifier Visualizer Step through class priors, likelihoods, and posterior decisions for Naive Bayes
CNN Architectures
Convolutional neural networks and deep learning
- Convolution Basics Understand convolution operations fundamentally
- Convolution Layer Tensor Computation Tensor computations in convolutional layers
- Convolution Output Formula Calculate output dimensions from convolutions
- 3×3 Convolution Engine Stepper Step through each MAC operation with synced input, filter, and psum updates
- neuFlow 3×3 Convolution Engine Visualize PE broadcast, delay elements (W−K), and output map generation
- Pooling Operations Max pooling and average pooling visualizations
- 1×1 Convolution Bottleneck Dimensionality reduction with 1×1 convolutions
- Bottleneck Block Variant ResNet bottleneck block architecture
- ResNet Architecture Residual networks and skip connections
- Two-Stage Filtering Multi-stage convolution filtering
- CNN Padding & Stride Visualizer Adjust padding, stride, and kernel size to see convolution output geometry change
- 2D Convolution on Real Images Apply kernels to real image patches and inspect filtered outputs interactively
- Multi-Channel CNN vs 3D CNN Compare how 2D multi-channel and 3D convolutions process spatial and channel dimensions
ML Fundamentals
Core concepts and building blocks
- Softmax Function Understand the softmax activation function
- Internal Covariate Shift Intuition Why batch normalization matters
- Internal Covariate Shift & BatchNorm See how normalization smooths the optimization landscape
- Backpropagation & Chain Rule Trace forward values and backward gradients through a simple computation graph
- Knowledge Distillation Visualizer Compare teacher and student logits, soft targets, temperature, and deployment tradeoffs
- Weight Initialization Visualizer Explore how initialization affects activation variance, gradients, and quantization readiness
- Activation & Loss Functions Visualizer Inspect activation curves, gradients, and loss behavior across classification and regression tasks
- Bias-Variance & Regularization Explorer Adjust model complexity and regularization to see bias, variance, and overfitting tradeoffs
- Random Forest vs Gradient Boosting Compare bagging and boosting behavior, tree ensembles, and error evolution
- Regression Model Zoo & Outlier Sensitivity Compare regression models and see how outliers affect fitted curves and errors
Optimization & Training Dynamics
Losses, optimizers, gradients, and training-time memory tradeoffs
- Gradient Descent Explorer Compare batch size, learning rate, convergence paths, and optimization stability
- Loss Functions Explorer Step through chain-rule gradients and compare loss surfaces interactively
- Gradient Descent Variations Visualizer Compare SGD, momentum, RMSProp, Adam, and related optimizer dynamics
- Loss Landscape: Why Skip Connections Matter Explore how residual paths affect optimization landscapes and gradient flow
NLP & Language Models
Text representations, sequence models, attention, and modern language-model foundations
- Bag of Words Spam Detection Compare binary and count-based BoW features as emails turn into linear spam scores
- TF-IDF Explorer Step through term frequency, document frequency, log IDF, and corpus edits in one live workspace
- N-Grams & Markov Assumption See how context windows approximate the full chain rule and where sparsity breaks long histories
- Word Embeddings Explore semantic vector spaces, analogies, and projected clusters in an interactive 3D view
- Cosine Similarity Explorer Visualize angle-based similarity for sparse word counts and dense embedding dimensions
- Embedding Full Sentences Compare full-sentence vectors and inspect why semantically similar sentences align in embedding space
- Why Pretraining Matters Show how context, corpus scale, MLM, and domain adaptation shape modern embedding quality
- Transfer Learning Playground Contrast frozen versus fine-tuned representations for word-level and sentence-level sentiment tasks
- Recurrent Neural Network Trace shared weights through time and build intuition for exploding and vanishing gradients
- RNN Execution Modes Compare folded versus unfolded views, autoregressive feedback, and sequence outputs across domains
- LSTM Cell: Inside the Black Box Adjust dimensions, inputs, and gate weights to watch cell-state updates propagate live
- Seq2Seq LSTM Encoder-Decoder Step through translation with stacked LSTMs, inference versus training, and teacher forcing
- Self-Attention Mechanism Visualize query, key, and value interactions and how attention scores are computed live
- Multi-Head Attention Understand how multiple attention heads capture different semantic relationships in parallel
- Masked Self-Attention Explore how causal masking prevents looking at future tokens during autoregressive generation
- Seq2Seq with Cross Attention Follow alignment, softmax weights, context vectors, and decoder updates with attention heatmaps
- Positional Encoding Visualize how sine and cosine functions inject sequence order into non-recurrent architectures
- Linear Transformation in PE Visualize how positional information is mixed through linear layers
- Rotary Positional Embedding (RoPE) Interact with rotary embeddings and see how relative positions are encoded via complex rotations
- Add & Norm (Residuals) Understand residual connections and layer normalization that stabilize deep transformer training
- KV Cache Visualization See how Key-Value caching accelerates inference by avoiding redundant computations of past tokens
- Prefill vs Decode Compare prompt prefill, token-by-token decode, arithmetic intensity, and KV cache growth
- FlashAttention Math Step through online softmax updates that avoid materializing the full attention matrix
- FlashAttention Loop Schedule Visualize block-wise Q, K, V, and output movement between HBM and SRAM
- FlashAttention Traffic Visualizer Inspect tiled attention data movement and memory traffic across compute phases
- PagedAttention See how virtualized KV blocks reduce fragmentation during LLM serving
- PagedAttention & Continuous Batching Follow requests as token slots, page tables, and GPU batches evolve during serving
- Transformer Parallelism Compare training and inference dataflows and how transformers achieve massive parallel scale
- Transformer vs RNN Parallelism Deeper look into where parallelism occurs in modern architectures
- BERT Explorer Interact with bidirectional self-attention, input construction, pretraining tasks, and model variants
- Masked Attention Matrix Walkthrough Hover through QK scores, causal masks, softmax weights, and token-level labels
- Perplexity Explorer Connect token probabilities, cross-entropy, and perplexity with editable language-model examples
- Greedy, Beam, and Temperature Decoding Compare deterministic, beam-search, and temperature-controlled decoding paths
- Top-k and Top-p Sampling Interact with vocabulary truncation, nucleus thresholds, and sampled token outcomes
- Sampling Distribution Controls Adjust temperature, repetition penalty, top-k, and top-p to reshape a token distribution
- Contrastive Decoding Sandbox Balance expert and amateur model probabilities to penalize degenerate generations
- Speculative Decoding Sandbox See how a draft model proposes tokens and a larger model verifies them in parallel
- RLHF Pipeline Visualizer Trace supervised fine-tuning, preference modeling, rewards, and policy optimization
- History of Language Models Walk the timeline from N-grams and BoW through RNNs, attention, BERT, and GPT-style generation
- Tokenizer & BPE Visualizer Step through byte-pair encoding merges and see how text becomes model tokens
- Sparse & Local Attention Patterns Compare full, sparse, and local attention masks and their compute tradeoffs
- Interactive Positional Encoding Explore sinusoidal encodings and how positions map into vector dimensions
- DeepSeek Multi-Head Latent Attention Compare full KV caching with latent caching, RoPE slices, and sparse attention reads
- LLM Decoding Strategies Explorer Compare greedy, temperature, top-k, top-p, beam, contrastive, and speculative decoding
- Text Generation Decoding Strategies Visualize how decoding controls reshape next-token choices and sequence diversity
- Prefill vs Decode Visualizer Contrast prompt prefill, token-by-token decode, KV cache growth, and arithmetic intensity
- Online Softmax Visualizer Step through streaming softmax updates used by memory-efficient attention kernels
- Mixture of Experts Routing Visualizer Watch tokens route through sparse experts and compare load-balancing behavior
- Mixture of Experts Visualizer Explore expert routing, capacity, and sparse activation in MoE models
- Mixture of Depths vs MoE Compare token-skipping depth allocation against expert routing tradeoffs
- RLHF Pipeline with InstructGPT Examples Trace instruction tuning, preference data, reward modeling, and policy optimization
- Activation Checkpointing Visualizer See the memory-compute tradeoff from recomputing activations during backpropagation
- FlashAttention Visualizer Compare standard attention with tiled IO-aware attention execution
- FlashAttention Visual + Math Connect memory traffic diagrams with online softmax math and HBM/SRAM tiling
- GRPO Visualizer Explore group-relative policy optimization and reward normalization for LLM training
- BERT vs GPT Attention Visualizer Compare bidirectional and causal attention patterns for encoder and decoder models
- GQA & MQA Visualization Compare grouped-query and multi-query attention KV sharing and memory cost
- Multi-Head Attention Interactive Visualizer Inspect multiple attention heads and how projections combine into contextual outputs
- Transformer Training Memory Visualization Break down activation, parameter, optimizer, and gradient memory during training
- LoRA Visualizer Visualize low-rank adaptation matrices and how they update frozen model weights
Quantization
Numeric formats, scale/zero-point math, and quantized training/inference behavior
- Quantization Basics Explore bit depth, scale, zero-point, clipping, rounding, and reconstruction error
- Floating-Point Format Explorer Compare FP32, BF16, FP16, and FP8 layouts, ranges, precision, and special values
- Quantization-Aware Training Inspect fake quantization, straight-through estimators, calibration, and granularity choices
- QAT Training Loop Step through forward quantization noise, backward gradients, and post-training deployment
- Number Formats Visualizer Compare numeric formats, ranges, precision, and quantization behavior
- Weight Quantization Visualizer Inspect per-tensor and per-channel weight quantization and reconstruction error
- LLM Pruning & Low-Resource Hardware Explore pruning as a deployment strategy for large models on constrained hardware
GEMM Optimization
General matrix multiply optimization techniques
- Strassen's Matrix Multiplication Algorithm for O(n^2.81) matrix multiplication with recursive decomposition
- Matrix Multiplication Loop Ordering Basic matrix multiplication implementation
- GEMM Tiled Implementation Cache-optimized tiled matrix multiplication
- GEMM Tiling Visualizer Interactive tiling strategy visualization
- Image to Column (im2col) Converting convolutions to GEMM for high-performance execution
- Cache Cliff and Tiling Understanding cache cliffs and how tiling helps
- Matrix Multiplication: Non-Tiled Misses Mathematical derivation of cache misses
- Tiled Cache Miss Formula Derivation for tiled implementations
Kernel Optimization
Data reuse and computational efficiency
Cache & Memory Optimization
Memory hierarchy and cache performance
Dataflow Patterns
Hardware execution patterns for neural networks
- Eyeriss Row-Stationary Dataflow Walk through row-stationary execution and local reuse in Eyeriss
- Eyeriss Buffet Step through command-and-address orchestration in the Eyeriss buffet network
- Input Stationary Dataflow Input-stationary accelerator dataflow pattern
- Output Stationary Dataflow Output-stationary dataflow and accumulation behavior
- Output-Stationary Accelerator (8 PE Parallel) Parallel output-stationary accumulation across 8 processing elements
- Weight Stationary Dataflow Weight-stationary architecture patterns
- Weight-Stationary Accelerator (8 PE Parallel) Step-by-step visualization of 8 processing elements working in parallel
- Simplified NVDLA Convolution Dataflow Atomic-C parallel convolution walkthrough with receptive-field and output-cell tracking
Recommender Systems
Sparse+dense recommendation models, embedding tables, and serving bottlenecks
Distributed & Federated Learning
Communication patterns, decentralized training, and privacy-preserving learning
Generative Models & Evaluation
Latent-variable generation and evaluation metrics for synthetic data
Performance Modeling
Analyzing and predicting system performance