Browse By Category
Jump straight to a topic instead of scanning pages.
The catalog works best when the categories do the heavy lifting: start here, pick a concept area, then drill into that section's visualization list.
AI Accelerators & CPU Architectures
Deep dive into hardware design for machine learning
- Instruction Set Architectures Compare CPU instruction sets and their parallelism philosophies
- TPU v1 Architecture Interactive block diagram of the Google Tensor Processing Unit
- TPU v7 Deep Dive Interactive architecture overview of a speculative future TPU generation
- Systolic Array Visualization Visualize how data flows through a rhythmic matrix multiply unit
- Flynn's Taxonomy Interactive guide to SISD, SIMD, MISD, and MIMD machine models
- MIPS Pipeline Explorer Step through pipelined CPU execution and compare stage overlap
- Array vs. Vector vs. MIPS Pipeline Compare scalar, array, and vector execution models in one architecture view
- Array vs. Vector Processors Space-time mapping of data parallelism across different hardware architectures
- Scalar vs. SIMD Step-Through Follow how scalar and SIMD instructions use registers and lanes over time
- Vector Instruction Execution See how vector operations fan out across multiple hardware units
- GPU Multi-Core Scheduling Visualize fine-grained multithreading across GPU cores and warp issue slots
- Vector ILP & Pipeline Throughput Visualize accumulation of vector chunks and instruction-level parallelism
- GPU Warp Basics (SIMT) How SPMD code maps to a warp of hardware threads sharing a Program Counter
- GPU Branch Divergence Simulator to measure throughput and utilization losses in divergent kernels
- CUDA Branch Divergence Hardware Masking Track active masks, program counters, and SIMT efficiency during divergence
- Dynamic Warp Formation Compare baseline SIMT scheduling with post-divergence warp regrouping
- Warp-Level Scheduling (FGMT) Explore warp issue order, latency hiding, and fine-grained multithreading
- Simultaneous Multithreading (SMT) Compare SMT pipeline sharing against warp-style multithreading
- GPU Scheduling Strategies Contrast scheduling abstractions and the hardware tradeoffs behind them
- NVIDIA SM Multithreading Visual overview of host memory, warp scheduling, and streaming multiprocessor structure
- CUDA Execution Architecture Explore the logical grid, block, thread, and shared-memory hierarchy
- CUDA Hardware & Memory Architecture Connect the CUDA programming model to SMs, registers, caches, and device memory
- CUDA Execution Flow Step through host-to-device copies, kernel launch, synchronization, and result handling
- Shared Memory Bank Conflicts Visualize stride patterns, broadcasts, and padding tricks for shared memory
- CUDA Memory Coalescing Simulate how thread access patterns collapse into memory transactions
- CUDA 2D Kernel Memory Mapping Map 2D thread indices onto matrix coordinates, guards, and memory order
- AoS vs. SoA Compare memory layouts and their impact on access locality and vectorization
- NVIDIA Tensor Cores Understand mixed-precision matrix math on dedicated tensor units
- CUDA Streams Visualizer See how asynchronous transfers and kernels overlap across multiple streams
- TPU v1 Programming Model Visualization of the CISC-like instruction set used to program AI accelerators
- Vector Unit Architecture On-chip vector processing for activation and pooling operations
- PCIe Interface Details How accelerators communicate with the host CPU at high bandwidth
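The branch-divergence entries above share one core idea: when lanes of a warp take different paths, the hardware serializes both paths under an active mask, so lane utilization drops. A minimal pure-Python sketch of that accounting (the 8-lane warp, the even/odd branch, and the unit path costs are all illustrative assumptions, not any specific GPU's behavior):

```python
# Toy SIMT divergence model: a warp executing `if x % 2 == 0: A else: B`
# serializes both paths, masking off inactive lanes on each one.

def divergence_cost(values, cost_a=1, cost_b=1):
    """Return (cycles, simt_efficiency) for one divergent branch."""
    taken = [v % 2 == 0 for v in values]    # active mask for path A
    a_lanes = sum(taken)
    b_lanes = len(values) - a_lanes
    cycles = 0
    useful = 0
    if a_lanes:                  # path A issues for the whole warp,
        cycles += cost_a         # but only a_lanes do useful work
        useful += a_lanes * cost_a
    if b_lanes:                  # then path B, with the mask inverted
        cycles += cost_b
        useful += b_lanes * cost_b
    efficiency = useful / (cycles * len(values))
    return cycles, efficiency

print(divergence_cost(list(range(8))))  # (2, 0.5): both paths run, half the lanes idle
```

When every lane agrees (try `divergence_cost([0, 2, 4, 6])`), only one path issues and efficiency returns to 1.0, which is exactly the effect the divergence simulators let you measure interactively.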
Classical ML & Clustering
Traditional machine learning algorithms, metrics, and clustering methods
- Classification Metrics Explorer Explore accuracy, precision, recall, F1-score with interactive threshold adjustment
- Decision Tree Visualizer Visualize decision tree splitting and classification
- K-Nearest Neighbors (KNN) Interactive KNN algorithm visualization
- Logistic Regression Understand logistic regression and decision boundaries
- SVM Classification Support Vector Machine decision boundaries and margins
- Neural Network Classification Visualize neural network learning and decision boundaries
- K-Means Clustering Interactive K-means clustering algorithm
- Hierarchical Clustering Dendrograms and hierarchical clustering methods
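As a taste of the clustering entries, here is one K-means iteration sketched in pure Python: assign each point to its nearest centroid, then move each centroid to its cluster mean. The 2-D points and squared-Euclidean distance are illustrative choices, not part of any specific visualization:

```python
# One K-means iteration: assignment step, then centroid-update step.

def kmeans_step(points, centroids):
    clusters = {i: [] for i in range(len(centroids))}
    for p in points:
        # assignment: nearest centroid by squared Euclidean distance
        nearest = min(range(len(centroids)),
                      key=lambda i: (p[0] - centroids[i][0]) ** 2
                                    + (p[1] - centroids[i][1]) ** 2)
        clusters[nearest].append(p)
    new_centroids = []
    for i, members in clusters.items():
        if members:  # update: move centroid to the cluster mean
            cx = sum(p[0] for p in members) / len(members)
            cy = sum(p[1] for p in members) / len(members)
            new_centroids.append((cx, cy))
        else:        # empty clusters keep their old centroid
            new_centroids.append(centroids[i])
    return new_centroids

pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(kmeans_step(pts, [(0, 0), (10, 10)]))  # [(0.0, 0.5), (10.0, 10.5)]
```

Iterating this step until the centroids stop moving is the whole algorithm; the interactive version animates exactly these two phases.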
CNN Architectures
Convolutional neural networks and deep learning
- Convolution Basics Understand the fundamentals of convolution operations
- Convolution Layer Tensor Computation Tensor computations in convolutional layers
- Convolution Output Formula Calculate output dimensions from convolutions
- 3×3 Convolution Engine Stepper Step through each MAC operation with synced input, filter, and psum updates
- neuFlow 3×3 Convolution Engine Visualize PE broadcast, delay elements (W−K), and output map generation
- Pooling Operations Max pooling and average pooling visualizations
- 1×1 Convolution Bottleneck Dimensionality reduction with 1×1 convolutions
- Bottleneck Block Variant ResNet bottleneck block architecture
- ResNet Architecture Residual networks and skip connections
- Two-Stage Filtering Multi-stage convolution filtering
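The output-dimension arithmetic from the "Convolution Output Formula" entry fits in one line: `out = floor((in + 2*padding - kernel) / stride) + 1`. A quick sketch (the 224-pixel examples are illustrative):

```python
# Convolution output-size arithmetic: one spatial dimension at a time.

def conv_out(size, kernel, stride=1, padding=0):
    return (size + 2 * padding - kernel) // stride + 1

print(conv_out(224, 3, padding=1))            # 224: 'same' 3x3 convolution
print(conv_out(224, 7, stride=2, padding=3))  # 112: stride-2 7x7 stem conv
```

The same formula also covers pooling layers, with `kernel` set to the pooling window.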
ML Fundamentals
Core concepts and building blocks
- Softmax Function Understand the softmax activation function
- Internal Covariate Shift Intuition Why batch normalization matters
- Internal Covariate Shift & BatchNorm See how normalization smooths the optimization landscape
- Backpropagation & Chain Rule Trace forward values and backward gradients through a simple computation graph
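The softmax entry above reduces to a few lines of code; the only subtlety worth sketching is numerical stability, where the standard trick is to subtract the max logit before exponentiating so large inputs cannot overflow:

```python
# Numerically stable softmax: shift by the max logit, then normalize.
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print([round(p, 3) for p in probs])  # [0.09, 0.245, 0.665]
```

Shifting by the max changes nothing mathematically (the factor cancels in the ratio) but keeps `softmax([1000.0, 1000.0])` from overflowing.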
NLP & Language Models
Text representations, sequence models, attention, and modern language-model foundations
- Bag of Words Spam Detection Compare binary and count-based BoW features as emails turn into linear spam scores
- TF-IDF Explorer Step through term frequency, document frequency, log IDF, and corpus edits in one live workspace
- N-Grams & Markov Assumption See how context windows approximate the full chain rule and where sparsity breaks long histories
- Word Embeddings Explore semantic vector spaces, analogies, and projected clusters in an interactive 3D view
- Cosine Similarity Explorer Visualize angle-based similarity for sparse word counts and dense embedding dimensions
- Embedding Full Sentences Compare full-sentence vectors and inspect why semantically similar sentences align in embedding space
- Why Pretraining Matters Show how context, corpus scale, MLM, and domain adaptation shape modern embedding quality
- Transfer Learning Playground Contrast frozen versus fine-tuned representations for word-level and sentence-level sentiment tasks
- Recurrent Neural Network Trace shared weights through time and build intuition for exploding and vanishing gradients
- RNN Execution Modes Compare folded versus unfolded views, autoregressive feedback, and sequence outputs across domains
- LSTM Cell: Inside the Black Box Adjust dimensions, inputs, and gate weights to watch cell-state updates propagate live
- Seq2Seq LSTM Encoder-Decoder Step through translation with stacked LSTMs, inference versus training, and teacher forcing
- Self-Attention Mechanism Visualize query, key, and value interactions and how attention scores are computed live
- Multi-Head Attention Understand how multiple attention heads capture different semantic relationships in parallel
- Masked Self-Attention Explore how causal masking prevents looking at future tokens during autoregressive generation
- Seq2Seq with Cross Attention Follow alignment, softmax weights, context vectors, and decoder updates with attention heatmaps
- Positional Encoding Visualize how sine and cosine functions inject sequence order into non-recurrent architectures
- Linear Transformation in PE Visualize how positional information is mixed through linear layers
- Rotary Positional Embedding (RoPE) Interact with rotary embeddings and see how relative positions are encoded via complex rotations
- Add & Norm (Residuals) Understand residual connections and layer normalization that stabilize deep transformer training
- KV Cache Visualization See how Key-Value caching accelerates inference by avoiding redundant computations of past tokens
- Prefill vs Decode Compare prompt prefill, token-by-token decode, arithmetic intensity, and KV cache growth
- FlashAttention Math Step through online softmax updates that avoid materializing the full attention matrix
- FlashAttention Loop Schedule Visualize block-wise Q, K, V, and output movement between HBM and SRAM
- FlashAttention Traffic Visualizer Inspect tiled attention data movement and memory traffic across compute phases
- PagedAttention See how virtualized KV blocks reduce fragmentation during LLM serving
- PagedAttention & Continuous Batching Follow requests as token slots, page tables, and GPU batches evolve during serving
- Transformer Parallelism Compare training and inference dataflows and how transformers achieve massive parallel scale
- Transformer vs RNN Parallelism Deeper look into where parallelism occurs in modern architectures
- BERT Explorer Interact with bidirectional self-attention, input construction, pretraining tasks, and model variants
- Masked Attention Matrix Walkthrough Hover through QK scores, causal masks, softmax weights, and token-level labels
- Perplexity Explorer Connect token probabilities, cross-entropy, and perplexity with editable language-model examples
- Greedy, Beam, and Temperature Decoding Compare deterministic, beam-search, and temperature-controlled decoding paths
- Top-k and Top-p Sampling Interact with vocabulary truncation, nucleus thresholds, and sampled token outcomes
- Sampling Distribution Controls Adjust temperature, repetition penalty, top-k, and top-p to reshape a token distribution
- Contrastive Decoding Sandbox Balance expert and amateur model probabilities to penalize degenerate generations
- Speculative Decoding Sandbox See how a draft model proposes tokens and a larger model verifies them in parallel
- RLHF Pipeline Visualizer Trace supervised fine-tuning, preference modeling, rewards, and policy optimization
- History of Language Models Walk the timeline from N-grams and BoW through RNNs, attention, BERT, and GPT-style generation
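Several of the entries above (cosine similarity, sentence embeddings, word embeddings) rest on one small formula: cosine similarity compares vector directions while ignoring magnitude. A minimal sketch with made-up word-count vectors:

```python
# Cosine similarity: dot product normalized by both vector lengths.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

doc = [2, 1, 0]          # toy word counts
doc_double = [4, 2, 0]   # same word mix, document twice as long
other = [0, 0, 3]        # disjoint vocabulary
print(cosine(doc, doc_double))  # ~1.0: same direction despite different length
print(cosine(doc, other))       # 0.0: orthogonal, no shared terms
```

This is why cosine similarity is the default for both sparse bag-of-words counts and dense embeddings: a short and a long document with the same word mix score as near-identical.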
Quantization
Numeric formats, scale/zero-point math, and quantized training/inference behavior
- Quantization Basics Explore bit depth, scale, zero-point, clipping, rounding, and reconstruction error
- Floating-Point Format Explorer Compare FP32, BF16, FP16, and FP8 layouts, ranges, precision, and special values
- Quantization-Aware Training Inspect fake quantization, straight-through estimators, calibration, and granularity choices
- QAT Training Loop Step through forward quantization noise, backward gradients, and post-training deployment
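The scale/zero-point math from "Quantization Basics" can be sketched in a few lines. This is a generic affine-quantization model for a symmetric int8 range, with the float range [-1, 1] and the clipping bounds chosen purely for illustration:

```python
# Affine quantization sketch: map a float range onto int8 codes and
# measure the round-trip reconstruction error.

def make_qparams(lo, hi, qmin=-128, qmax=127):
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

def quantize(x, scale, zp, qmin=-128, qmax=127):
    # round to the nearest code, then clip into the integer range
    return max(qmin, min(qmax, round(x / scale + zp)))

def dequantize(q, scale, zp):
    return (q - zp) * scale

scale, zp = make_qparams(-1.0, 1.0)
q = quantize(0.3, scale, zp)
x = dequantize(q, scale, zp)
print(q, abs(x - 0.3))  # integer code and its reconstruction error (< scale)
```

Rounding bounds the reconstruction error by half a step, and values outside the calibrated range (try `quantize(5.0, scale, zp)`) clip to the edge code — the two error sources the Quantization Basics visualization lets you trade off.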
GEMM Optimization
General matrix multiply optimization techniques
- Strassen's Matrix Multiplication Algorithm for O(n^2.81) matrix multiplication with recursive decomposition
- Matrix Multiplication Loop Ordering How loop order affects a basic matrix multiplication implementation
- GEMM Tiled Implementation Cache-optimized tiled matrix multiplication
- GEMM Tiling Visualizer Interactive tiling strategy visualization
- Image to Column (im2col) Converting convolutions to GEMM for high-performance execution
- Cache Cliff and Tiling Understanding cache cliffs and how tiling helps
- Matrix Multiplication: Non-Tiled Misses Mathematical derivation of cache misses
- Tiled Cache Miss Formula Derivation of cache-miss counts for tiled implementations
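The tiling entries above share one loop structure: iterate over TILE×TILE blocks so each block of A and B is reused while it is cache-resident. A pure-Python sketch that shows the loop nest rather than real performance (the tile size of 2 is just for the tiny example):

```python
# Tiled (blocked) square matrix multiply: three block loops outside,
# three element loops inside each block.

TILE = 2

def matmul_tiled(A, B):
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, TILE):            # block row of C
        for jj in range(0, n, TILE):        # block column of C
            for kk in range(0, n, TILE):    # block along the reduction dim
                for i in range(ii, min(ii + TILE, n)):
                    for j in range(jj, min(jj + TILE, n)):
                        s = 0.0
                        for k in range(kk, min(kk + TILE, n)):
                            s += A[i][k] * B[k][j]
                        C[i][j] += s
    return C

A = [[1, 2], [3, 4]]
print(matmul_tiled(A, A))  # [[7.0, 10.0], [15.0, 22.0]]
```

On real hardware the payoff is that a TILE×TILE block of B is loaded once and reused TILE times before eviction; the cache-miss derivations in the entries above quantify exactly that reuse.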
Kernel Optimization
Data reuse and computational efficiency
Cache & Memory Optimization
Memory hierarchy and cache performance
Dataflow Patterns
Hardware execution patterns for neural networks
- Eyeriss Row-Stationary Dataflow Walk through row-stationary execution and local reuse in Eyeriss
- Eyeriss Buffet Step through command-and-address orchestration in the Eyeriss buffet network
- Input Stationary Dataflow Input-stationary accelerator dataflow pattern
- Output Stationary Dataflow Output-stationary dataflow and accumulation behavior
- Output-Stationary Accelerator (8 PE Parallel) Parallel output-stationary accumulation across 8 processing elements
- Weight Stationary Dataflow Weight-stationary architecture patterns
- Weight-Stationary Accelerator (8 PE Parallel) Step-by-step visualization of 8 processing elements working in parallel
- Simplified NVDLA Convolution Dataflow Atomic-C parallel convolution walkthrough with receptive-field and output-cell tracking
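The common thread in this section is what each processing element (PE) keeps local. As one example, here is a toy model of the output-stationary pattern: each PE owns one output's partial sum and never moves it, while inputs stream past. The one-PE-per-output 1-D dot-product layer is an illustrative simplification, not any specific accelerator:

```python
# Output-stationary sketch: psum[pe] stays pinned in PE `pe`;
# each time step broadcasts one input to every PE.

def output_stationary(inputs, weights_per_output):
    psums = [0] * len(weights_per_output)    # one resident psum per PE
    for t, x in enumerate(inputs):           # time step t broadcasts x
        for pe, w in enumerate(weights_per_output):
            psums[pe] += x * w[t]            # local accumulate, no psum traffic
    return psums

x = [1, 2, 3]
W = [[1, 0, 0], [1, 1, 1]]   # two outputs, three weights each
print(output_stationary(x, W))  # [1, 6]
```

Weight-stationary and input-stationary dataflows permute the same computation: they pin a different operand in the PE and stream the partial sums instead, trading psum movement for weight or input movement.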
Performance Modeling
Analyzing and predicting system performance