🚀
AI Accelerators & CPU Architectures
Deep dive into hardware design for machine learning
- CISC vs RISC vs VLIW Comparing CPU instruction architectures and parallelism philosophies
- TPU v1 Architecture Interactive block diagram of the Google Tensor Processing Unit
- TPU v7 Deep Dive Interactive architecture overview of a speculative future TPU generation
- Systolic Array Core Visualize how data flows through a rhythmic matrix multiply unit
- Array vs. Vector Processors Space-time mapping of data parallelism across different hardware architectures
- Vector ILP & Pipeline Throughput Visualizing accumulation of vector chunks and instruction-level parallelism
- GPU Warp Basics (SIMT) How SPMD code maps to a warp of hardware threads sharing a Program Counter
- Branch Divergence Profiler Simulator to measure throughput and utilization losses in divergent kernels
- TPU Programming Model Visualization of the CISC-like instruction set used to program AI accelerators
- Vector Unit Architecture On-chip vector processing for activation and pooling operations
- PCIe Interface Details How accelerators communicate with the host CPU at high bandwidth
📊
Classical ML & Classification
Traditional machine learning algorithms and metrics
- Classification Metrics Explorer Explore accuracy, precision, recall, and F1-score with interactive threshold adjustment
- Decision Tree Visualizer Visualize decision tree splitting and classification
- K-Nearest Neighbors (KNN) Interactive KNN algorithm visualization
- Logistic Regression Understand logistic regression and decision boundaries
- SVM Classification Support Vector Machine decision boundaries and margins
- Neural Network Classification Visualize neural network learning and decision boundaries
- K-Means Clustering Interactive K-means clustering algorithm
- Hierarchical Clustering Dendrograms and hierarchical clustering methods
🧩
CNN Architectures
Convolutional neural networks and deep learning
- Convolution Basics Understand convolution operations fundamentally
- Convolution Layer Tensor Computation Tensor computations in convolutional layers
- Convolution Output Formula Calculate output dimensions from convolutions
- 3×3 Convolution Engine Stepper Step through each MAC operation with synced input, filter, and psum updates
- 3×3 2D Convolution Engine (neuFlow) Visualize PE broadcast, delay elements (W−K), and output map generation
- Pooling Operations Max pooling and average pooling visualizations
- 1×1 Convolution Bottleneck Dimensionality reduction with 1×1 convolutions
- Bottleneck Block Variant ResNet bottleneck block architecture
- ResNet Architecture Residual networks and skip connections
- Two-Stage Filtering Multi-stage convolution filtering
📚
ML Fundamentals
Core concepts and building blocks
⚡
GEMM Optimization
General matrix multiply optimization techniques
- Strassen's Matrix Multiplication Algorithm for O(n^2.81) matrix multiplication with recursive decomposition
- GEMM Nested Loops Baseline Basic matrix multiplication implementation
- GEMM Tiled Implementation Cache-optimized tiled matrix multiplication
- GEMM Tiling Visualizer Interactive tiling strategy visualization
- im2col Transformation Converting convolutions to GEMM for high-performance execution
- Cache Cliff and Tiling Understanding cache cliffs and how tiling helps
- Non-Tiled Cache Miss Analysis Mathematical derivation of cache misses
- Tiled Cache Miss Formula Derivation for tiled implementations
🔥
Kernel Optimization
Data reuse and computational efficiency
💾
Cache & Memory Optimization
Memory hierarchy and cache performance
🔄
Dataflow Patterns
Hardware execution patterns for neural networks
- Input Stationary Dataflow Input-stationary accelerator dataflow pattern
- Output Stationary Dataflow Output-stationary dataflow and accumulation behavior
- Weight Stationary Dataflow Weight-stationary architecture patterns
- Weight-Stationary Accelerator (8 PE Parallel) Step-by-step visualization of 8 processing elements working in parallel
- Simplified NVDLA Convolution Dataflow Atomic-C parallel convolution walkthrough with receptive-field and output-cell tracking
📈
Performance Modeling
Analyzing and predicting system performance