Accel-Sim and the Evolution of GPU Performance Modeling

A survey of the Accel-Sim framework and the research built on it, from the original extensible simulation framework to modern core modeling, power analysis, and applications across GPU architecture research.

GPU performance simulation is critical for architecture research, but validating simulators against rapidly evolving hardware remains challenging. Accel-Sim, introduced at ISCA 2020, established an extensible framework with SASS trace-driven simulation and systematic validation against real hardware. This post surveys the Accel-Sim ecosystem: the core framework papers, enhancements to the simulator, and top-tier research that uses or compares against Accel-Sim.

Core Accel-Sim Framework

Foundation: Validated GPU Modeling

Accel-Sim: An Extensible Simulation Framework for Validated GPU Modeling (ISCA 2020). The foundational paper introducing Accel-Sim. Presents a SASS trace-driven frontend that captures the actual GPU assembly executed on hardware, coupled with a cycle-level performance model based on GPGPU-Sim 4.x. The key innovation is systematic validation: the paper reports how closely simulated results correlate with measurements on real NVIDIA GPUs across diverse workloads. Provides an extensible infrastructure for GPU architecture research with validated baseline models.

Power Modeling

AccelWattch: A Power Modeling Framework for Modern GPUs (MICRO 2021). Integrates cycle-level power modeling into Accel-Sim/GPGPU-Sim. AccelWattch models power consumption across GPU components (cores, memory subsystem, interconnect) with validation against real hardware measurements. Enables simultaneous performance and power analysis for GPU architecture studies, essential for evaluating energy efficiency of proposed optimizations.
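
As a rough illustration of what component-level power modeling involves, the sketch below uses a generic activity-based formulation: static power plus per-event dynamic energy divided by runtime. This is not AccelWattch's actual model; the function name, event names, and per-event energies are all hypothetical values chosen for the example.

```python
def estimate_power_watts(activity_counts, energy_per_event_nj,
                         static_power_w, cycles, clock_hz):
    """Generic activity-based power estimate (illustrative only).

    activity_counts: events per component, as a simulator might report them.
    energy_per_event_nj: assumed dynamic energy per event, in nanojoules.
    """
    if cycles <= 0 or clock_hz <= 0:
        raise ValueError("cycles and clock_hz must be positive")
    runtime_s = cycles / clock_hz
    # Sum dynamic energy over all components, converting nJ to J.
    dynamic_energy_j = sum(
        count * energy_per_event_nj[name] * 1e-9
        for name, count in activity_counts.items()
    )
    return static_power_w + dynamic_energy_j / runtime_s


# Hypothetical activity counts and energies (not measured values).
counts = {"alu_op": 1_000_000, "l1_access": 200_000, "dram_access": 10_000}
energies_nj = {"alu_op": 0.01, "l1_access": 0.1, "dram_access": 10.0}
power = estimate_power_watts(counts, energies_nj, static_power_w=30.0,
                             cycles=1_000_000, clock_hz=1e9)
print(f"estimated average power: {power:.2f} W")
```

The point of the sketch is only that performance counters from a cycle-level simulator can feed a power estimate in the same run, which is what makes joint performance and power studies practical.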

Graphics and Compute Co-execution

CRISP: Concurrent Rendering and Compute Simulation Platform for GPUs (IISWC 2024). Extends Accel-Sim to simulate graphics workloads (Vulkan API) and graphics+compute co-execution scenarios. Addresses the gap in GPU simulation tools that traditionally focus solely on GPGPU compute. Enables research on resource partitioning and interference between rendering and compute on modern unified GPU architectures.

Modern GPU Core Architecture

Dissecting and Modeling the Architecture of Modern GPU Cores (MICRO 2025). Reverse-engineers modern NVIDIA streaming multiprocessor (SM) cores from Turing through Blackwell architectures. Redesigns Accel-Sim’s core model based on the discovered microarchitectural details: control bit semantics, scheduler policies, register file organization, and cache microarchitecture. Reports substantial accuracy improvements, including a mean absolute percentage error (MAPE) of ~17.4% for Blackwell. This is a major update to Accel-Sim’s core model for modern GPUs, narrowing the gap of more than 15 years between academic simulator models and contemporary hardware.
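
MAPE, the accuracy metric quoted above, averages the relative error between simulated and measured values across workloads. The sketch below shows the standard definition applied to cycle counts; the function name and the sample numbers are made up for illustration and are not data from the paper.

```python
def mape(simulated, measured):
    """Mean absolute percentage error of simulated vs. measured values, in percent."""
    if len(simulated) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and the same length")
    # Relative error of each kernel, measured against the hardware value.
    errors = [abs(s - m) / m for s, m in zip(simulated, measured)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical cycle counts for three kernels.
hw_cycles = [1000, 2000, 4000]
sim_cycles = [1100, 1800, 4000]
print(f"MAPE = {mape(sim_cycles, hw_cycles):.2f}%")
```

Because each error is normalized by the measured value, short and long kernels contribute equally, so a low MAPE says the simulator is close on typical kernels rather than only on the longest-running ones.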

Framework Enhancements

Beyond the core papers, several works extend Accel-Sim’s capabilities:

Analyzing and Improving Hardware Modeling of Accel-Sim (arXiv 2024). Identifies and fixes modeling issues in Accel-Sim’s front-end, result bus, and memory pipeline. Provides detailed analysis of discrepancies between simulated and actual hardware behavior, with proposed fixes that improve accuracy.
Integrating Per-Stream Stat Tracking into Accel-Sim (arXiv 2023). Adds per-stream statistics tracking to Accel-Sim/GPGPU-Sim. Enables fine-grained analysis of multi-stream workloads, important for understanding concurrent kernel execution and stream-level interference.
Parallelizing a Modern GPU Simulator (arXiv 2025). Implements OpenMP-based parallelization for Accel-Sim with deterministic results. Achieves 5.8× average speedup with 16 threads, addressing simulation time bottlenecks that limit large-scale studies.
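
To put the 5.8× speedup on 16 threads in context, the short calculation below derives the parallel efficiency and, under Amdahl's law, the serial fraction that such a speedup would imply. Amdahl's law is a simplifying assumption here (it ignores synchronization and load imbalance), and the resulting serial fraction is not a figure reported by the paper.

```python
def parallel_efficiency(speedup, threads):
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / threads


def amdahl_serial_fraction(speedup, threads):
    """Serial fraction f implied by Amdahl's law.

    S = 1 / (f + (1 - f) / n)  =>  f = (n / S - 1) / (n - 1)
    """
    if threads < 2:
        raise ValueError("need at least two threads")
    return (threads / speedup - 1.0) / (threads - 1.0)


eff = parallel_efficiency(5.8, 16)
serial = amdahl_serial_fraction(5.8, 16)
print(f"efficiency: {eff:.1%}, implied serial fraction: {serial:.1%}")
```

Under these assumptions, roughly a third of the ideal scaling is realized, which is typical for cycle-level simulators where components must synchronize every simulated cycle.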

Research Using Accel-Sim

Methodology and Validation

Principal Kernel Analysis: A Tractable Methodology to Simulate Scaled GPU Workloads (MICRO 2021). Develops a methodology for simulating scaled GPU workloads and validates it using Accel-Sim models of the V100. Demonstrates techniques for extrapolating performance of larger-than-simulated configurations, addressing the challenge that full-system simulation of future GPUs is often impractical.
GPU Scale-Model Simulation (HPCA 2024). Uses Accel-Sim to generate per-benchmark miss-rate curves for cache subsystem studies. Discusses Accel-Sim’s simulation speed characteristics and validates scale-model simulation methodology against Accel-Sim’s detailed models.
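
A miss-rate curve, as mentioned in the scale-model entry above, plots cache miss rate against cache capacity for a given workload. The sketch below builds one for a toy address trace using a fully associative LRU cache. It is a minimal illustration, not how Accel-Sim or the paper produce their curves; the function names and the trace are hypothetical.

```python
from collections import OrderedDict


def lru_miss_rate(trace, capacity_lines):
    """Miss rate of a fully associative LRU cache holding `capacity_lines` lines."""
    if not trace:
        raise ValueError("trace must be non-empty")
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for line in trace:
        if line in cache:
            cache.move_to_end(line)  # hit: mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > capacity_lines:
                cache.popitem(last=False)  # evict least recently used line
    return misses / len(trace)


def miss_rate_curve(trace, capacities):
    """Map each cache capacity to the miss rate observed on the trace."""
    return {c: lru_miss_rate(trace, c) for c in capacities}


# Toy trace that cycles over four cache lines.
trace = [0, 1, 2, 3] * 3
curve = miss_rate_curve(trace, [2, 4, 8])
for capacity, rate in curve.items():
    print(f"{capacity} lines: miss rate {rate:.2f}")
```

The toy trace shows the characteristic cliff: with two lines every access misses, while any capacity that fits the four-line working set leaves only the compulsory misses.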

Architectural Proposals

Several ISCA and MICRO papers use Accel-Sim to evaluate novel GPU architectural features:

GCStack + GCScaler: Fast and Accurate GPU Performance Analyses Using Fine-Grained Stall Cycle Accounting and Interval Analysis (ISCA 2025). Implements the GCStack methodology on Accel-Sim for fine-grained performance analysis. Leverages Accel-Sim’s cycle-level modeling to attribute performance to specific microarchitectural bottlenecks.
Avant-Garde: Empowering GPUs with Scaled Numeric Formats (ISCA 2025). Models Avant-Garde’s FP8 and scaled numeric format support using Accel-Sim. Extends the AccelWattch power model to account for reduced-precision arithmetic units. Demonstrates how Accel-Sim’s extensibility enables evaluation of novel datapath modifications.
LATPC: Accelerating GPU Address Translation Using L1 Access-Time Prefetching of Cache Hits (MICRO 2025). Evaluates address translation optimization using Accel-Sim. Models TLB and page table walker modifications to demonstrate performance benefits of prefetching-based address translation.
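
The stall cycle accounting mentioned in the GCStack entry above can be pictured as tagging every simulated cycle with what the core was doing and then aggregating the tags into a breakdown. The sketch below shows only that aggregation step. It is not GCStack's attribution algorithm, and the event labels are hypothetical.

```python
from collections import Counter


def cycle_stack(per_cycle_events):
    """Fraction of cycles attributed to each event label."""
    if not per_cycle_events:
        raise ValueError("need at least one cycle")
    counts = Counter(per_cycle_events)
    total = sum(counts.values())
    return {reason: n / total for reason, n in counts.items()}


# Hypothetical per-cycle labels for one scheduler over eight cycles.
events = ["issue", "issue", "mem_stall", "mem_stall",
          "mem_stall", "sync_stall", "issue", "idle"]
stack = cycle_stack(events)
for reason, share in sorted(stack.items(), key=lambda kv: -kv[1]):
    print(f"{reason:10s} {share:.1%}")
```

The hard part in practice is deciding which label a cycle deserves when several stall causes overlap; the aggregation itself is simple once that attribution exists.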

Alternative Simulation Approaches

Several papers compare novel simulation methodologies against Accel-Sim as a baseline:

HyFiSS: A Hybrid Fidelity Stall-Aware Simulator for GPGPUs (MICRO 2024). Proposes a hybrid-fidelity simulation approach and compares accuracy, speed, and storage requirements against Accel-Sim. Demonstrates trade-offs between simulation fidelity and runtime, using Accel-Sim as the high-fidelity baseline.
Photon: A Fine-Grained Sampled Simulation Methodology for GPU Workloads (MICRO 2023). Develops a sampled simulation method and positions it relative to cycle-level simulators like Accel-Sim. Shows how sampling techniques can achieve orders of magnitude speedup while maintaining acceptable accuracy for certain studies.
Swift and Trustworthy Large-Scale GPU Simulation with Fine-Grained Error Modeling and Hierarchical Clustering (MICRO 2025). Presents a large-scale simulation methodology with error modeling. Compares against Accel-Sim among other simulators to demonstrate scalability and accuracy trade-offs for cluster-scale GPU system studies.
PyTorchSim: A Comprehensive, Fast, and Accurate NPU Simulator (MICRO 2025). NPU-focused simulator that references Accel-Sim as the prevalent GPU simulator in related work. Demonstrates how Accel-Sim’s design principles (trace-driven simulation, validation methodology) influence simulators for other accelerator architectures.