PC Sampling in CPU Systems: A Comprehensive Survey

Tracing the evolution of PC sampling from early profiling techniques in the 1980s to modern continuous profiling systems: hardware innovations, compiler optimizations, and large-scale deployments.

Program Counter (PC) sampling has evolved from a niche debugging aid in the 1980s to an indispensable technique for performance profiling in modern computing systems. This survey traces the development of PC sampling research across four decades, highlighting key innovations in hardware support, low-overhead profiling, and large-scale deployments that have made always-on profiling a reality in production systems.
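To make the technique concrete before tracing its history, the sketch below shows the essence of timer-based PC sampling: a periodic signal interrupts the running program, and the handler records the interrupted program counter. This is a minimal, assumption-laden illustration (Linux on x86-64, `SIGPROF`/`setitimer`, `REG_RIP` read from the signal context), not any particular profiler's implementation.

```c
/* pc_sample.c - minimal illustration of timer-based PC sampling.
 * Assumes Linux on x86-64. Compile: gcc -o pc_sample pc_sample.c
 */
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>
#include <ucontext.h>

#define MAX_SAMPLES 4096

static uintptr_t samples[MAX_SAMPLES];   /* buffer of sampled PCs */
static volatile size_t nsamples = 0;

/* SIGPROF fires on each profiling-timer tick; record the interrupted PC. */
static void on_sigprof(int sig, siginfo_t *si, void *uc_void) {
    (void)sig; (void)si;
    ucontext_t *uc = (ucontext_t *)uc_void;
    if (nsamples < MAX_SAMPLES)
        samples[nsamples++] = (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
}

/* A compute loop to give the sampler something to observe. */
static volatile double sink;
static void busy_work(void) {
    for (long i = 0; i < 200000000L; i++)
        sink += (double)i * 1e-9;
}

int main(void) {
    struct sigaction sa = {0};
    sa.sa_sigaction = on_sigprof;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigaction(SIGPROF, &sa, NULL);

    /* 1 kHz profiling timer, counting this process's CPU time. */
    struct itimerval tv = { .it_interval = {0, 1000},
                            .it_value    = {0, 1000} };
    setitimer(ITIMER_PROF, &tv, NULL);

    busy_work();

    if (nsamples > 0)
        printf("collected %zu PC samples; first: %p\n",
               nsamples, (void *)samples[0]);
    return 0;
}
```

Real profilers layer symbolization, per-thread buffers, and call-stack unwinding on top of this basic loop; much of the history below is about driving down the cost and skew of exactly this sampling step.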

Early Profiling Techniques (1980s–Early 1990s)

Gprof (1982)

Quartz (1990)

Dynamic Instrumentation vs. Sampling (Early 1990s)

Emergence of Low-Overhead Sampling (Mid-1990s)

Hardware Counter Profiling (1996)

Digital Continuous Profiling Infrastructure (DCPI, 1997)

Morph (OS Support for Profiling, 1997)

Hardware Instruction Sampling Innovations (Late 1990s)

ProfileMe (1997)

Sampling-Based Profiling in HPC (2000s)

HPCToolkit and Call-Path Profiling (2005)

Memory Profiling Using Hardware Counters (2003)

Scaling to Petascale (2009)

Profiling in Managed Runtimes and Compilers (2000s–2010s)

Java and Managed Languages

Compiler Optimizations with Sampled Profiles

Continuous Profiling at Scale (2010s)

Google-Wide Profiling (2010)

Recent Developments

Research continues to refine CPU sampling. Recent work explores memory-access sampling (to profile data locality) and context-sensitive sampling (e.g., differentiating samples by execution context or input). Analyses of the precision and overhead of Intel’s PEBS and AMD’s IBS for memory profiling (Yi et al. 2020; Sasongko et al. 2022) quantify these trade-offs, and Memtis (SOSP 2023) demonstrates PEBS-based memory-access sampling at scale.
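As a rough illustration of how this hardware-assisted sampling is reached from software, the sketch below requests precise (PEBS-assisted) instruction-pointer samples through Linux's `perf_event_open`. The event choice, sampling period, and omission of the mmap ring-buffer draining are simplifying assumptions for illustration, not the setup used in the cited papers.

```c
/* pebs_sample.c - sketch: requesting precise (PEBS-assisted) PC samples
 * via Linux perf_event_open. Assumes x86-64 Linux with PEBS-capable
 * hardware; error handling is minimal. Compile: gcc -o pebs pebs_sample.c
 */
#include <linux/perf_event.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_CPU_CYCLES; /* sample on cycle overflow  */
    attr.sample_period = 100000;            /* one sample per 100k cycles */
    attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID;
    attr.precise_ip = 2;    /* ask for PEBS: zero-skid instruction pointer */
    attr.exclude_kernel = 1;
    attr.disabled = 1;

    /* Monitor the calling thread on any CPU; in a real tool the samples
     * would be read from a perf mmap ring buffer (omitted here). */
    int fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    if (fd < 0) { perror("perf_event_open"); return 1; }

    printf("opened precise sampling event, fd=%d\n", fd);
    close(fd);
    return 0;
}
```

The `precise_ip` field is the key detail: at 0 the reported PC may skid past the instruction that overflowed the counter, while higher values ask the kernel to engage precision mechanisms such as PEBS so samples attribute to the correct instruction.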

Meanwhile, the rise of GPUs has led to PC sampling being introduced in GPU software stacks (NVIDIA’s CUPTI and AMD’s ROCm now offer PC sampling for kernels). This mirrors the CPU history: initial GPU profilers relied on instrumentation or simulation, and only more recently has statistical sampling been deployed to profile GPU instruction stalls and bottlenecks at low overhead.

In summary, decades of research – from gprof to the present – have established PC sampling as a cornerstone of performance profiling, tracing a line through top conferences in systems, architecture, programming languages, HPC, and compilers. Each generation of work built on the last, continually reducing overhead and improving fidelity, to make profiling an everyday capability in modern computing systems.