Calling Context Trees: Concepts, Challenges, and Tools
A structured guide to CCT research: foundational concepts, scalability solutions (encoding and approximation), and modern visualization and analysis tools.
Calling Context Trees (CCTs) capture “who called whom” and assign cost metrics (time, energy, memory, cache events) to precise execution contexts. This post organizes CCT research as a problem-solution hierarchy: starting with core concepts, addressing scalability challenges, and exploring analysis tools. Each section highlights representative papers with their key innovations and available artifacts.
Core Concepts
What is a CCT?
Ammons, Ball, Larus. PLDI 1997
The foundational CCT data structure definition and how to build it using hardware performance counters. Shows how to extend path profiling to flow-sensitive profiling by associating performance metrics with paths through procedures, creating the tree structure where each node represents a unique calling context.
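To make the structure concrete, here is a minimal CCT sketch in Python (illustrative names and events, not code from the paper): a tree is grown from call/return events, so the same function reached through different callers gets distinct nodes, each accumulating its own metric.

```python
class CCTNode:
    def __init__(self, name):
        self.name = name          # function (or call-site) label
        self.children = {}        # callee name -> CCTNode
        self.metric = 0           # cost attributed to this exact context

class CCT:
    def __init__(self):
        self.root = CCTNode("<root>")
        self.stack = [self.root]  # current calling context

    def on_call(self, callee):
        # Descend to (or create) the child for this callee: the same
        # function under a different caller chain is a different node.
        node = self.stack[-1].children.setdefault(callee, CCTNode(callee))
        self.stack.append(node)

    def on_return(self, cost=0):
        self.stack[-1].metric += cost
        self.stack.pop()

# main -> a -> c and main -> b -> c produce two distinct nodes for c:
cct = CCT()
for kind, arg in [("call", "main"), ("call", "a"), ("call", "c"),
                  ("ret", 5), ("ret", 0), ("call", "b"), ("call", "c"),
                  ("ret", 7), ("ret", 0), ("ret", 0)]:
    cct.on_call(arg) if kind == "call" else cct.on_return(arg)

c_under_a = cct.root.children["main"].children["a"].children["c"].metric
c_under_b = cct.root.children["main"].children["b"].children["c"].metric
```

This is exactly the separation a flat profile loses: `c` under `a` (cost 5) and `c` under `b` (cost 7) remain distinguishable.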
How to Build CCTs
Zhuang et al. PLDI 2006
How to build CCTs incrementally at runtime with adaptive overhead control. Introduces the “adaptive bursting” technique that samples stack frames only when needed, demonstrating how to construct accurate CCTs while keeping runtime overhead bounded (typically <3%).
Zhao, Liu, Chabbi. SC 2020 (DrCCTProf)
How to build full CCTs for every instruction in binaries (not just function calls) using dynamic binary instrumentation. Shows CCT construction for ARM/x86 architectures with DynamoRIO, enabling fine-grained attribution of hardware events to calling contexts.
Artifact: GitHub

Zhou et al. ICS 2022
How to extend CCT construction to GPU-accelerated applications with low overhead. Demonstrates context-sensitive profiling for GPU kernels and CUDA operations using call-path memoization (CPU side) and adaptive epoch profiling with postmortem reconstruction (GPU side), attributing performance metrics to both CPU calling contexts and GPU execution contexts in heterogeneous applications.
Scalability Challenges and Solutions
Space Challenge: Encoding
Sumner et al. ICSE 2010 / TSE 2012 (PCCE)
How to encode calling contexts as compact integer IDs instead of storing full CCT nodes. Each context gets a unique integer through simple additions during execution; contexts can be reconstructed in constant time. Solves the problem of expensive stack walking by encoding contexts on-the-fly with minimal overhead.
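The core trick can be sketched in a few lines. The example below is a deliberately simplified version of the idea (acyclic call graph, one call site per caller-callee edge; the real PCCE scheme also handles recursion and multiple call sites): each call edge is assigned a static offset so that summing offsets along a path yields a unique, dense context ID.

```python
# Static call graph: caller -> callees it may call (illustrative).
call_graph = {"main": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
order = ["main", "a", "b", "c"]     # topological order of the acyclic graph

num_contexts = {"main": 1}          # the root function has one context
offset = {}                         # (caller, callee) -> static increment
for f in order:
    if f == "main":
        continue
    total = 0
    for p in order:                 # deterministic incoming-edge order
        if f in call_graph[p]:
            offset[(p, f)] = total  # later edges start after earlier ones
            total += num_contexts[p]
    num_contexts[f] = total         # dense ID space: [0, num_contexts[f])

# At runtime a single addition per call maintains the current ID:
def context_id(path):
    cid = 0
    for caller, callee in zip(path, path[1:]):
        cid += offset[(caller, callee)]
    return cid

ids = {context_id(["main", "a", "c"]), context_id(["main", "b", "c"])}
```

The two contexts of `c` receive the distinct IDs 0 and 1, filling the range `[0, num_contexts["c"])` with no gaps, which is what makes constant-time reconstruction possible.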
Zeng et al. CGO 2014 (DeltaPath)
How to encode CCT contexts in object-oriented languages where dynamic class loading and polymorphism complicate context tracking. Introduces the “delta” concept: encoding differences between contexts rather than full paths, making it scalable for large Java applications with complex inheritance hierarchies.
Li et al. CGO 2014 (DACCE)
How to handle CCT encoding in multithreaded programs where contexts are created and destroyed dynamically across threads. Shows adaptive encoding/decoding that adjusts to changing program behavior, handling indirect calls and concurrent execution without requiring global synchronization.
Zhou, Jantz, Kulkarni, Doshi, Sarkar. CC 2019 (Valence)
How to compress CCT encodings using variable-length schemes instead of fixed-size integers. Leverages static analysis (LLVM) to assign shorter codes to frequent calling contexts, reducing encoding size by ~60% compared to PCCE while maintaining lossless reconstruction.
Artifact: GitLab
Kim et al. CC 2025
How to achieve deterministic calling context encoding using running call path counts from static call graphs. Avoids the collision detection overhead of PCCE and PCC while maintaining precise context distinction, achieving up to 2.1× speedup on parallel benchmarks by eliminating expensive synchronization for collision checking.
Overhead Challenge: Approximation
Arnold & Sweeney. IBM TR 2000
How to approximate full CCTs through periodic stack sampling without instrumenting every function call. Demonstrates that you can reconstruct an approximate CCT from sampled stack snapshots, achieving <1% overhead while capturing the “shape” of the CCT with sufficient accuracy for optimization in production JVMs.
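The reconstruction step is simple to sketch (a generic sampling-profiler illustration, not the IBM report's exact algorithm): each periodic sample records the full call stack, and aggregating samples by their context path yields an approximate CCT whose relative weights approximate time shares.

```python
from collections import defaultdict

def build_approx_cct(samples):
    # context path (root-first tuple) -> number of samples observed there
    counts = defaultdict(int)
    for stack in samples:
        counts[tuple(stack)] += 1
    return counts

samples = [
    ["main", "a", "c"],
    ["main", "a", "c"],
    ["main", "b"],
    ["main", "a", "c"],
]
cct = build_approx_cct(samples)
# ("main", "a", "c") holds 3 of 4 samples: with enough samples, this
# ratio converges on that context's share of execution time.
```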
Mytkowicz et al. OOPSLA 2009 (Inferred Call Path)
How to infer calling contexts from minimal runtime data (just stack depth + program counter) without building the full CCT during execution. Shows that call paths can be reconstructed offline by combining static program structure with dynamic stack height, avoiding expensive stack walking entirely.
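The offline step can be sketched as a search over the static call graph (a simplified illustration of the idea, using function names where the paper uses program counters): given an observed stack depth and the sampled function, enumerate the static paths of that length ending there; when exactly one matches, the full context is recovered with no stack walk.

```python
# Static call graph (illustrative).
call_graph = {"main": ["a", "b"], "a": ["c"], "b": ["d"], "c": [], "d": []}

def paths_to(target, depth, node="main", path=None):
    """All root-to-target call-graph paths of exactly `depth` frames."""
    path = (path or []) + [node]
    if len(path) == depth:
        return [path] if node == target else []
    return [p for callee in call_graph[node]
            for p in paths_to(target, depth, callee, path)]

# A sample (depth=3, function="c") is unambiguous in this graph:
candidates = paths_to("c", 3)
```

When several static paths share the same (depth, function) pair, the inference is ambiguous; the paper's contribution is showing how rarely that happens in practice.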
D’Elia et al. PLDI 2011 (Hot CCT)
How to build a sparse CCT containing only “hot” (frequently executed) calling contexts using streaming algorithms. Applies frequent-item mining to maintain just the top-K contexts in bounded memory (typically <1MB), discarding cold paths while preserving the most important parts of the CCT for performance analysis.
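The bounded-memory mining step can be illustrated with the classic Space-Saving algorithm (a standard frequent-item sketch in the same family as the paper's streaming techniques; the counter structure below is a simplification): at most K contexts are tracked, and cold entries are evicted while hot ones survive with slight overestimates.

```python
def space_saving(stream, k):
    counters = {}                          # context -> estimated count
    for ctx in stream:
        if ctx in counters:
            counters[ctx] += 1
        elif len(counters) < k:
            counters[ctx] = 1
        else:
            # Evict the minimum-count entry and inherit its count,
            # so estimates can overshoot but never undershoot.
            victim = min(counters, key=counters.get)
            counters[ctx] = counters.pop(victim) + 1
    return counters

# Two hot contexts amid some cold noise; only k=2 counters are ever kept.
stream = ["A"] * 6 + ["B"] * 4 + ["C", "D", "A", "B"]
hot = space_saving(stream, k=2)
```

Despite the cold contexts `C` and `D` briefly displacing an entry, the two genuinely hot contexts end up back in the table, which is the guarantee that makes a sparse hot CCT trustworthy.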
Ausiello et al. OOPSLA 2012 (k-Calling Context Forest)
How to control CCT granularity by limiting context depth to k levels through the k-Calling Context Forest (k-CCF) data structure. While full CCTs associate metrics to complete paths from the root function, they provide no explicit information for detecting short hot sequences of activations and can grow prohibitively large. The k-CCF generalizes both edge profiling (k=1, caller-callee pairs) and full context-sensitive profiling (k=∞, complete paths) by associating performance metrics to paths of length at most k. Experiments on prominent Linux applications show that the k-CCF offers effective space-accuracy tradeoffs: k=2 or k=3 often reveal hot spots hidden in full CCTs while using significantly less space, making these small values the most useful in practice.
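The depth limit is easy to sketch (an illustration of the truncation idea, not the paper's forest construction): attribute each sample's cost to only the last k frames of its stack, so k=1 degenerates to caller-callee edge profiling while large k approaches the full CCT.

```python
from collections import defaultdict

def k_context(stack, k):
    return tuple(stack[-k:])        # keep at most the last k frames

samples = [(["main", "a", "c"], 5), (["main", "b", "c"], 7)]

profile = defaultdict(int)
k = 1
for stack, cost in samples:
    profile[k_context(stack, k)] += cost
# With k=1, both samples collapse onto the single context ("c",): cost 12.
# With k=2 they stay separate: ("a", "c") -> 5 and ("b", "c") -> 7.
```

Raising k splits contexts apart and shrinking it merges them, which is exactly the space-accuracy dial the paper tunes.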
Analysis and Tooling
Visualization
Moret et al. ICPE/WOSP 2010 (Calling Context Ring Charts)
How to visualize large CCTs (millions of nodes) using radial space-filling layouts. Ring charts represent the CCT hierarchy as concentric circles: the root at center, children in surrounding rings. Nodes are sized by metrics (e.g., CPU time), making hot calling contexts immediately visible through color and size, with interactive zoom for deep tree exploration.
Isaacs et al. EuroVis 2014 STAR
The fundamental challenges in visualizing performance data for HPC, including CCTs: extreme scale (billions of nodes), multi-dimensional metrics (time, energy, cache misses), and comparing profiles across runs or MPI ranks. Presents a state-of-the-art survey and taxonomy of performance visualization requirements, helping you understand what features matter when building or choosing CCT visualization tools.
Bhatele et al. SC 2019 (Hatchet)
How to programmatically manipulate CCTs: pruning subtrees, filtering by metrics, aggregating across multiple CCTs, and computing differences between CCT profiles. Provides a Python API for CCT operations (query, filter, merge, diff) that works with multiple profiler formats (HPCToolkit, Caliper, TAU, timemory, pyinstrument), enabling automated CCT analysis workflows.
Artifact: GitHub
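The diff-and-filter workflow Hatchet enables can be illustrated on plain dictionaries (context path -> metric). This is a hypothetical stand-in for the operations, not the real Hatchet `GraphFrame` API:

```python
def diff(before, after):
    """Per-context metric delta between two CCT profiles."""
    keys = set(before) | set(after)
    return {k: after.get(k, 0) - before.get(k, 0) for k in keys}

def filter_hot(profile, threshold):
    """Prune contexts whose change is below the threshold."""
    return {k: v for k, v in profile.items() if abs(v) >= threshold}

before = {("main", "a"): 10, ("main", "b"): 5}
after  = {("main", "a"): 10, ("main", "b"): 9, ("main", "c"): 2}

delta = filter_hot(diff(before, after), threshold=2)
# ("main", "b") regressed by 4 and ("main", "c") is new with cost 2;
# the unchanged ("main", "a") is filtered out of the report.
```

In Hatchet, the same pipeline operates on full tree-structured profiles loaded from profiler output, but the shape of the analysis, diff two runs and keep only the significant contexts, is the same.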
Cankur & Bhatele. ISC 2022
How to evaluate profiling tools based on their call graph generation capabilities. Performs a systematic comparison of Caliper, HPCToolkit, TAU, and Score-P across runtime overheads, memory consumption, and output data quality using proxy applications (AMG, LULESH, Quicksilver) on parallel clusters. Demonstrates which tools produce the lowest overheads and most meaningful call graph data under different conditions, providing practical guidance for selecting profiling tools in HPC environments.
Applications
Jindal et al. OSDI 2018 (DiffProf)
How to use CCTs for differential analysis by comparing calling context trees across similar applications. Shows CCT “slicing” techniques: extracting and matching corresponding subtrees from different apps’ CCTs to identify where energy consumption differs. Demonstrates applying CCT comparison to find optimization opportunities by learning from more efficient implementations in similar codebases.
Vukasovic & Prokopec. TOPLAS 2023
How to use partially context-sensitive profiles (limited-depth CCTs) for JIT compilation optimization. Shows that partial calling contexts (k=2 or k=3 levels) provide sufficient information for identifying hot code and guiding compilation decisions while maintaining low profiling overhead for production JVMs.
Milikic et al. CGO 2025 (GraalNN)
How to use Graph Neural Networks to predict context-sensitive profiles from control-flow graphs. Demonstrates ML-based approaches to static CCT inference: learning from CFG structure to predict calling context profiles without runtime instrumentation, achieving 10.13% speedup over traditional static profiling methods.