Scalana: A (Not-So) Deep Dive into its Codebase

A (not-so) deep dive into the Scalana tool, its workflow, core components, and data formats, based on the SC20 paper and its accompanying source code.

Introduction

SCALANA is a performance analysis tool designed to automatically detect scalability bottlenecks in large-scale parallel programs. Traditional performance tools often force a choice between low-overhead profiling, which lacks the detail for root-cause analysis, and high-detail tracing, which can incur prohibitive overhead.

SCALANA’s core innovation is a hybrid approach that combines static analysis at compile-time with lightweight, sampling-based profiling at runtime. This allows it to construct a detailed Program Performance Graph (PPG) that captures both the program’s structure and its performance characteristics, enabling deep analysis at a fraction of the cost of full tracing. This post breaks down the tool’s workflow, key concepts, and data formats to provide a comprehensive understanding of how it works under the hood.

Dependencies

To compile and run the SCALANA artifact, the following dependencies are required, as described in the paper.

1. C++ Toolchain

2. LLVM

3. Parallel and Performance Libraries

4. Analysis and Visualization

I. Workflow

The tool’s end-to-end process can be understood as a three-act play, moving from static code analysis to dynamic data collection and finally to post-mortem fusion and analysis.

Act I: Static Analysis (Compile-Time)

This phase corresponds to Section III-A of the paper and involves analyzing the code’s structure and instrumenting it.

Act II: Data Collection (Runtime)

This phase corresponds to Section III-B of the paper, where the instrumented program is run to collect performance data.

Act III: Graph Fusion and Analysis (Post-Mortem)

This phase corresponds to Sections III-C and IV of the paper and brings all the data together for the final analysis.

Step 3a: Symbol Resolution

Step 3b: PPG Construction

Step 3c: Scalability Analysis

II. Key Concepts Explained

The LLVM Pass: Scalana Static Engine

The “magic” behind Scalana’s static analysis is the LLVM Pass implemented in IRStruct.cpp. After a compiler front-end (like Clang) translates C++ or Fortran into the language-agnostic LLVM Intermediate Representation (IR), this pass operates directly on the IR. It performs two critical tasks:

  1. Analysis and Graphing: It traverses the program’s IR, identifying key structures like functions, loops, and calls. It uses this to build the complete Program Structure Graph (PSG) in memory before serializing it to out.txt.
  2. Transformation and Instrumentation: As it builds the graph, it simultaneously modifies the IR, precisely inserting calls to its runtime library functions (e.g., entryPoint) at the entry and exit points of these structures.

This powerful mechanism allows Scalana to access the program’s structure directly from the compiler’s perspective and reliably inject the necessary hooks for runtime tracing.

The PSG-to-PPG Pipeline: Map and Annotate

The entire workflow masterfully decouples static and dynamic analysis. The process can be understood with an analogy:

  1. Assigning “ID Cards” (PSG Creation): During static analysis, the LLVM pass acts like a census taker, walking through the “city” of the source code. It assigns a unique ID to every “building” (function, loop, etc.) and records this information on a master map (out.txt).
  2. Locating “Coordinates” (Sampling): At runtime, PAPI acts like a GPS tracker, periodically reporting the raw coordinates (instruction addresses) where the program is spending its time.
  3. Looking up the “Map” (PPG Fusion): The analyze program takes the GPS coordinates (SAMPLE.txt-symb), looks up the corresponding building on the master map (out.txt), and makes a mark on it.
  4. Creating the “Heatmap” (Final PPG): After all GPS reports are processed, the map is now annotated with marks indicating activity levels. This annotated map, showing which buildings were busiest, is the final Program Performance Graph (PPG) (stat.txt).

The PMPI Discrepancy

A deep analysis reveals a significant difference between the paper’s design and the provided codebase regarding communication analysis.

III. A (Not-So) Deep Dive into File Formats

Understanding the intermediate files is key to understanding the workflow.

1. PSG

2. Execution Trace

3. Symbolicated Sampling Data

4. PPG

IV. Summary

The table below summarizes the entire pipeline, connecting each phase to its core components and data artifacts.

| Paper Phase | Workflow Step | Key File/Code | Input (Source) | Output (Destination) |
| --- | --- | --- | --- | --- |
| PSG Construction (III-A) | Static Analysis (Compile-time) | IRStruct.cpp (irs.so) | *.bc (Source), in.txt (Disk) | out.txt (Disk), i{...} (Disk) |
| Runtime Data (III-B) | Dynamic Analysis (Runtime) | sampler.cpp (libsampler.so) | i{...} (from Phase 1) | LOG.txt & SAMPLE.txt (Disk) |
| PPG Construction (III-C) | Post-mortem Processing | parse.sh, log2stat.cpp | SAMPLE.txt, LOG.txt (from Phase 2), out.txt (from Phase 1) | stat.txt (Disk) |
| Scalability Analysis (IV) | Bottleneck Detection & Backtracing | Python Scripts, Java Viewer | stat.txt (from Phase 3) | Visualizations, Reports (Screen/Disk) |