Skip to content

fix(walltime): backfill code mappings so multi-process benchmarks get flamegraphs#432

Open
not-matthias wants to merge 2 commits into
mainfrom
cod-3059-multi-threaded-benchmarks-dont-have-flamegraphs
Open

fix(walltime): backfill code mappings so multi-process benchmarks get flamegraphs#432
not-matthias wants to merge 2 commits into
mainfrom
cod-3059-multi-threaded-benchmarks-dont-have-flamegraphs

Conversation

@not-matthias

@not-matthias not-matthias commented Jul 3, 2026

Copy link
Copy Markdown
Member

Problem

Multi-process / multi-threaded benchmarks that exec() after sampling starts don't get flamegraphs. perf only emits MMAP2/COMM records while its event is enabled, so any benchmark process that exec()'d while sampling was disabled (every one after the first in a run) is missing its executable code mappings in the perf data. Without those load addresses, samples can't be symbolized and no flamegraph is produced.

Fix

Snapshot each benchmark pid's executable file mappings from /proc/<pid>/maps the first time it announces itself over the FIFO (while the process is still alive), then backfill any mappings perf missed before symbolizing.

Changes

  • refactor(walltime): extract add_module_mapping from process_mmap2_record so the perf MMAP2 path and the new /proc/<pid>/maps fallback share the same load-bias + symbol + unwind-data logic.
  • feat(walltime): new proc_maps module (ProcMapping, read_proc_maps, parse_proc_maps) that snapshots live process maps; FifoBenchmarkData carries maps_by_pid; the perf profiler backfills missing mappings before symbolization.

Testing

  • cargo build / cargo clippy clean.
  • Unit tests for parse_proc_maps (executable-only filtering, (deleted) stripping, paths with spaces) pass.
  • ⚠️ Not validated end-to-end: the backfill path itself has no automated coverage and no real multi-process perf run has confirmed the flamegraph now appears. Behavioral validation still pending.

perf only emits MMAP2/COMM while its event is enabled, so benchmark
processes that exec()'d while sampling was disabled (every one after the
first in a run) are missing their code mappings in the perf data, leaving
their samples unsymbolized and without flamegraphs.

Snapshot each benchmark pid's executable file mappings from
/proc/<pid>/maps the first time it announces itself over the FIFO (while
still alive), then backfill any mappings perf missed before symbolizing.
@not-matthias not-matthias force-pushed the cod-3059-multi-threaded-benchmarks-dont-have-flamegraphs branch from efcf122 to fdf45ba Compare July 3, 2026 17:37
@not-matthias not-matthias marked this pull request as ready for review July 3, 2026 17:37
@codspeed-hq

codspeed-hq Bot commented Jul 3, 2026

Copy link
Copy Markdown

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 7 untouched benchmarks


Comparing cod-3059-multi-threaded-benchmarks-dont-have-flamegraphs (fdf45ba) with main (7ce2c98)

Open in CodSpeed

@greptile-apps

greptile-apps Bot commented Jul 3, 2026

Copy link
Copy Markdown

Greptile Summary

This PR fixes missing flamegraphs for multi-process benchmarks by backfilling perf's MMAP2 records with executable mappings read from /proc/<pid>/maps while each benchmark process is still alive. The approach is sound: snapshot on first FIFO announcement, prefer perf's authoritative data, and degrade gracefully on any read error.

  • New proc_maps module (read_proc_maps / parse_proc_maps) snapshots live process mappings, filters to executable file-backed entries, and stores them in FifoBenchmarkData.maps_by_pid.
  • add_module_mapping extraction refactors the existing MMAP2 path into a shared helper used by both the perf record loop and the new /proc/<pid>/maps backfill in BenchmarkData::save_to.
  • Backfill loop in perf/mod.rs skips any (path, pid) pair already covered by perf, ensuring proc_maps data only fills genuine gaps.

Confidence Score: 4/5

The change is safe to merge; all new paths degrade gracefully on error and the backfill never overwrites perf's authoritative data.

The core logic is correct and well-tested. Two minor concerns: (deleted) file paths are retained after stripping the suffix, meaning from_elf may open a newer version of the library at that path (producing wrong symbols) or silently fail if the file is gone; and std::fs::read_to_string is called synchronously on the async executor. Neither causes crashes or data loss, but both could produce subtly incorrect symbolication in edge cases.

src/executor/shared/proc_maps.rs — the (deleted) path handling and synchronous I/O are worth a second look before merging.

Important Files Changed

Filename Overview
src/executor/shared/proc_maps.rs New module that reads and parses /proc//maps; well-structured with good unit tests. Minor concerns around including (deleted) file paths and synchronous I/O in async context.
src/executor/shared/fifo.rs Adds maps_by_pid to FifoBenchmarkData and snapshots /proc//maps on first CurrentBenchmark command per pid. Logic is clean and the duplicate-pid guard is correct.
src/executor/wall_time/profiler/perf/parse_perf_file.rs Refactors add_module_mapping out of process_mmap2_record cleanly; extraction is correct and the pub(super) visibility is appropriate. No new logic bugs introduced.
src/executor/wall_time/profiler/perf/mod.rs Backfill loop correctly checks already_mapped per (path, pid) before calling add_module_mapping, preferring perf's authoritative MMAP2 data over proc_maps snapshots.
src/executor/shared/mod.rs Trivial addition of proc_maps module export.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant B as Benchmark Process
    participant F as RunnerFifo
    participant PM as proc_maps
    participant P as perf profiler
    participant PP as parse_perf_file

    B->>F: "FifoCommand::CurrentBenchmark { pid }"
    F->>PM: read_proc_maps(pid)
    PM-->>F: "Vec<ProcMapping> (from /proc/pid/maps)"
    F->>F: maps_by_pid.insert(pid, mappings)

    Note over B,P: Sampling runs (perf records MMAP2 for first process only)

    B->>F: process exits
    F-->>P: "FifoBenchmarkData { maps_by_pid, bench_pids }"

    P->>PP: parse_for_memmap2(perf_file)
    PP-->>P: loaded_modules_by_path (may miss later pids)

    loop for each (pid, mappings) in maps_by_pid
        P->>P: check already_mapped per (path, pid)
        alt not already mapped by perf
            P->>PP: add_module_mapping(path, start, end, offset, pid)
            PP->>PP: compute_load_bias + from_elf + unwind_data
        end
    end

    P->>P: save_artifacts(loaded_modules_by_path)
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant B as Benchmark Process
    participant F as RunnerFifo
    participant PM as proc_maps
    participant P as perf profiler
    participant PP as parse_perf_file

    B->>F: "FifoCommand::CurrentBenchmark { pid }"
    F->>PM: read_proc_maps(pid)
    PM-->>F: "Vec<ProcMapping> (from /proc/pid/maps)"
    F->>F: maps_by_pid.insert(pid, mappings)

    Note over B,P: Sampling runs (perf records MMAP2 for first process only)

    B->>F: process exits
    F-->>P: "FifoBenchmarkData { maps_by_pid, bench_pids }"

    P->>PP: parse_for_memmap2(perf_file)
    PP-->>P: loaded_modules_by_path (may miss later pids)

    loop for each (pid, mappings) in maps_by_pid
        P->>P: check already_mapped per (path, pid)
        alt not already mapped by perf
            P->>PP: add_module_mapping(path, start, end, offset, pid)
            PP->>PP: compute_load_bias + from_elf + unwind_data
        end
    end

    P->>P: save_artifacts(loaded_modules_by_path)
Loading

Reviews (1): Last reviewed commit: "feat(walltime): backfill benchmark code ..." | Re-trigger Greptile

if pathname.starts_with('[') {
continue;
}
let pathname = pathname.strip_suffix(" (deleted)").unwrap_or(pathname);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 (deleted) mappings included with stripped path

After stripping (deleted), the bare path is passed to add_module_mapping, which calls ModuleSymbols::from_elf(record_path) and compute_load_bias. If the path no longer exists the call fails silently; if a newer version of the library is present at that path, from_elf opens the wrong ELF and the load bias / symbols computed for the process mapping will be from a different binary — producing incorrect or misleading symbolication.

The /proc/<pid>/maps (deleted) annotation means the directory entry has been unlinked; the process's mapping still refers to the old inode (only accessible via /proc/<pid>/map_files/<range>), not the current file at that path. Skipping such mappings is safer, since compute_load_bias cannot recover the correct data anyway.

Comment on lines +31 to +37
match std::fs::read_to_string(&path) {
Ok(content) => parse_proc_maps(&content),
Err(e) => {
warn!("Failed to read {path} for benchmark pid {pid}: {e}");
Vec::new()
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Blocking I/O on the async executor

std::fs::read_to_string performs a synchronous, blocking kernel call. read_proc_maps is invoked directly in the async handle_fifo_messages task without tokio::task::spawn_blocking. For /proc/<pid>/maps this is almost always negligible, but on a loaded system the kernel can still take time to walk the VMA list, and blocking the executor thread hurts latency for other tasks sharing it. Consider using tokio::fs::read_to_string (which wraps spawn_blocking internally) or tokio::task::spawn_blocking(|| std::fs::read_to_string(...)) to keep the async contract clean.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant