Changes CI test sharding from per-project to per-task granularity. by AlexeyKuznetsov-DD · Pull Request #11847 · DataDog/dd-trace-java

AlexeyKuznetsov-DD · 2026-07-02T17:10:59Z

What Does This Do

Changes CI test sharding from per-project to per-task granularity.

Previously, the -Pslot=X/Y filter was evaluated once per project via Project.isInSelectedSlot: every Test task
in a module (e.g. jdbc's test / forkedTest / oldH2Test / oldPostgresTest) was pinned to the same slot and
serialized inside one job. Now each test variant is hashed independently on the key "<projectPath>:<taskName>", so
a module's variants can land in different CI slots.
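Here is a minimal sketch of how the hash key changes. It assumes the task-level gate reuses the `abs(hashCode % totalSlots) + 1` formula that the Root cause section quotes for the old project-level gate. `slotOf`, `projectSlot` and `taskSlot` are hypothetical names, not the functions in CIJobsExtensions.kt. Gradle's `Project`/`Task` types are replaced by plain strings so the logic runs on its own.

```kotlin
import kotlin.math.abs

// Hypothetical helper: maps a key to a 1-based slot in 1..totalSlots,
// mirroring abs(key.hashCode() % totalSlots) + 1 from the old onlyIf gate.
fun slotOf(key: String, totalSlots: Int): Int {
    require(totalSlots > 0) { "totalSlots must be positive" }
    return abs(key.hashCode() % totalSlots) + 1
}

// Old behaviour: every Test task in a project shares the project's slot.
fun projectSlot(projectPath: String, totalSlots: Int): Int =
    slotOf(projectPath, totalSlots)

// New behaviour (assumed formula): each variant is hashed on "<projectPath>:<taskName>".
fun taskSlot(projectPath: String, taskName: String, totalSlots: Int): Int =
    slotOf("$projectPath:$taskName", totalSlots)
```

With per-project hashing, the four jdbc variants always share one slot. With per-task hashing, each variant gets its own key, so the variants can (but need not) end up in different slots.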

Key changes in CIJobsExtensions.kt:

- Task.isInSelectedSlot shards at task granularity; Project.isInSelectedSlot is kept for whole-project aggregates
  such as runMuzzle.
- createRootTask now depends directly on the in-slot Test tasks the umbrella task would run (via testTaskFilter),
  instead of gating with onlyIf against a project-level slot. Out-of-slot modules aren't pulled into the job at all.
- Slot parsing is centralized and cached once per build on the root project (SlotSelection / SlotHolder).
- Modules that collect coverage (-PcheckCoverage, or a forceCoverage aggregate, either of which sets coverageEnabled)
  deliberately *fall back to whole-project slotting*, and the check/JaCoCo aggregate stays project-level
  (testTaskFilter = null). This keeps per-module JaCoCo execution data complete.
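The slot parsing and the coverage fallback can be sketched together. This sketch makes three assumptions. First, -Pslot takes the form X/Y with 1 ≤ X ≤ Y. Second, an absent property means "run everything". Third, coverage modules hash on the bare project path. `SlotSpec`, `CachedSlot`, `shardKey` and `inSlot` are hypothetical stand-ins, not the PR's SlotSelection / SlotHolder. `CachedSlot` uses `by lazy` in place of caching on the root project.

```kotlin
import kotlin.math.abs

// Hypothetical model of a parsed -Pslot=X/Y value (1-based slot X of Y).
data class SlotSpec(val selected: Int, val total: Int) {
    companion object {
        // Parses "X/Y"; returns null when the property is absent or blank.
        fun parse(raw: String?): SlotSpec? {
            if (raw.isNullOrBlank()) return null
            val parts = raw.split("/")
            require(parts.size == 2) { "Expected -Pslot=X/Y, got '$raw'" }
            val selected = parts[0].trim().toInt()
            val total = parts[1].trim().toInt()
            require(total > 0 && selected in 1..total) { "Slot $selected is outside 1..$total" }
            return SlotSpec(selected, total)
        }
    }
}

// Stand-in for per-build caching: the property is read and parsed at most once.
class CachedSlot(private val rawProperty: () -> String?) {
    val selection: SlotSpec? by lazy { SlotSpec.parse(rawProperty()) }
}

// Coverage modules keep a whole-project key so JaCoCo sees every variant in one job.
fun shardKey(projectPath: String, taskName: String, coverageEnabled: Boolean): String =
    if (coverageEnabled) projectPath else "$projectPath:$taskName"

// No slot selected means every key is in scope.
fun inSlot(key: String, slot: SlotSpec?): Boolean =
    slot == null || abs(key.hashCode() % slot.total) + 1 == slot.selected
```

Keeping the key choice in one function makes the fallback explicit. A coverage module produces the same key for all of its Test tasks, so they all land in the same slot, just as they did under per-project sharding.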

Motivation

The pipeline is gated by the slowest test_inst shard. Serializing all of a module's test variants in one job made that
shard longer than it needed to be. Sharding per task spreads that work more evenly across shards.

A/B comparison of two full pipelines on identical code (only the CI logic differs):

| Metric | Baseline (per-project) | Experiment (per-task) | Δ |
|---|---|---|---|
| Wall-clock | 3172 s (52.9 min) | 2931 s (48.9 min) | −241 s (−7.6 %) |
| Total compute (Σ task durations) | 68.45 h | 67.08 h | −4,929 s (−2.0 %) |
| Slowest job = `test_inst` shard (critical path) | 2529 s | 2161 s | −368 s (−14.6 %) |
| `test_inst_latest` slowest shard | 1752 s | 1532 s | −220 s (−12.6 %) |
| `test_smoke` slowest shard | 1392 s | 1300 s | −92 s (−6.6 %) |

The slowest shard drops by ~368 s, and the pipeline wall-clock drops by 241 s (~4 minutes, −7.6 %) while using less
total compute.

Additional Notes

⚠️ The old partitioning was silently skipping tests — this fixes it

While A/B testing, the pipeline's own aggregate_test_counts job revealed that the old per-project logic was not running
every test:

| Metric | Baseline | Experiment | Δ |
|---|---|---|---|
| Total tests executed | 300,614 | 355,531 | +54,917 (+18.3 %) |

Almost all of the gap (54,816 of the 54,917 extra tests) is in core modules that have multiple test tasks per module:

| Job kind | Baseline | Experiment | Δ |
|---|---|---|---|
| `test_base` | 39,655 | 91,449 | +51,794 (+131 %) |
| `test_profiling` | 0 | 1,234 | +1,234 |
| `test_debugger` | 4,625 | 6,413 | +1,788 (+39 %) |
| `test_inst` | 168,338 | 168,437 | +99 (~0 %) |
| `test_inst_latest` | 77,009 | 77,011 | +2 (~0 %) |
| `test_smoke` | 10,863 | 10,863 | 0 |

Root cause. The old onlyIf gated each task on abs(project.path.hashCode() % totalSlots) + 1 == selectedSlot, but
totalSlots (the hash divisor) did not match the number of splits a job kind actually launched. This was confirmed in a
baseline test_profiling job: it ran with CI_NODE_INDEX=3 and CI_NODE_TOTAL=13, so the hash produced buckets 1..13 while
the job only ran selected=3. No profiling module hashed to bucket 3, so 0 profiling tests ran. The baseline pipeline
even emitted its own warning: ⚠️ WARNING: 6 job(s) with zero tests. The same misalignment skipped ~57 % of test_base
(6,610 -> 15,268 tests per JVM, same modules).
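The failure mode can be reproduced without Gradle. In this sketch, modules are bucketed with one divisor, but only some slot numbers are actually launched. Any module hashed to a bucket with no job is dropped without an error. `bucketOf` and `neverRun` are hypothetical names, and project paths are plain strings.

```kotlin
import kotlin.math.abs

// Same bucketing as the old gate: abs(path.hashCode() % divisor) + 1.
fun bucketOf(key: String, divisor: Int): Int = abs(key.hashCode() % divisor) + 1

// Keys whose bucket has no launched job; nothing ever runs them, and nothing fails.
fun neverRun(keys: List<String>, divisor: Int, launchedSlots: Set<Int>): List<String> =
    keys.filter { bucketOf(it, divisor) !in launchedSlots }
```

Suppose the divisor is 13 and only slot 3 is launched. Every module outside bucket 3 is then skipped. If no module hashes to 3 at all, the job runs zero tests, which matches the test_profiling symptom described above.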

Why it's now covered. Per-task sharding hashes on "<projectPath>:<taskName>", and, more importantly,
createRootTask makes each aggregate depend directly on the Test tasks selected for its own slot rather than
relying on an onlyIf compared against a globally numbered slot that didn't line up. Every test task now lands in
exactly one slot that actually runs.
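The "exactly one slot" property can be checked with a small model. It assumes that the launched jobs cover slots 1..Y and that Y is also the hash divisor, which is the alignment the old setup lacked. `TestTask`, `tasksForSlot` and `isExactPartition` are hypothetical. `tasksForSlot` stands in for the filtered list of Test tasks an aggregate would depend on.

```kotlin
import kotlin.math.abs

// Hypothetical stand-in for a Gradle Test task, identified by project path and task name.
data class TestTask(val projectPath: String, val taskName: String) {
    val key: String get() = "$projectPath:$taskName"
}

fun slotOf(task: TestTask, totalSlots: Int): Int = abs(task.key.hashCode() % totalSlots) + 1

// Models the dependency list an aggregate for `slot` would get: only its own in-slot tasks.
fun tasksForSlot(all: List<TestTask>, slot: Int, totalSlots: Int): List<TestTask> =
    all.filter { slotOf(it, totalSlots) == slot }

// True when every (distinct) task is selected by exactly one of the slots 1..totalSlots.
fun isExactPartition(all: List<TestTask>, totalSlots: Int): Boolean {
    val perSlot = (1..totalSlots).map { tasksForSlot(all, it, totalSlots) }
    return all.all { task -> perSlot.count { task in it } == 1 }
}
```

A task's slot is a pure function of its key, so the per-slot selections are disjoint and together cover every task. The guarantee only breaks if the set of launched slots differs from 1..Y.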

Why this isn't double-counting. test_inst, test_inst_latest, and test_smoke are unchanged to within 0.06 %.
Those modules have roughly one test task each, so they were never affected by the bug; if the new code ran anything
twice, they would inflate too. Only the multi-variant core modules gained tests: the signature of restored coverage,
not duplicated work.

Implemented and tested with Claude.

Empty commit so this branch runs the current per-project test-slot sharding through CI as a baseline, before the per-task sharding change is committed on top for comparison. No files changed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Rework -Pslot=X/Y sharding so a module's test variants (e.g. jdbc's test/forkedTest/oldH2Test/oldPostgresTest) hash to independent slots instead of serializing in a single job. Parse the slot selection once and cache it on the root project; keep Project.isInSelectedSlot (used by runMuzzle) at project granularity and add a task-level gate for Test tasks. The *Check aggregate and all coverage builds stay whole-module, project-slotted so per-module JaCoCo sees complete execution data. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

AlexeyKuznetsov-DD and others added 2 commits July 2, 2026 11:24

AlexeyKuznetsov-DD requested a review from bric3 July 2, 2026 17:10

AlexeyKuznetsov-DD self-assigned this Jul 2, 2026

AlexeyKuznetsov-DD added the labels tag: no release notes, type: refactoring, comp: tooling and tag: ai generated Jul 2, 2026

AlexeyKuznetsov-DD changed the title ~~Alexeyk/ci per task test sharding 2~~ Changes CI test sharding from per-project to per-task granularity. Jul 2, 2026

AlexeyKuznetsov-DD wants to merge 2 commits into master from alexeyk/ci-per-task-test-sharding-2
