Changes CI test sharding from per-project to per-task granularity. #11847

Draft
AlexeyKuznetsov-DD wants to merge 2 commits into master from alexeyk/ci-per-task-test-sharding-2

@AlexeyKuznetsov-DD

What Does This Do

Changes CI test sharding from per-project to per-task granularity.

Previously, the -Pslot=X/Y filter was evaluated once per project via Project.isInSelectedSlot: every Test task
in a module (e.g. jdbc's test / forkedTest / oldH2Test / oldPostgresTest) was pinned to the same slot and
serialized inside one job. Now each test variant is hashed independently on the key "<projectPath>:<taskName>", so
a module's variants spread across different CI slots.
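
A minimal sketch of the two keying schemes, for illustration only. The helper names (slotFor, projectSlot, taskSlot) and the example project path are hypothetical, and the bucket formula is assumed to be the same abs(hashCode % totalSlots) + 1 expression the old onlyIf used; the real logic lives in CIJobsExtensions.kt and may differ.

```kotlin
import kotlin.math.abs

// Hypothetical helper: maps a sharding key to a 1-based slot in 1..totalSlots.
// Assumption: same formula as the old project-level gate quoted in this description.
fun slotFor(key: String, totalSlots: Int): Int {
    require(totalSlots > 0) { "totalSlots must be positive" }
    return abs(key.hashCode() % totalSlots) + 1
}

// Per-project keying: every Test task of a module gets the same slot.
fun projectSlot(projectPath: String, totalSlots: Int): Int =
    slotFor(projectPath, totalSlots)

// Per-task keying: the key is "<projectPath>:<taskName>", so variants hash independently.
fun taskSlot(projectPath: String, taskName: String, totalSlots: Int): Int =
    slotFor("$projectPath:$taskName", totalSlots)
```

For a module with four test variants, projectSlot returns a single slot for all of them, while taskSlot can place them in up to four different slots.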

Key changes in CIJobsExtensions.kt:

  • Task.isInSelectedSlot shards at task granularity; Project.isInSelectedSlot is kept for whole-project aggregates
    like runMuzzle.
  • createRootTask now depends directly on the in-slot Test tasks the umbrella would run (via testTaskFilter),
    instead of gating with onlyIf against a project-level slot. Out-of-slot modules aren't pulled into the job at all.
  • Slot parsing is centralized and cached once per build on the root project (SlotSelection / SlotHolder).
  • Modules that collect coverage (-PcheckCoverage or a forceCoverage aggregate -> coverageEnabled) deliberately
    *fall back to whole-project slotting*, and the check/JaCoCo aggregate stays project-level
    (testTaskFilter = null). This keeps per-module JaCoCo execution data complete.
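
The selection rules in the list above can be sketched without Gradle, as plain Kotlin. This is an assumption-laden model, not the code in this PR: TestTaskRef, shardKey, and inSlotTasks are hypothetical names, the data class stands in for a real Gradle Test task, and the bucket formula is assumed to match the old onlyIf expression.

```kotlin
import kotlin.math.abs

// Hypothetical stand-in for a Gradle Test task; not the real Gradle type.
data class TestTaskRef(
    val projectPath: String,
    val taskName: String,
    val coverageEnabled: Boolean = false,
)

// Assumed bucket formula: 1-based slot in 1..totalSlots.
fun slotOf(key: String, totalSlots: Int): Int = abs(key.hashCode() % totalSlots) + 1

// Coverage modules keep the project path as the key, so all of their
// variants stay in one job; everything else is keyed per task.
fun shardKey(task: TestTaskRef): String =
    if (task.coverageEnabled) task.projectPath else "${task.projectPath}:${task.taskName}"

// The tasks an aggregate would depend on for the selected slot.
// Out-of-slot tasks are simply absent instead of being skipped via onlyIf.
fun inSlotTasks(all: List<TestTaskRef>, selectedSlot: Int, totalSlots: Int): List<TestTaskRef> =
    all.filter { slotOf(shardKey(it), totalSlots) == selectedSlot }
```

Because each task has exactly one key and each key maps to exactly one slot, the per-slot lists partition the full task list, and a coverage-enabled module never straddles two slots.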

Motivation

The pipeline is gated by the slowest test_inst shard. Serializing all of a module's test variants in one job made that
shard taller than it needed to be. Sharding per task rebalances the work.

A/B comparison of two full pipelines on identical code (only the CI logic differs):

| Metric | Baseline (per-project) | Experiment (per-task) | Δ |
|---|---|---|---|
| Wall-clock | 3172 s (52.9 min) | 2931 s (48.9 min) | −241 s (−7.6 %) |
| Total compute (Σ task durations) | 68.45 h | 67.08 h | −4,929 s (−2.0 %) |
| Slowest job = test_inst shard (critical path) | 2529 s | 2161 s | −368 s (−14.6 %) |
| test_inst_latest slowest shard | 1752 s | 1532 s | −220 s (−12.6 %) |
| test_smoke slowest shard | 1392 s | 1300 s | −92 s |

The slowest shard drops by ~368 s and the pipeline wall-clock drops with it: 241 s, about 4 minutes faster (−7.6 %),
while using less total compute.

Additional Notes

⚠️ The old partitioning was silently skipping tests — this fixes it

While A/B testing, the pipeline's own aggregate_test_counts job revealed the old per-project logic was not running
every test:

| | Baseline | Experiment | Δ |
|---|---|---|---|
| Total tests executed | 300,614 | 355,531 | +54,917 (+18.3 %) |

The gap is entirely in core modules that have multiple test tasks per module:

| Job kind | Baseline | Experiment | Δ |
|---|---|---|---|
| test_base | 39,655 | 91,449 | +51,794 (+131 %) |
| test_profiling | 0 | 1,234 | +1,234 |
| test_debugger | 4,625 | 6,413 | +1,788 (+39 %) |
| test_inst | 168,338 | 168,437 | +99 (~0 %) |
| test_inst_latest | 77,009 | 77,011 | +2 (~0 %) |
| test_smoke | 10,863 | 10,863 | 0 |

Root cause. The old onlyIf gated each task on abs(project.path.hashCode() % totalSlots) + 1 == selectedSlot, but
totalSlots (the hash divisor) did not match the number of splits a job kind actually launched. Confirmed in a baseline
test_profiling job: it ran with CI_NODE_INDEX=3, CI_NODE_TOTAL=13, so the hash produced buckets 1..13 while the
job only ran selected=3. No profiling module hashed to bucket 3, so 0 profiling tests ran. The baseline pipeline
even emitted its own ⚠️ WARNING: 6 job(s) with zero tests. The same misalignment skipped ~57 % of test_base
(6,610 -> 15,268 tests per JVM, same modules).
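
The mismatch described in the root cause can be reproduced with a small model. This is only a sketch: bucketOf mirrors the old gate expression quoted above, while skippedModules and the launchedSlots parameter are hypothetical names introduced here to represent "the slots a job kind actually ran".

```kotlin
import kotlin.math.abs

// Mirrors the old gate: abs(project.path.hashCode() % totalSlots) + 1.
fun bucketOf(projectPath: String, totalSlots: Int): Int =
    abs(projectPath.hashCode() % totalSlots) + 1

// Modules whose bucket is not among the launched slots are never run by any job.
fun skippedModules(
    paths: List<String>,
    totalSlots: Int,
    launchedSlots: Set<Int>,
): List<String> = paths.filter { bucketOf(it, totalSlots) !in launchedSlots }
```

With totalSlots = 13 and launchedSlots = setOf(3), every module that does not hash to bucket 3 ends up in the skipped list; if none hash to 3, the job runs zero tests, which matches what the baseline test_profiling job reported.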

Why it's now covered. Per-task sharding hashes on "<projectPath>:<taskName>", and — more importantly —
createRootTask makes each aggregate depend directly on the Test tasks selected for its own slot rather than
relying on an onlyIf compared against a globally-numbered slot that didn't line up. Every test task now lands in
exactly one slot that actually runs.

Why this isn't double-counting. test_inst, test_inst_latest, and test_smoke are unchanged to within 0.06 %.
Those modules have ~one test task each, so they were never affected by the bug; if the new code ran anything twice, they
would inflate too. Only the multi-variant core modules gained tests — the signature of restored coverage, not duplicated
work.

Implemented and tested with Claude.

AlexeyKuznetsov-DD and others added 2 commits July 2, 2026 11:24
Empty commit so this branch runs the current per-project test-slot
sharding through CI as a baseline, before the per-task sharding change
is committed on top for comparison. No files changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rework -Pslot=X/Y sharding so a module's test variants (e.g. jdbc's
test/forkedTest/oldH2Test/oldPostgresTest) hash to independent slots
instead of serializing in a single job. Parse the slot selection once
and cache it on the root project; keep Project.isInSelectedSlot (used by
runMuzzle) at project granularity and add a task-level gate for Test
tasks. The *Check aggregate and all coverage builds stay whole-module,
project-slotted so per-module JaCoCo sees complete execution data.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@AlexeyKuznetsov-DD AlexeyKuznetsov-DD requested a review from bric3 July 2, 2026 17:10
@AlexeyKuznetsov-DD AlexeyKuznetsov-DD self-assigned this Jul 2, 2026
@AlexeyKuznetsov-DD AlexeyKuznetsov-DD added the "tag: no release notes", "type: refactoring", "comp: tooling", and "tag: ai generated" labels Jul 2, 2026
@AlexeyKuznetsov-DD AlexeyKuznetsov-DD changed the title from "Alexeyk/ci per task test sharding 2" to "Changes CI test sharding from per-project to per-task granularity." Jul 2, 2026