Skip to content

[improve][misc] Introduce AckSetUtil for allocation-free ack-set word operations#26128

Open
Shawyeok wants to merge 2 commits into
apache:masterfrom
Shawyeok:improve-bitset-cardinality-utils
Open

[improve][misc] Introduce AckSetUtil for allocation-free ack-set word operations#26128
Shawyeok wants to merge 2 commits into
apache:masterfrom
Shawyeok:improve-bitset-cardinality-utils

Conversation

@Shawyeok

@Shawyeok Shawyeok commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Motivation

Several batch-ack paths allocate a BitSet or BitSetRecyclable instance only to compute a
cardinality or perform an AND/intersect on raw long[] word arrays. These allocations are
unnecessary — the results can be computed directly via Long.bitCount and bitwise operators.

Modifications

New AckSetUtil utility class (pulsar-common):

Method Description
cardinality(long[]) Popcount of a word array — no BitSet alloc
intersect(long[], long[]) Element-wise AND; result length = min(len1, len2)
cardinalityOfIntersection(long[], long[]) Popcount of AND without allocating a result array

Refactored call sites — replaced allocation-heavy patterns with AckSetUtil in:

  • Consumer (broker) — BitSet.valueOf(x).cardinality() and BitSetRecyclable.create().resetWords(x).cardinality() patterns (4 sites)
  • EntryBatchIndexesAcks (broker) — same pattern (1 site)
  • PositionAckSetUtil (managed-ledger) — isAckSetEmpty (1 site)

Refactored PositionAckSetUtil:

  • andAckSet(long[], long[]) now delegates to AckSetUtil.intersect — eliminates two BitSetRecyclable allocs
  • isAckSetOverlap rewritten with bitwise ops: a bit of 0 means acked; overlap exists when any position is acked in both sets, i.e. ~(a[i] | b[i]) != 0 for some word pair — eliminates two BitSetRecyclable allocs plus two full flip passes

Tests: dedicated AckSetUtilTest with full coverage of all three methods (empty arrays, all-zero, all-set, mixed, asymmetric lengths, no/full/partial overlap).

Verifying this change

  • Make sure that the change passes the CI checks.
# Unit tests
./gradlew :pulsar-common:test --tests org.apache.pulsar.common.util.AckSetUtilTest
./gradlew :pulsar-common:test --tests org.apache.pulsar.common.util.collections.BitSetRecyclableRecyclableTest

# Compilation
./gradlew :pulsar-common:compileJava :pulsar-common:compileTestJava \
          :pulsar-broker:compileJava :managed-ledger:compileJava

# Style
./gradlew :pulsar-common:checkstyleMain :pulsar-common:checkstyleTest \
          :pulsar-broker:checkstyleMain :managed-ledger:checkstyleMain

Does this pull request potentially affect one of the following parts:

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Shawyeok added 2 commits June 30, 2026 11:08
Avoid creating BitSet or BitSetRecyclable instances when callers only need cardinality for long-array words. Add direct helpers for plain cardinality and AND cardinality, then use them in batch ack counting paths.

Assisted-by: Codex
…rd operations

Motivation: avoid temporary BitSet/BitSetRecyclable allocations when
callers only need cardinality or intersection on raw long-array words.

Modifications:
- Add AckSetUtil utility class in pulsar-common with:
  - cardinality(long[]) — popcount over a word array
  - intersect(long[], long[]) — element-wise AND, length = min of inputs
  - cardinalityOfIntersection(long[], long[]) — popcount of AND without allocating
- Remove the two static helpers that were added to BitSetRecyclable in the
  previous commit and redirect all call sites (Consumer, EntryBatchIndexesAcks,
  PositionAckSetUtil) to AckSetUtil
- Delegate PositionAckSetUtil.andAckSet(long[], long[]) to AckSetUtil.intersect
- Rewrite PositionAckSetUtil.isAckSetOverlap with bitwise ops (~(a|b) != 0),
  eliminating the BitSetRecyclable allocations and flip passes
- Add comprehensive unit tests in AckSetUtilTest

Assisted-by: Claude Sonnet 4.6
@lhotari lhotari changed the title [improve][common] Introduce AckSetUtil for allocation-free ack-set word operations [improve][misc] Introduce AckSetUtil for allocation-free ack-set word operations Jul 1, 2026

@lhotari lhotari left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check these findings of a local Claude Code review:

The cardinality-based refactors in this PR are exactly equivalent to the code they replace. However, two of the rewrites are not strictly equivalent in one respect: handling of trailing all-zero words. The old code trimmed them (via BitSet.valueOf / toLongArray); the new code does not.

1. intersect no longer trims trailing zero words — breaks the "canonical ack-set" invariant the old andAckSet upheld

AckSetUtil.java (intersect) and PositionAckSetUtil.java (andAckSet)

The old andAckSet ended with thisAckSet.toLongArray(), which returns Arrays.copyOf(words, wordsInUse) — trailing all-zero words are dropped. The new intersect returns a min(len1, len2)-length array without trimming. When two multi-word ack sets AND to a zero high word (very possible for batches > 64 indexes, e.g. [w0, 0b01] & [w0, 0b10][.., 0]), the two implementations return arrays of different length/shape:

  • old: [w0] (trimmed)
  • new: [w0, 0] (not trimmed)

This result is stored back into transaction pending-ack state via andAckSet(pos1, pos2)setAckSet(...) (PendingAckHandleImpl) and into AbstractBaseDispatcher. Most downstream consumers are trailing-zero-safe (isAckSetEmpty/cardinality* popcount to 0; BitSet.valueOf re-trims on read/serialize), so persistence is unaffected — but isAckSetOverlap is not safe (see finding 2), so a non-canonical array can flow into it and change the result.

Suggestion: trim trailing zero words in intersect (to match toLongArray()), or explicitly document/enforce that callers only pass and store canonical (trimmed) arrays.

2. isAckSetOverlap rewrite is trailing-zero-sensitive where the old code was not

PositionAckSetUtil.java (isAckSetOverlap)

The old code did BitSetRecyclable.valueOf(x) (which trims trailing zero words) before flip/and, so a trailing all-zero word was ignored. The new loop ~(a[i] | b[i]) != 0 over min(len) treats a shared trailing zero word as "acked in both" → returns true where the old code returned false. Concretely:

isAckSetOverlap(new long[]{-1L, 0L}, new long[]{-1L, 0L})
// old = false, new = true

Within-word high padding is handled identically by old and new (both operate on full 64-bit words), so the only divergence is whole trailing zero words — which is exactly what finding 1 can now feed in. The two changes interact: an intersect result like [w0, 0] compared against another length-≥2 set can now flip an overlap decision (this drives transaction double-ack / conflict detection).

Note the existing PositionAckSetUtilTest compares via BitSetRecyclable.valueOf(...) (which trims), so it cannot catch this — the divergence is untested. Please add targeted tests: a multi-word AND whose high word is zero, and an isAckSetOverlap case with a trailing zero word.

3. AckSetUtil javadoc / style nits

AckSetUtil.java

  • cardinalityOfIntersection: the @return text references firstWords & secondWords, but the params are named set1/set2; and the @param set2 line has a stray extra space / misaligned indentation.
  • Methods are declared public static inside a Lombok @UtilityClass, which already forces static — the explicit modifier is redundant (harmless, but inconsistent with the annotation's intent).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants