[improve][misc] Introduce AckSetUtil for allocation-free ack-set word operations#26128
[improve][misc] Introduce AckSetUtil for allocation-free ack-set word operations#26128Shawyeok wants to merge 2 commits into
Conversation
Avoid creating BitSet or BitSetRecyclable instances when callers only need cardinality for long-array words. Add direct helpers for plain cardinality and AND cardinality, then use them in batch ack counting paths. Assisted-by: Codex
…rd operations Motivation: avoid temporary BitSet/BitSetRecyclable allocations when callers only need cardinality or intersection on raw long-array words. Modifications: - Add AckSetUtil utility class in pulsar-common with: - cardinality(long[]) — popcount over a word array - intersect(long[], long[]) — element-wise AND, length = min of inputs - cardinalityOfIntersection(long[], long[]) — popcount of AND without allocating - Remove the two static helpers that were added to BitSetRecyclable in the previous commit and redirect all call sites (Consumer, EntryBatchIndexesAcks, PositionAckSetUtil) to AckSetUtil - Delegate PositionAckSetUtil.andAckSet(long[], long[]) to AckSetUtil.intersect - Rewrite PositionAckSetUtil.isAckSetOverlap with bitwise ops (~(a|b) != 0), eliminating the BitSetRecyclable allocations and flip passes - Add comprehensive unit tests in AckSetUtilTest Assisted-by: Claude Sonnet 4.6
lhotari
left a comment
There was a problem hiding this comment.
Please check these findings of a local Claude Code review:
The cardinality-based refactors in this PR are exactly equivalent to the code they replace. However, two of the rewrites are not strictly equivalent in one respect: handling of trailing all-zero words. The old code trimmed them (via BitSet.valueOf / toLongArray); the new code does not.
1. intersect no longer trims trailing zero words — breaks the "canonical ack-set" invariant the old andAckSet upheld
AckSetUtil.java (intersect) and PositionAckSetUtil.java (andAckSet)
The old andAckSet ended with thisAckSet.toLongArray(), which returns Arrays.copyOf(words, wordsInUse) — trailing all-zero words are dropped. The new intersect returns a min(len1, len2)-length array without trimming. When two multi-word ack sets AND to a zero high word (very possible for batches > 64 indexes, e.g. [w0, 0b01] & [w0, 0b10] → [.., 0]), the two implementations return arrays of different length/shape:
- old:
[w0](trimmed) - new:
[w0, 0](not trimmed)
This result is stored back into transaction pending-ack state via andAckSet(pos1, pos2) → setAckSet(...) (PendingAckHandleImpl) and into AbstractBaseDispatcher. Most downstream consumers are trailing-zero-safe (isAckSetEmpty/cardinality* popcount to 0; BitSet.valueOf re-trims on read/serialize), so persistence is unaffected — but isAckSetOverlap is not safe (see finding 2), so a non-canonical array can flow into it and change the result.
Suggestion: trim trailing zero words in intersect (to match toLongArray()), or explicitly document/enforce that callers only pass and store canonical (trimmed) arrays.
2. isAckSetOverlap rewrite is trailing-zero-sensitive where the old code was not
PositionAckSetUtil.java (isAckSetOverlap)
The old code did BitSetRecyclable.valueOf(x) (which trims trailing zero words) before flip/and, so a trailing all-zero word was ignored. The new loop ~(a[i] | b[i]) != 0 over min(len) treats a shared trailing zero word as "acked in both" → returns true where the old code returned false. Concretely:
isAckSetOverlap(new long[]{-1L, 0L}, new long[]{-1L, 0L})
// old = false, new = trueWithin-word high padding is handled identically by old and new (both operate on full 64-bit words), so the only divergence is whole trailing zero words — which is exactly what finding 1 can now feed in. The two changes interact: an intersect result like [w0, 0] compared against another length-≥2 set can now flip an overlap decision (this drives transaction double-ack / conflict detection).
Note the existing PositionAckSetUtilTest compares via BitSetRecyclable.valueOf(...) (which trims), so it cannot catch this — the divergence is untested. Please add targeted tests: a multi-word AND whose high word is zero, and an isAckSetOverlap case with a trailing zero word.
3. AckSetUtil javadoc / style nits
AckSetUtil.java
cardinalityOfIntersection: the@returntext referencesfirstWords & secondWords, but the params are namedset1/set2; and the@param set2line has a stray extra space / misaligned indentation.- Methods are declared
public staticinside a Lombok@UtilityClass, which already forcesstatic— the explicit modifier is redundant (harmless, but inconsistent with the annotation's intent).
Motivation
Several batch-ack paths allocate a
BitSetorBitSetRecyclableinstance only to compute acardinality or perform an AND/intersect on raw
long[]word arrays. These allocations areunnecessary — the results can be computed directly via
Long.bitCountand bitwise operators.Modifications
New
AckSetUtilutility class (pulsar-common):cardinality(long[])BitSetallocintersect(long[], long[])min(len1, len2)cardinalityOfIntersection(long[], long[])Refactored call sites — replaced allocation-heavy patterns with
AckSetUtilin:Consumer(broker) —BitSet.valueOf(x).cardinality()andBitSetRecyclable.create().resetWords(x).cardinality()patterns (4 sites)EntryBatchIndexesAcks(broker) — same pattern (1 site)PositionAckSetUtil(managed-ledger) —isAckSetEmpty(1 site)Refactored
PositionAckSetUtil:andAckSet(long[], long[])now delegates toAckSetUtil.intersect— eliminates twoBitSetRecyclableallocsisAckSetOverlaprewritten with bitwise ops: a bit of0means acked; overlap exists when any position is acked in both sets, i.e.~(a[i] | b[i]) != 0for some word pair — eliminates twoBitSetRecyclableallocs plus two fullflippassesTests: dedicated
AckSetUtilTestwith full coverage of all three methods (empty arrays, all-zero, all-set, mixed, asymmetric lengths, no/full/partial overlap).Verifying this change
Does this pull request potentially affect one of the following parts: