GH-50280: [C++] Implement VisitTwoBitRuns and VisitTwoSetBitRuns methods by HuaHuaY · Pull Request #50281 · apache/arrow

HuaHuaY · 2026-06-29T11:58:30Z

Rationale for this change

We already have VisitTwoBitBlocks which can iterate over two bitmaps simultaneously. But there is no mirror code about BitRunReader and SetBitRunReader.

What changes are included in this PR?

Implement VisitTwoBitRuns and VisitTwoSetBitRuns methods and refactor VisitTwoBitBlocks in GroupedPivotAccumulator as examples.

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

GitHub Issue: [C++] Implement VisitTwoBitRuns and VisitTwoSetBitRuns #50280

github-actions · 2026-06-29T11:58:56Z

⚠️ GitHub issue #50280 has been automatically assigned in GitHub to PR creator.

pitrou · 2026-06-29T16:14:21Z

We have toy element-wise summation benchmarks in bit_block_counter_benchmark.cc (see BinaryBitBlockBenchmark), can you add one that uses these new primitives?

pitrou · 2026-06-29T16:18:44Z

Also, VisitTwoBitBlocksVoid is used extensively for binary scalar kernels, see src/arrow/compute/kernels/codegen_internal.h.

HuaHuaY · 2026-06-30T06:33:16Z

I have renamed the benchmark file and added some cases use VisitXXXBitRuns. And I added two new methods VisitTwoBitRunsVoid and VisitTwoSetBitRunsVoid.

BTW I refactored the existed method VisitSetBitRunsVoid. I believe compiler will optimize the lambda package cost.

HuaHuaY · 2026-06-30T07:27:35Z

Now benchmarks about new method VisitTwoBitRunsVoid are much slower in my own environment. It seems that llvm doesn't vectorize the sum operation in benchmark function. I will optimize the benchmark and push a new commit soon.

HuaHuaY · 2026-06-30T08:38:49Z

@pitrou How to run benchmark in conbench-apache-arrow? Please help to call the bot.

pitrou · 2026-06-30T09:26:33Z

@ursabot please benchmark lang=C++

rok · 2026-06-30T09:26:40Z

Benchmark runs are scheduled for commit c0d6bf7. Watch https://buildkite.com/apache-arrow and https://conbench.arrow-dev.org for updates. A comment will be posted here when the runs are complete.

conbench-apache-arrow · 2026-06-30T13:41:57Z

Thanks for your patience. Conbench analyzed the 3 benchmarking runs that have been run so far on PR commit c0d6bf7.

There were 5 benchmark results indicating a performance regression:

Pull Request Run on amd64-c6a-4xlarge-linux at 2026-06-30 10:39:17Z
- BenchmarkTemporalRounding (C++) with params=<RoundTemporal, zoned, round_1_month>/524288/0, source=cpp-micro, suite=arrow-compute-scalar-temporal-benchmark
- BM_UnpackUint8 (C++) with params=ScalarUnaligned/1/64, source=cpp-micro, suite=arrow-bpacking-benchmark
and 3 more (see the report linked below)

The full Conbench report has more details.

wgtmac · 2026-06-30T14:21:25Z

Interesting that benchmark was run of behalf of @rok

rok · 2026-06-30T14:28:08Z

@wgtmac That's a current workaround for not having a conbench user. :)

HuaHuaY · 2026-06-30T15:25:06Z

I've compiled the test data into three tables. BinaryBitBlockCounterSum is faster than BinaryVisitTwoSetBitRunsSum when there are many null values, which I think is expected. In this case, merging the two bitmaps results in too many consecutive intervals, making it slower than judging multiple bits at once.

However, I was somewhat surprised that in x86 tests, BinaryBitBlockCounterSum achieved better performance when there were almost no null values. I don't understand why.

There were also some unexpected findings. Currently, VisitTwoSetBitRuns performs better than VisitTwoBitRuns. This is partly due to function implementation, and partly because SetBitRunReader performs better than BitRunReader. I will later adjust VisitTwoBitRuns to use SetBitRunReader (VisitTwoBitRuns can directly wrap VisitTwoSetBitRuns). I haven't yet figured out why SetBitRunReader is so much faster.

Based on these test results, I plan to revert the changes to cpp/src/arrow/compute/kernels/hash_aggregate_pivot.cc. However, I think that the real value of VisitTwoSetBitRuns lies in its ability to handle data in batches rather than row-by-row, allowing the use of vector-processing functions such as ::arrow::bit_util::SetBitsTo or ::arrow::internal::CountSetBits. Since I plan to use it in a future PR, I'd like to add benchmark scenarios that will showcase the advantages of VisitTwoSetBitRuns.

4ab7d672 / amd64-c6a-4xlarge-linux / `c0d6bf7`

Benchmark	2	8	64	512	4096	32768	65536
`BitBlockCounterSum`	2.59e+8	6.46e+8	1.39e+9	3.58e+9	4.90e+9	5.15e+9	5.17e+9
`VisitBitRunsSum`	3.10e+8	5.70e+8	2.07e+9	4.30e+9	4.68e+9	4.73e+9	4.73e+9
`VisitSetBitRunsSum`	3.59e+8	6.40e+8	2.20e+9	4.58e+9	4.88e+9	4.93e+9	4.94e+9
`BinaryBitBlockCounterSum`	1.79e+8	3.68e+8	7.40e+8	1.99e+9	3.69e+9	4.18e+9	4.22e+9
`BinaryVisitTwoBitRunsSum`	1.40e+8	2.42e+8	9.38e+8	2.61e+9	3.36e+9	3.49e+9	3.52e+9
`BinaryVisitTwoSetBitRunsSum`	1.78e+8	2.77e+8	1.03e+9	3.05e+9	3.94e+9	4.06e+9	4.08e+9

fd5a984c / amd64-m5-4xlarge-linux / `c0d6bf7`

Benchmark	2	8	64	512	4096	32768	65536
`BitBlockCounterSum`	2.00e+8	4.55e+8	9.33e+8	1.88e+9	2.26e+9	2.32e+9	2.32e+9
`VisitBitRunsSum`	2.07e+8	3.73e+8	1.15e+9	1.98e+9	2.17e+9	2.19e+9	2.19e+9
`VisitSetBitRunsSum`	2.34e+8	4.01e+8	1.15e+9	2.03e+9	2.24e+9	2.26e+9	2.26e+9
`BinaryBitBlockCounterSum`	1.20e+8	2.13e+8	4.06e+8	1.02e+9	1.79e+9	2.00e+9	2.01e+9
`BinaryVisitTwoBitRunsSum`	1.05e+8	1.77e+8	5.93e+8	1.34e+9	1.63e+9	1.67e+9	1.68e+9
`BinaryVisitTwoSetBitRunsSum`	1.16e+8	1.83e+8	6.26e+8	1.51e+9	1.87e+9	1.91e+9	1.91e+9

457fe991 / test-mac-arm / `c0d6bf7`

Benchmark	2	8	64	512	4096	32768	65536
`BitBlockCounterSum`	2.60e+8	7.22e+8	1.98e+9	5.40e+9	7.09e+9	7.29e+9	7.30e+9
`VisitBitRunsSum`	3.41e+8	6.44e+8	1.57e+9	2.33e+9	2.57e+9	2.59e+9	2.59e+9
`VisitSetBitRunsSum`	4.00e+8	6.36e+8	2.64e+9	9.07e+9	1.22e+10	1.29e+10	1.29e+10
`BinaryBitBlockCounterSum`	1.79e+8	4.07e+8	1.00e+9	2.52e+9	4.76e+9	5.38e+9	5.44e+9
`BinaryVisitTwoBitRunsSum`	1.75e+8	2.99e+8	1.12e+9	3.91e+9	5.64e+9	5.90e+9	5.93e+9
`BinaryVisitTwoSetBitRunsSum`	2.28e+8	3.17e+8	1.21e+9	4.14e+9	6.09e+9	6.40e+9	6.42e+9

pitrou · 2026-06-30T15:41:42Z

I haven't yet figured out why SetBitRunReader is so much faster.

Because it's simpler :)

HuaHuaY · 2026-07-01T03:09:24Z

With the help of AI, I investigated why VisitBitRuns is slower than VisitSetBitRuns by modifying benchmarks and examining the generated assembly code.

I concluded that the primary reason lies in the absence of the NOINLINE attribute on BitRunReader::NextRun; this results in a larger code size for VisitBitRuns compared to VisitSetBitRuns, making VisitBitRuns itself less likely to be inlined. However, adding the NOINLINE attribute to BitRunReader::NextRun actually degrades performance further, as it forces a non-inlined call to NextRun regardless of whether the bitmap value is true or false.

Additionally, BitRunReader contains some minor code duplication that can be optimized. I will upload the changes shortly.

pitrou · 2026-07-01T07:53:09Z

I think NOINLINE was added at some point because it actually made some benchmarks better (on some compiler :-)). So let's be careful and check for regressions if we undo it.

Note that, ultimately, what matters is actual users of this function (for example in the compute library), not the bitmap traversal micro-benchmarks :-)

HuaHuaY · 2026-07-01T08:25:29Z

Yeah, I was trying to figure out why BitRunReader's performance was inferior to SetBitRunReader, and removing NOINLINE from SetBitRunReader::NextRun wasn't what I needed.

I submitted a new commit that made some optimizations to BitRunReader:

The assignment word_ = 0 only needs to occur when reading partial word.
Remove redundant calculations of bit_util::IsMultipleOf64 and use word_ == 0 instead.
Prioritize checking position_ + 128 <= length_ to avoid performing two boolean checks in the AdvanceUntilChange and LoadWord functions when null values are rare.

what matters is actual users of this function (for example in the compute library), not the bitmap traversal micro-benchmarks :-)

As you said, user performance is more important than benchmark performance. These changes not only improved my local benchmark but also, I believe, semantically optimized the execution process. And I don't add ARROW_FORCE_INLINE to VisitBitRuns because I concern that ARROW_FORCE_INLINE might slow down user performance in certain scenarios.

pitrou · 2026-07-01T09:13:24Z

@ursabot please benchmark lang=C++

rok · 2026-07-01T09:13:30Z

Benchmark runs are scheduled for commit 3b5b2d0. Watch https://buildkite.com/apache-arrow and https://conbench.arrow-dev.org for updates. A comment will be posted here when the runs are complete.

HuaHuaY · 2026-07-01T10:00:25Z

I added a new benchmark to count the number of valid int8 values in a random struct<a: struct<b: int8>> array. Due to the use of CountSetBits batch interface, VisitTwoBitRunsVoid achieves better performance than VisitTwoBitBlocksVoid in most cases in my own environment.

parent_null_every	value_null_every	BitBlocks ns	BitRuns ns	vs block	SetBitRuns ns	vs block
2	2	5100500	3247773	0.64x	3272429	0.64x
2	8	4592000	3371095	0.73x	3245095	0.71x
2	64	4338500	3281286	0.76x	3272810	0.75x
8	2	4439125	3052261	0.69x	3103000	0.70x
8	8	2801040	3075136	1.10x	3090304	1.10x
8	64	2326967	3070955	1.32x	3171870	1.36x
64	2	4132647	813513	0.20x	848000	0.21x
64	8	2026114	835193	0.41x	823469	0.41x
64	64	1456122	789952	0.54x	842785	0.58x
512	2	3918444	100072	0.03x	98034	0.03x
4096	2	3878944	26525	0.01x	26539	0.01x
65536	65536	1052229	14449	0.01x	14641	0.01x

conbench-apache-arrow · 2026-07-01T12:23:22Z

Thanks for your patience. Conbench analyzed the 4 benchmarking runs that have been run so far on PR commit 3b5b2d0.

There were 3 benchmark results indicating a performance regression:

Pull Request Run on amd64-c6a-4xlarge-linux at 2026-07-01 10:25:34Z
- CoalesceScalarBench64 (C++) with params=2, source=cpp-micro, suite=arrow-compute-scalar-if-else-benchmark
- CoalesceScalarBench64 (C++) with params=4, source=cpp-micro, suite=arrow-compute-scalar-if-else-benchmark
and 1 more (see the report linked below)

The full Conbench report has more details.

HuaHuaY · 2026-07-01T12:24:50Z

These are the results from the last benchmark and this benchmark, grouped in pairs, with the latest results following. I think that after optimizing BitRunReader, the performance gap between VisitBitRuns and VisitSetBitRuns has indeed narrowed in most cases. @pitrou Please take a look.

4ab7d672 / amd64-c6a-4xlarge-linux / `c0d6bf7`

Benchmark	2	8	64	512	4096	32768	65536
`BitBlockCounterSum`	2.59e+8	6.46e+8	1.39e+9	3.58e+9	4.90e+9	5.15e+9	5.17e+9
`VisitBitRunsSum`	3.10e+8	5.70e+8	2.07e+9	4.30e+9	4.68e+9	4.73e+9	4.73e+9
`VisitSetBitRunsSum`	3.59e+8	6.40e+8	2.20e+9	4.58e+9	4.88e+9	4.93e+9	4.94e+9

d164bbbe / amd64-c6a-4xlarge-linux / `3b5b2d0`

Benchmark	2	8	64	512	4096	32768	65536
`BitBlockCounterSum`	2.56e+8	6.37e+8	1.37e+9	3.55e+9	4.85e+9	5.09e+9	5.11e+9
`VisitBitRunsSum`	3.13e+8	5.75e+8	2.10e+9	4.38e+9	4.70e+9	4.76e+9	4.76e+9
`VisitSetBitRunsSum`	3.39e+8	6.07e+8	2.17e+9	4.50e+9	4.80e+9	4.86e+9	4.87e+9

fd5a984c / amd64-m5-4xlarge-linux / `c0d6bf7`

Benchmark	2	8	64	512	4096	32768	65536
`BitBlockCounterSum`	2.00e+8	4.55e+8	9.33e+8	1.88e+9	2.26e+9	2.32e+9	2.32e+9
`VisitBitRunsSum`	2.07e+8	3.73e+8	1.15e+9	1.98e+9	2.17e+9	2.19e+9	2.19e+9
`VisitSetBitRunsSum`	2.34e+8	4.01e+8	1.15e+9	2.03e+9	2.24e+9	2.26e+9	2.26e+9

e74c6927 / amd64-m5-4xlarge-linux / `3b5b2d0`

Benchmark	2	8	64	512	4096	32768	65536
`BitBlockCounterSum`	2.00e+8	4.55e+8	9.34e+8	1.88e+9	2.26e+9	2.32e+9	2.32e+9
`VisitBitRunsSum`	2.18e+8	3.84e+8	1.15e+9	1.97e+9	2.15e+9	2.17e+9	2.17e+9
`VisitSetBitRunsSum`	2.30e+8	3.99e+8	1.15e+9	2.03e+9	2.24e+9	2.26e+9	2.26e+9

457fe991 / test-mac-arm / `c0d6bf7`

Benchmark	2	8	64	512	4096	32768	65536
`BitBlockCounterSum`	2.60e+8	7.22e+8	1.98e+9	5.40e+9	7.09e+9	7.29e+9	7.30e+9
`VisitBitRunsSum`	3.41e+8	6.44e+8	1.57e+9	2.33e+9	2.57e+9	2.59e+9	2.59e+9
`VisitSetBitRunsSum`	4.00e+8	6.36e+8	2.64e+9	9.07e+9	1.22e+10	1.29e+10	1.29e+10

6701543e / test-mac-arm / `3b5b2d0`

Benchmark	2	8	64	512	4096	32768	65536
`BitBlockCounterSum`	2.60e+8	7.16e+8	1.90e+9	5.33e+9	7.03e+9	7.27e+9	7.29e+9
`VisitBitRunsSum`	3.58e+8	5.84e+8	2.27e+9	8.53e+9	1.23e+10	1.31e+10	1.31e+10
`VisitSetBitRunsSum`	4.07e+8	6.49e+8	2.59e+9	9.00e+9	1.22e+10	1.28e+10	1.29e+10

pitrou · 2026-07-01T13:36:36Z

The benchmarks above don't seem to show significant wins from the BitRunReader improvements. Actually, there is a number of small-ish regressions.

HuaHuaY · 2026-07-01T17:06:44Z

@ursabot please benchmark lang=C++

rok · 2026-07-01T17:06:50Z

Benchmark runs are scheduled for commit b026a7d. Watch https://buildkite.com/apache-arrow and https://conbench.arrow-dev.org for updates. A comment will be posted here when the runs are complete.

HuaHuaY · 2026-07-01T17:12:51Z

I've only modified BitRunReader and not modified SetBitRunReader. And I hope to narrow the performance gap between BitRunReader and SetBitRunReader. Therefore, I think the change in the performance ratio of VisitBitRunsSum and VisitSetBitRunsSum is more indicative of the effectiveness of the modifications. I've reorganized the data and listed the ratio differences in the last column.

However, even if the bitmap benchmark results improve, it doesn't mean it won't negatively impact other benchmark cases. I looked at the three test cases in the benchmark report above. ArrayArrayKernel seems unrelated to the path I modified. I couldn't reproduce the remaining two performance regressions in my own x86 environment. I rerun the benchmark, and if it still shows a performance regression, I plan to revert the BitRunReader changes. And optimizing BitRunReader is not the original purpose of this PR.

Group 1

A: 4ab7d672 / amd64-c6a-4xlarge-linux / c0d6bf7
B: d164bbbe / amd64-c6a-4xlarge-linux / 3b5b2d0
Ratio: VisitBitRunsSum / VisitSetBitRunsSum

params	A `VisitBitRunsSum`	A `VisitSetBitRunsSum`	A ratio	B `VisitBitRunsSum`	B `VisitSetBitRunsSum`	B ratio	Δ ratio (B-A)
2	3.10e+8	3.59e+8	86.43%	3.13e+8	3.39e+8	92.32%	+5.89%
8	5.70e+8	6.40e+8	89.03%	5.75e+8	6.07e+8	94.71%	+5.68%
64	2.07e+9	2.20e+9	94.15%	2.10e+9	2.17e+9	96.72%	+2.57%
512	4.30e+9	4.58e+9	93.84%	4.38e+9	4.50e+9	97.29%	+3.45%
4096	4.68e+9	4.88e+9	95.82%	4.70e+9	4.80e+9	97.97%	+2.14%
32768	4.73e+9	4.93e+9	95.79%	4.76e+9	4.86e+9	98.09%	+2.30%
65536	4.73e+9	4.94e+9	95.84%	4.76e+9	4.87e+9	97.68%	+1.84%

Group 2

A: fd5a984c / amd64-m5-4xlarge-linux / c0d6bf7
B: e74c6927 / amd64-m5-4xlarge-linux / 3b5b2d0
Ratio: VisitBitRunsSum / VisitSetBitRunsSum

params	A `VisitBitRunsSum`	A `VisitSetBitRunsSum`	A ratio	B `VisitBitRunsSum`	B `VisitSetBitRunsSum`	B ratio	Δ ratio (B-A)
2	2.07e+8	2.34e+8	88.71%	2.18e+8	2.30e+8	94.94%	+6.24%
8	3.73e+8	4.01e+8	92.97%	3.84e+8	3.99e+8	96.08%	+3.11%
64	1.15e+9	1.15e+9	100.13%	1.15e+9	1.15e+9	100.49%	+0.35%
512	1.98e+9	2.03e+9	97.34%	1.97e+9	2.03e+9	97.15%	-0.19%
4096	2.17e+9	2.24e+9	97.06%	2.15e+9	2.24e+9	96.04%	-1.02%
32768	2.19e+9	2.26e+9	96.93%	2.17e+9	2.26e+9	95.94%	-0.99%
65536	2.19e+9	2.26e+9	96.94%	2.17e+9	2.26e+9	95.92%	-1.02%

Group 3

A: 457fe991 / test-mac-arm / c0d6bf7
B: 6701543e / test-mac-arm / 3b5b2d0
Ratio: VisitBitRunsSum / VisitSetBitRunsSum

params	A `VisitBitRunsSum`	A `VisitSetBitRunsSum`	A ratio	B `VisitBitRunsSum`	B `VisitSetBitRunsSum`	B ratio	Δ ratio (B-A)
2	3.41e+8	4.00e+8	85.26%	3.58e+8	4.07e+8	88.12%	+2.86%
8	6.44e+8	6.36e+8	101.30%	5.84e+8	6.49e+8	89.95%	-11.35%
64	1.57e+9	2.64e+9	59.46%	2.27e+9	2.59e+9	87.57%	+28.10%
512	2.33e+9	9.07e+9	25.71%	8.53e+9	9.00e+9	94.73%	+69.02%
4096	2.57e+9	1.22e+10	21.10%	1.23e+10	1.22e+10	100.99%	+79.88%
32768	2.59e+9	1.29e+10	20.14%	1.31e+10	1.28e+10	101.80%	+81.65%
65536	2.59e+9	1.29e+10	20.12%	1.31e+10	1.29e+10	101.76%	+81.64%

conbench-apache-arrow · 2026-07-01T20:20:42Z

Thanks for your patience. Conbench analyzed the 3 benchmarking runs that have been run so far on PR commit b026a7d.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

HuaHuaY · 2026-07-02T02:32:37Z

@ursabot please benchmark lang=C++

rok · 2026-07-02T02:32:39Z

Commit b026a7d already has scheduled benchmark runs.

HuaHuaY · 2026-07-02T02:33:39Z

@ursabot please benchmark lang=C++

rok · 2026-07-02T02:33:46Z

Benchmark runs are scheduled for commit 41af9ca. Watch https://buildkite.com/apache-arrow and https://conbench.arrow-dev.org for updates. A comment will be posted here when the runs are complete.

HuaHuaY · 2026-07-02T04:06:17Z

A performance regression has indeed occurred on amd64-c6a-4xlarge-linux. I will revert related changes soon.

HuaHuaY · 2026-07-02T07:11:39Z

Hi @pitrou, I have reverted all code changes on existing paths; the rest are new code and should not affect existing benchmarks. Please take a look again.

conbench-apache-arrow · 2026-07-02T08:08:30Z

Thanks for your patience. Conbench analyzed the 4 benchmarking runs that have been run so far on PR commit 41af9ca.

There were 18 benchmark results indicating a performance regression:

Pull Request Run on amd64-c6a-4xlarge-linux at 2026-07-02 03:47:53Z
- CoalesceScalarBench64 (C++) with params=2, source=cpp-micro, suite=arrow-compute-scalar-if-else-benchmark
- DoubleColumnMajorMatrixConversionFixture (C++) with params=/ConvertToSparseCSCMatrixInt32, source=cpp-micro, suite=arrow-tensor-conversion-benchmark
and 16 more (see the report linked below)

The full Conbench report has more details.

pitrou

Thanks for the update!

pitrou · 2026-07-02T08:22:54Z

+
+namespace arrow::internal {
+
+struct NestedBitmapTraversalBenchmark {


I don't think this benchmark is very realistic, because if we were implementing something like this we'd be doing it for an arbitrary nesting level (not 3), and we would probably not use the bitmap visitors for it.

(perhaps we'd BitmapAnd the validity bitmaps in fixed-size chunks and call CountSetBits on the latter?)

The use case I provided here isn't quite accurate. I think a more accurate use case would be using the type struct<a: list<int8>>. I encountered this during a recursive type parsing process. When encountering list<XXX>, it combines the accumulated bitmap passed from the outer layer with the bitmap of the list to determine the valid rows of the XXX array. Its use case should be similar to VisitTwoBitBlocks, the key point being that sometimes we don't need BitmapAnd to generate a temporary bitmap. Compared to VisitTwoBitBlocks, it simply changes the data input from row-by-row to a batch interface. I plan to modify this benchmark later. If you have any better suggestions, please let me know.

pitrou · 2026-07-02T08:23:47Z

               thread_pool_test.cc)

-add_arrow_benchmark(bit_block_counter_benchmark)
+add_arrow_benchmark(bitmap_traversal_benchmark)


I understand why you're renaming the benchmark executable, but I'm afraid it's gonna break the benchmark history by making it unrelated in the benchmark database. @rok Am I right?

I believe so. If you have a good reason to and you don't care about breaking history that's ok. If not I'd suggest to keep the current name.

Get it. I will revert the name change.

pitrou · 2026-07-02T08:28:53Z

+  ASSERT_THAT(runs, ElementsAreArray({SetBitRun{0, 2}, SetBitRun{3, 2}}));
+}
+
+TEST_F(TestSetBitRunReader, VisitTwoBitRunsVoid) {


Can the VisitTwoBitRunsVoid tests be folded into the VisitTwoBitRuns tests as they use the exact same data? It will make the whole test file more readable and maintainable.

HuaHuaY requested a review from pitrou as a code owner June 29, 2026 11:58

github-actions Bot added Component: C++ awaiting review Awaiting review labels Jun 29, 2026

HuaHuaY force-pushed the visit_two_bit_runs branch from 36c7481 to 51ee9fa Compare June 30, 2026 06:29

HuaHuaY force-pushed the visit_two_bit_runs branch from b026a7d to 41af9ca Compare July 2, 2026 02:33

Implement VisitTwoBitRuns and VisitTwoSetBitRuns methods

4f4b336

HuaHuaY force-pushed the visit_two_bit_runs branch from 41af9ca to 4f4b336 Compare July 2, 2026 07:10

pitrou reviewed Jul 2, 2026

View reviewed changes

github-actions Bot added awaiting committer review Awaiting committer review awaiting review Awaiting review awaiting changes Awaiting changes and removed awaiting review Awaiting review awaiting committer review Awaiting committer review labels Jul 2, 2026


		namespace arrow::internal {

		struct NestedBitmapTraversalBenchmark {

Uh oh!

Conversation

HuaHuaY commented Jun 29, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

pitrou commented Jun 29, 2026

Uh oh!

pitrou commented Jun 29, 2026

Uh oh!

HuaHuaY commented Jun 30, 2026

Uh oh!

HuaHuaY commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

HuaHuaY commented Jun 30, 2026

Uh oh!

pitrou commented Jun 30, 2026

Uh oh!

rok commented Jun 30, 2026

Uh oh!

conbench-apache-arrow Bot commented Jun 30, 2026

Uh oh!

wgtmac commented Jun 30, 2026

Uh oh!

rok commented Jun 30, 2026

Uh oh!

HuaHuaY commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

4ab7d672 / amd64-c6a-4xlarge-linux / c0d6bf7

fd5a984c / amd64-m5-4xlarge-linux / c0d6bf7

457fe991 / test-mac-arm / c0d6bf7

Uh oh!

pitrou commented Jun 30, 2026

Uh oh!

HuaHuaY commented Jul 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pitrou commented Jul 1, 2026

Uh oh!

HuaHuaY commented Jul 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pitrou commented Jul 1, 2026

Uh oh!

rok commented Jul 1, 2026

Uh oh!

HuaHuaY commented Jul 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

conbench-apache-arrow Bot commented Jul 1, 2026

Uh oh!

HuaHuaY commented Jul 1, 2026

4ab7d672 / amd64-c6a-4xlarge-linux / c0d6bf7

d164bbbe / amd64-c6a-4xlarge-linux / 3b5b2d0

fd5a984c / amd64-m5-4xlarge-linux / c0d6bf7

e74c6927 / amd64-m5-4xlarge-linux / 3b5b2d0

457fe991 / test-mac-arm / c0d6bf7

6701543e / test-mac-arm / 3b5b2d0

Uh oh!

pitrou commented Jul 1, 2026

Uh oh!

HuaHuaY commented Jul 1, 2026

Uh oh!

rok commented Jul 1, 2026

Uh oh!

HuaHuaY commented Jul 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Group 1

Group 2

Group 3

Uh oh!

conbench-apache-arrow Bot commented Jul 1, 2026

Uh oh!

HuaHuaY commented Jul 2, 2026

HuaHuaY commented Jun 29, 2026 •

edited by github-actions Bot

Loading

HuaHuaY commented Jun 30, 2026 •

edited

Loading

HuaHuaY commented Jun 30, 2026 •

edited

Loading

4ab7d672 / amd64-c6a-4xlarge-linux / `c0d6bf7`

fd5a984c / amd64-m5-4xlarge-linux / `c0d6bf7`

457fe991 / test-mac-arm / `c0d6bf7`

HuaHuaY commented Jul 1, 2026 •

edited

Loading

HuaHuaY commented Jul 1, 2026 •

edited

Loading

HuaHuaY commented Jul 1, 2026 •

edited

Loading

4ab7d672 / amd64-c6a-4xlarge-linux / `c0d6bf7`

d164bbbe / amd64-c6a-4xlarge-linux / `3b5b2d0`

fd5a984c / amd64-m5-4xlarge-linux / `c0d6bf7`

e74c6927 / amd64-m5-4xlarge-linux / `3b5b2d0`

457fe991 / test-mac-arm / `c0d6bf7`

6701543e / test-mac-arm / `3b5b2d0`

HuaHuaY commented Jul 1, 2026 •

edited

Loading

HuaHuaY commented Jul 2, 2026 •

edited

Loading

HuaHuaY Jul 2, 2026 •

edited

Loading