Optimize Int8 and Int16 integer IN filters#23299
Conversation
d99ae33 to
5141088
Compare
|
run benchmark in_list_strategy |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing codex/in-list-direct-bitmap-filter (5141088) to 01bf68c (merge-base) diff using: in_list_strategy File an issue against this benchmark runner |
|
|
||
| #[inline(always)] | ||
| fn index(value: Self::Native) -> usize { | ||
| value as u16 as usize |
There was a problem hiding this comment.
the idea is just to reinterpret the signed value into a usize
There was a problem hiding this comment.
I think this is good as an comment within the code
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagein_list_strategy — base (merge-base)
in_list_strategy — branch
File an issue against this benchmark runner |
| } | ||
|
|
||
| #[test] | ||
| fn bitmap_filter_i8_handles_signed_boundaries_and_slices() -> Result<()> { |
There was a problem hiding this comment.
One small follow-up idea: it could be useful to add a public-path regression through InListExpr::try_new_from_array, or a SQL-level test, for Int8 and Int16 signed values.
That would cover the dispatch path in strategy.rs in addition to the filter implementation itself.
There was a problem hiding this comment.
This is a good idea -- I reviewed the existing slt (SQL) test coverage and it doesn't really have comprensive IN list coverage -- I will add some in a follow on PR
There was a problem hiding this comment.
Tracking with an issue:
First PR
|
Benchmarks look good |
|
Per @geoffreyclaude in I will address @kosiew 's comments and merge this PR so we can keep thigns moving |
|
Ok, I think all the comments have been addressed, and we are working on SLT coverage in #23307 🚀 |
Which issue does this PR close?
Rationale for this change
This is an alternate implementation proposal for optimizing small integer
INlist from @geoffreyclaudeThe in #23013 does some shenangans, including rebuilding the Arrow array data directly in order to probe a bitmap keyed by an unsigned type. Handling signed small integer types directly keeps the implementation simpler and avoids the extra array-data path while preserving the compact bitmap representation for these small domains.
What changes are included in this PR?
This updates the specialized
INlist bitmap filter to handleInt8,UInt8,Int16, andUInt16directly.Signed small integer filters now map values to bitmap indexes by their bit pattern, so negative values use the same compact bitmap domain as their unsigned counterparts.
Are these changes tested?
Yes. This adds coverage for signed
Int8andInt16bitmap filters, including boundary values and sliced arrays.Are there any user-facing changes?
No. This is an internal physical expression optimization for small integer
INlist filters.