You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This adds support for Spark's multiply_ym_interval expression, allowing YearMonth interval multiplication to run through Comet instead of falling back to
Spark.
What changes are included in this PR?
Add YearMonthIntervalType serialization to the Comet proto/native type mapping.
Route MakeYMInterval and MultiplyYMInterval through the JVM codegen dispatcher.
Add IntervalYearVector input/output support in the codegen kernel path.
Add YearMonth interval literal support.
Update expression/datatype support docs.
Add SQL coverage for YearMonth interval multiplication with integer, long, float, double, and decimal multipliers.
peterxcli
changed the title
Support multiply_ym_interval with YearMonth interval codegen dispatch
feat: support multiply_ym_interval with YearMonth interval codegen dispatch
Jun 24, 2026
Thanks for this. Using the codegen dispatcher for multiply_ym_interval is the right call, since it runs Spark's own generated code and so matches Spark by construction, including the half-up rounding and the overflow behavior. The multiplier coverage in the SQL test is nice and broad (int, long, float, double, decimal, plus the half-up rounding case and the numeric-on-the-left normalization).
A few things worth looking at:
Rebase on main. This looks like it was branched before #4541 landed, and #4541 already added the YearMonth and DayTime interval type plumbing plus make_ym_interval via codegen dispatch. After a rebase, these pieces here become duplicates and will conflict: types.proto, serde.rs, Utils.scala, serializeDataType, isSupportedDataType, the CometBatchKernelCodegenOutput changes, CometMakeYMInterval and its registration, and the make_ym_interval row in expressions.md. Stripping those would leave just the multiply-specific work: MultiplyYMInterval, the interval input support in CometBatchKernelCodegenInput (which #4541 did not touch and which multiply genuinely needs), the interval literals in literals.scala and planner.rs, and the test. That would make the diff much easier to review. It should also clear the failing preflight, which is currently tripping on installation.md, a file this PR does not touch, which is a symptom of the branch being stale.
Is the supportedDataType change needed?#4541 deliberately kept intervals out of QueryPlanSerde.supportedDataType because codegen dispatch does not consult it (it serializes the return type via serializeDataType). Adding YearMonthIntervalType there is a broader claim: it tells Comet that YM interval columns can flow through any native operator (scan, filter, project, sort, shuffle), not just the dispatch path, and nothing here exercises those paths. Is it actually required for multiply_ym_interval? If it is, a test that pushes a YM interval column through a plain projection or a shuffle would give confidence the FFI round-trip is correct. If it is not, dropping it keeps the scope tight.
Overflow test.MultiplyYMInterval always throws on overflow via multiplyExact / toIntExact / intValueExact, independent of ANSI mode. The dispatched code should raise the same exception, but it is not covered. Could we add an expect_error case near Int.MaxValue to confirm Comet surfaces the same error as Spark rather than falling back or wrapping? A null-interval input case such as make_ym_interval(NULL, m) * 2 would also round out the null coverage, which currently only exercises null multipliers.
One small heads-up: the make_ym_interval row in expressions.md also changes in #4790, so expect a conflict there.
On benchmarks: since this runs in the JVM via dispatch rather than as a native Rust kernel, I do not think the usual native microbenchmark expectation applies here.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Part of #4540.
Rationale for this change
This adds support for Spark's
multiply_ym_intervalexpression, allowing YearMonth interval multiplication to run through Comet instead of falling back toSpark.
What changes are included in this PR?
YearMonthIntervalTypeserialization to the Comet proto/native type mapping.MakeYMIntervalandMultiplyYMIntervalthrough the JVM codegen dispatcher.IntervalYearVectorinput/output support in the codegen kernel path.How are these changes tested?
cargo fmt./mvnw -Pspark-3.5 -Pscala-2.12 -DskipTests compilecargo +1.94.0 check -p datafusion-cometcargo +1.94.0 build./mvnw test -Pspark-3.5 -Pscala-2.12 -Dtest=none -Dsuites="org.apache.comet.CometSqlFileTestSuite multiply_ym_interval"