Skip to content

feat: support multiply_ym_interval with YearMonth interval codegen dispatch#4715

Open
peterxcli wants to merge 3 commits into
apache:mainfrom
peterxcli:feat/make_ym_interval
Open

feat: support multiply_ym_interval with YearMonth interval codegen dispatch#4715
peterxcli wants to merge 3 commits into
apache:mainfrom
peterxcli:feat/make_ym_interval

Conversation

@peterxcli

@peterxcli peterxcli commented Jun 24, 2026

Copy link
Copy Markdown
Member

Which issue does this PR close?

Part of #4540.

Rationale for this change

This adds support for Spark's multiply_ym_interval expression, allowing YearMonth interval multiplication to run through Comet instead of falling back to
Spark.

What changes are included in this PR?

  • Add YearMonthIntervalType serialization to the Comet proto/native type mapping.
  • Route MakeYMInterval and MultiplyYMInterval through the JVM codegen dispatcher.
  • Add IntervalYearVector input/output support in the codegen kernel path.
  • Add YearMonth interval literal support.
  • Update expression/datatype support docs.
  • Add SQL coverage for YearMonth interval multiplication with integer, long, float, double, and decimal multipliers.

How are these changes tested?

  • cargo fmt
  • ./mvnw -Pspark-3.5 -Pscala-2.12 -DskipTests compile
  • cargo +1.94.0 check -p datafusion-comet
  • cargo +1.94.0 build
  • ./mvnw test -Pspark-3.5 -Pscala-2.12 -Dtest=none -Dsuites="org.apache.comet.CometSqlFileTestSuite multiply_ym_interval"

@peterxcli peterxcli changed the title Support multiply_ym_interval with YearMonth interval codegen dispatch feat: support multiply_ym_interval with YearMonth interval codegen dispatch Jun 24, 2026
@andygrove

Copy link
Copy Markdown
Member

Thanks for this. Using the codegen dispatcher for multiply_ym_interval is the right call, since it runs Spark's own generated code and so matches Spark by construction, including the half-up rounding and the overflow behavior. The multiplier coverage in the SQL test is nice and broad (int, long, float, double, decimal, plus the half-up rounding case and the numeric-on-the-left normalization).

A few things worth looking at:

Rebase on main. This looks like it was branched before #4541 landed, and #4541 already added the YearMonth and DayTime interval type plumbing plus make_ym_interval via codegen dispatch. After a rebase, these pieces here become duplicates and will conflict: types.proto, serde.rs, Utils.scala, serializeDataType, isSupportedDataType, the CometBatchKernelCodegenOutput changes, CometMakeYMInterval and its registration, and the make_ym_interval row in expressions.md. Stripping those would leave just the multiply-specific work: MultiplyYMInterval, the interval input support in CometBatchKernelCodegenInput (which #4541 did not touch and which multiply genuinely needs), the interval literals in literals.scala and planner.rs, and the test. That would make the diff much easier to review. It should also clear the failing preflight, which is currently tripping on installation.md, a file this PR does not touch, which is a symptom of the branch being stale.

Is the supportedDataType change needed? #4541 deliberately kept intervals out of QueryPlanSerde.supportedDataType because codegen dispatch does not consult it (it serializes the return type via serializeDataType). Adding YearMonthIntervalType there is a broader claim: it tells Comet that YM interval columns can flow through any native operator (scan, filter, project, sort, shuffle), not just the dispatch path, and nothing here exercises those paths. Is it actually required for multiply_ym_interval? If it is, a test that pushes a YM interval column through a plain projection or a shuffle would give confidence the FFI round-trip is correct. If it is not, dropping it keeps the scope tight.

Overflow test. MultiplyYMInterval always throws on overflow via multiplyExact / toIntExact / intValueExact, independent of ANSI mode. The dispatched code should raise the same exception, but it is not covered. Could we add an expect_error case near Int.MaxValue to confirm Comet surfaces the same error as Spark rather than falling back or wrapping? A null-interval input case such as make_ym_interval(NULL, m) * 2 would also round out the null coverage, which currently only exercises null multipliers.

One small heads-up: the make_ym_interval row in expressions.md also changes in #4790, so expect a conflict there.

On benchmarks: since this runs in the JVM via dispatch rather than as a native Rust kernel, I do not think the usual native microbenchmark expectation applies here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants