Skip to content

feat: Support duration type in approx_distinct#23291

Open
mkleen wants to merge 1 commit into
apache:mainfrom
mkleen:duration_approx_distinct
Open

feat: Support duration type in approx_distinct#23291
mkleen wants to merge 1 commit into
apache:mainfrom
mkleen:duration_approx_distinct

Conversation

@mkleen

@mkleen mkleen commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

  • Support the Duration type for approx_distinct
  • The Arrow type Duration can be directly supported for NumericHLLAccumulator and HllGroupsAccumulator

What changes are included in this PR?

  • Enable NumericHLLAccumulator and HllGroupsAccumulator to support Duration
  • Tests for the non-grouped and grouped path as part of approx_distinct.rst and aggregate.slt
  • Benchmarks

Are these changes tested?

Yes

Are there any user-facing changes?

Yes, approx_distinct supports now Duration but no breaking changes.

@github-actions github-actions Bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Jul 2, 2026
@mkleen mkleen force-pushed the duration_approx_distinct branch 2 times, most recently from 4749a7d to 87049df Compare July 2, 2026 13:35
@mkleen mkleen force-pushed the duration_approx_distinct branch from 87049df to 98a25ce Compare July 2, 2026 13:49
@mkleen mkleen marked this pull request as ready for review July 2, 2026 17:32
@mkleen mkleen changed the title feat: Support Duration type in approx_distinct feat: Support duration type in approx_distinct Jul 3, 2026

// --- Duration benchmarks ---
let values = Arc::new(create_duration_array(n_distinct)) as ArrayRef;
c.bench_function(&format!("approx_distinct duration {pct}% distinct"), |b| {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

honestly i feel we dont need to add a benchmark for this; it would be good to cut down on the benchmarks as having too many results can be too noisy, especially as the codepath being tested here would be the same for all primitives anyway 🤔

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

}

#[test]
fn duration_support_numerical_acc_and_group_acc() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could probably omit this test since we have the SLT? or is this to handle the MAX/MIN cases?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes sure we could just go with the SLT.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants