[core] Support row-count based bucket calculation for postpone compact #8434

Open
Stephen0421 wants to merge 1 commit into apache:master from Stephen0421:postpone-bucket-support-row-count

@Stephen0421
Contributor

Purpose

Support calculating the postpone bucket count from a configured target row count per bucket.

A new table option is added:

  • postpone.target-row-num-per-bucket

When compacting postpone bucket files, the bucket count for each partition is resolved in the following order:

  1. Reuse the known bucket count if the partition has already been compacted.
  2. If postpone.target-row-num-per-bucket is configured, compute the bucket count from the partition's active postpone row count.
  3. Otherwise, fall back to postpone.default-bucket-num.
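The resolution order above can be sketched in Java as follows. This is a minimal, hypothetical sketch, not the PR's PostponeUtils implementation: the class and method names, the use of OptionalInt/OptionalLong, the ceiling-division rule, the one-bucket minimum for an empty partition, and the exact overflow check are all assumptions made for illustration.

```java
import java.util.OptionalInt;
import java.util.OptionalLong;

/**
 * Hypothetical sketch of the postpone bucket-count resolution order.
 * Names are illustrative and are not Paimon's actual PostponeUtils API.
 */
class PostponeBucketResolver {

    /**
     * @param knownBucketNum        bucket count from an earlier compaction of the partition, if any
     * @param targetRowNumPerBucket value of postpone.target-row-num-per-bucket, if configured
     * @param activeRowCount        rows currently in postpone files for the partition
     * @param defaultBucketNum      value of postpone.default-bucket-num
     */
    static int determineBucketNum(
            OptionalInt knownBucketNum,
            OptionalLong targetRowNumPerBucket,
            long activeRowCount,
            int defaultBucketNum) {
        // 1. A partition that was compacted before keeps its bucket count.
        if (knownBucketNum.isPresent()) {
            return knownBucketNum.getAsInt();
        }
        // 2. Derive the bucket count from the active row count when a target is set.
        if (targetRowNumPerBucket.isPresent()) {
            return bucketNumFromRowCount(activeRowCount, targetRowNumPerBucket.getAsLong());
        }
        // 3. Otherwise fall back to the configured default.
        return defaultBucketNum;
    }

    /** Returns ceil(rowCount / target), at least 1; rejects results that do not fit in an int. */
    static int bucketNumFromRowCount(long rowCount, long targetRowNumPerBucket) {
        if (targetRowNumPerBucket <= 0) {
            throw new IllegalArgumentException(
                    "postpone.target-row-num-per-bucket must be positive, got " + targetRowNumPerBucket);
        }
        if (rowCount <= 0) {
            return 1; // assumption: an empty partition still gets one bucket
        }
        // Ceiling division via quotient + remainder, avoiding overflow of (a + b - 1) / b.
        long buckets = rowCount / targetRowNumPerBucket
                + (rowCount % targetRowNumPerBucket == 0 ? 0 : 1);
        if (buckets > Integer.MAX_VALUE) {
            throw new ArithmeticException(
                    "Computed bucket count " + buckets + " exceeds Integer.MAX_VALUE; "
                            + "consider increasing postpone.target-row-num-per-bucket.");
        }
        return (int) buckets;
    }

    public static void main(String[] args) {
        // 2,500,000 rows with a 1,000,000-row target rounds up to 3 buckets.
        System.out.println(determineBucketNum(
                OptionalInt.empty(), OptionalLong.of(1_000_000L), 2_500_000L, 4));
    }
}
```

Running main prints 3. Computing the ceiling as quotient plus a remainder check, rather than (rowCount + target - 1) / target, keeps the intermediate value from overflowing a long for very large row counts; only the final narrowing to int needs an explicit guard.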

Changes

  • Added CoreOptions.POSTPONE_TARGET_ROW_NUM_PER_BUCKET.
  • Added a shared PostponeUtils.determineBucketNum so that Flink and Spark resolve the bucket count consistently.
  • Added a row-count based bucket calculation helper, with tests.
  • Updated the Flink postpone compact action to use the shared bucket resolution logic.
  • Updated the Spark postpone compact procedure to avoid capturing java.util.Optional, which is not Serializable, in Spark task closures.
  • Improved the overflow error message with guidance for users.
  • Updated the postpone bucket documentation.
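The Spark change relies on the fact that Spark serializes task closures with Java serialization, and java.util.Optional does not implement Serializable. The sketch below illustrates that general issue and the usual fix of unwrapping the Optional on the driver before building the closure. It is not the PR's code: the SerializableFn interface and the isSerializable helper are hypothetical stand-ins for a Spark task closure and Spark's closure serializer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Optional;

class ClosureCaptureDemo {

    /** Hypothetical stand-in for a function shipped to executors, such as a Spark task closure. */
    interface SerializableFn extends Serializable {
        int apply(int bucket);
    }

    /** Returns true if the object survives Java serialization. */
    static boolean isSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Optional<Integer> knownBucketNum = Optional.of(8);

        // Captures the Optional itself, so serializing the closure fails.
        SerializableFn capturesOptional = b -> knownBucketNum.orElse(b);

        // Unwrap on the driver and capture a plain, nullable Integer instead.
        Integer known = knownBucketNum.orElse(null);
        SerializableFn capturesInteger = b -> known != null ? known : b;

        System.out.println(isSerializable(capturesOptional)); // false
        System.out.println(isSerializable(capturesInteger)); // true
    }
}
```

A serializable lambda is written as a SerializedLambda that includes its captured arguments, so any non-serializable captured value, such as an Optional, causes a NotSerializableException; capturing a boxed Integer (or a primitive with a sentinel) avoids this.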

Test

  • org.apache.paimon.flink.PostponeBucketTableITCase
  • org.apache.paimon.spark.sql.PostponeBucketTableTest
  • org.apache.paimon.table.PostponeUtilsTest

Stephen0421 force-pushed the postpone-bucket-support-row-count branch from 1a8f31d to 40d01b1 (July 2, 2026 10:47)
Stephen0421 force-pushed the postpone-bucket-support-row-count branch from 40d01b1 to c4867fb (July 2, 2026 13:06)