
[python] Add Mosaic row-group stats skipping #8419

Open
QuakeWang wants to merge 2 commits into apache:master from QuakeWang:mosaic-rg-skip

@QuakeWang

Member

Purpose

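Python Mosaic reads currently apply pushed-down predicates only after read_row_group, by filtering the in-memory Arrow batch. This keeps results correct, but it gives up the pruning that Mosaic row-group statistics allow: every row group is read and decoded, even when its statistics already show that no row can match.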
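A minimal sketch of the current filter-after-read behavior, assuming row groups are modeled as plain Python lists of integers and the predicate as a callable. scan_filter_after_read is a hypothetical name, not part of pypaimon; the point is only that the number of row groups read never depends on the predicate.

```python
from typing import Callable, List, Tuple


def scan_filter_after_read(
    row_groups: List[List[int]], predicate: Callable[[int], bool]
) -> Tuple[List[int], int]:
    """Hypothetical filter-after-read scan.

    Every row group is read, then rows are filtered in memory.
    Returns the matching values and how many row groups were read.
    """
    rows: List[int] = []
    groups_read = 0
    for group in row_groups:
        # Stand-in for read_row_group: the whole group is decoded unconditionally.
        groups_read += 1
        # Stand-in for the Arrow batch filter applied after the read.
        rows.extend(v for v in group if predicate(v))
    return rows, groups_read


print(scan_filter_after_read([[1, 2, 3], [10, 11, 12], [20, 21]], lambda v: v > 15))
# ([20, 21], 3)
```

Only the last group contributes rows, yet all three groups are read.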

This PR passes the structured Paimon predicate to the Mosaic format reader and evaluates it against each row group's Mosaic statistics before that row group is read; row groups that cannot match are skipped. The Arrow predicate filter is still applied after reading, so stats pruning is purely an optimization and cannot change query results. If statistics are missing, predicate conversion fails, or the file schema is older than the table schema, the reader fails open: it reads the row group and relies on the post-read filter, as before.
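The pruning and fail-open rules can be sketched as follows. Everything here is hypothetical and simplified: RowGroup, ColumnStats, Predicate, might_match and scan_with_stats_pruning are illustrative names rather than the pypaimon or Mosaic API, statistics are modeled as per-column min/max, and "conversion fails" is modeled as an operator (!=) that has no stats translation. might_match answers "could any row in this group match?" and returns True whenever it cannot prove otherwise.

```python
import operator
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# Exact row-level comparison, standing in for the post-read Arrow filter.
_EVAL = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


@dataclass
class ColumnStats:
    min: Optional[int]
    max: Optional[int]


@dataclass
class RowGroup:
    # Hypothetical row group: column name -> values, plus per-column statistics.
    columns: Dict[str, List[int]]
    stats: Dict[str, ColumnStats] = field(default_factory=dict)


@dataclass
class Predicate:
    # Hypothetical structured predicate of the form "<column> <op> <literal>".
    column: str
    op: str
    literal: int


def might_match(pred: Predicate, rg: RowGroup, file_columns: Set[str]) -> bool:
    """Return False only if the statistics prove no row in rg can satisfy pred."""
    if pred.column not in file_columns:
        return True  # file schema older than table schema: fail open
    s = rg.stats.get(pred.column)
    if s is None or s.min is None or s.max is None:
        return True  # missing statistics: fail open
    lit = pred.literal
    if pred.op == "=":
        return s.min <= lit <= s.max
    if pred.op == "<":
        return s.min < lit
    if pred.op == "<=":
        return s.min <= lit
    if pred.op == ">":
        return s.max > lit
    if pred.op == ">=":
        return s.max >= lit
    return True  # no stats translation for this operator (e.g. "!="): fail open


def scan_with_stats_pruning(
    row_groups: List[RowGroup], file_columns: Set[str], pred: Predicate
) -> Tuple[List[int], int]:
    """Skip row groups the stats rule out; filter the rest exactly after reading."""
    rows: List[int] = []
    groups_read = 0
    compare = _EVAL[pred.op]
    for rg in row_groups:
        if not might_match(pred, rg, file_columns):
            continue  # pruned: this row group is never read
        groups_read += 1
        # The post-read filter always runs, so pruning cannot change the result.
        # Hypothetical simplification: a column the file lacks yields no rows.
        values = rg.columns.get(pred.column, [])
        rows.extend(v for v in values if compare(v, pred.literal))
    return rows, groups_read


groups = [
    RowGroup({"id": [1, 2, 3]}, {"id": ColumnStats(1, 3)}),
    RowGroup({"id": [10, 11, 12]}, {"id": ColumnStats(10, 12)}),
    RowGroup({"id": [20, 21]}),  # no statistics recorded
]
print(scan_with_stats_pruning(groups, {"id"}, Predicate("id", ">", 15)))
# ([20, 21], 1)
```

Returning True on any doubt is what makes the pruning safe: a wrong True only costs an extra read that the post-read filter then cleans up, whereas a wrong False would silently drop rows.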

Tests

  • pytest paimon-python/pypaimon/tests/test_format_mosaic_reader_writer.py -q

Signed-off-by: QuakeWang <wangfuzheng0814@foxmail.com>
Review thread (outdated) on paimon-python/pypaimon/read/reader/format_mosaic_reader.py