[python] Early partition filter in manifest reading #8429

Open
XiaoHongbo-Hope wants to merge 1 commit into apache:master from XiaoHongbo-Hope:manifest_early_partition_filter

Conversation

@XiaoHongbo-Hope

Contributor

Purpose

Tests

@XiaoHongbo-Hope changed the title from "Manifest early partition filter" to "[python] Skip non-matching partition manifest entries before DataFileMeta construction" on Jul 2, 2026
@XiaoHongbo-Hope changed the title from "[python] Skip non-matching partition manifest entries before DataFileMeta construction" to "[python] Early partition filter in manifest reading" on Jul 2, 2026
@XiaoHongbo-Hope force-pushed the manifest_early_partition_filter branch from 69c176d to c84b7bd on July 2, 2026 09:08
@XiaoHongbo-Hope marked this pull request as ready for review on July 2, 2026 09:31
@XiaoHongbo-Hope force-pushed the manifest_early_partition_filter branch 2 times, most recently from e47e38f to 68fbc07 on July 2, 2026 09:49
Deserialize and test the partition predicate before building the _FILE
block, skipping full deserialization for non-matching manifest entries
(aligns with Java's createEntryRowFilter). Add is_in partition test over a
many-partition table.
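
A minimal sketch of the idea in the commit message, assuming a toy entry layout. `RawEntry`, `decode_partition`, `decode_file_meta`, and `read_entries` are hypothetical names for illustration only and are not the pypaimon reader's actual API; the point is only that the cheap partition decode and predicate test run before the expensive file-block decode.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple

Partition = Tuple[str, ...]


@dataclass
class RawEntry:
    """Hypothetical stand-in for one serialized manifest entry."""
    partition_bytes: bytes  # cheap to decode
    file_bytes: bytes       # expensive to decode (stands in for the _FILE block)


# Counter used only to show how many expensive decodes actually ran.
decode_calls = {"file": 0}


def decode_partition(raw: bytes) -> Partition:
    # Toy encoding: a single UTF-8 partition value.
    return (raw.decode("utf-8"),)


def decode_file_meta(raw: bytes) -> Dict[str, object]:
    # Toy encoding: "file_name,file_size".
    decode_calls["file"] += 1
    name, size = raw.decode("utf-8").split(",")
    return {"file_name": name, "file_size": int(size)}


def read_entries(
    raw_entries: Iterable[RawEntry],
    partition_filter: Optional[Callable[[Partition], bool]] = None,
) -> Iterator[Tuple[Partition, Dict[str, object]]]:
    for raw in raw_entries:
        partition = decode_partition(raw.partition_bytes)
        # Early exit: skip the file block entirely for non-matching partitions.
        if partition_filter is not None and not partition_filter(partition):
            continue
        yield partition, decode_file_meta(raw.file_bytes)


raws = [
    RawEntry(b"a", b"f1,10"),
    RawEntry(b"b", b"f2,20"),
    RawEntry(b"a", b"f3,30"),
]
# An is_in-style filter over partition values, written as a plain callable.
matched = list(read_entries(raws, lambda p: p[0] in {"a"}))
```

With this toy input, only the entries whose partition matches reach `decode_file_meta`, so the counter reflects the work saved by filtering first.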
@XiaoHongbo-Hope force-pushed the manifest_early_partition_filter branch from 68fbc07 to 8e4b7f0 on July 2, 2026 10:00

@JingsongLi left a comment

Contributor


test_partition_filter_test_handles_isnull only tests the Predicate directly on full table rows, so it does not cover the newly added early manifest path, partition_filter.test(partition). If there is a problem with the index space of partition-only rows or with null deserialization, this test will still pass. I recommend modifying it, or supplementing it, with an actual scan: write rows with pt=None and pt='x', then use with_filter(pb.is_null('pt')).new_scan().plan() and assert that only the null partition is retained.
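
To illustrate the index-space concern, here is a toy sketch. It does not use the pypaimon Predicate classes; `IsNull`, the field lists, and `keep_partitions` are hypothetical. It assumes a full row laid out as (id, value, pt) and a partition-only row laid out as (pt,), so the same field sits at a different index in each space.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class IsNull:
    """Toy predicate: true when the value at `index` is None."""
    index: int

    def test(self, row: Sequence[Optional[object]]) -> bool:
        return row[self.index] is None


# Hypothetical layouts: the partition field has a different index in each.
FULL_ROW_FIELDS = ["id", "value", "pt"]
PARTITION_FIELDS = ["pt"]

full_row_pred = IsNull(FULL_ROW_FIELDS.index("pt"))    # index 2
partition_pred = IsNull(PARTITION_FIELDS.index("pt"))  # index 0


def keep_partitions(
    pred: IsNull, partitions: List[Tuple[Optional[object], ...]]
) -> List[Tuple[Optional[object], ...]]:
    """Mimics an early filter that sees only partition-only rows."""
    return [p for p in partitions if pred.test(p)]


# A test on full rows passes even though the predicate is in the wrong
# index space for the early partition-only path.
full_row_ok = full_row_pred.test((1, "a", None))

# Applying the full-row predicate to a partition-only row fails here.
try:
    keep_partitions(full_row_pred, [(None,), ("x",)])
    wrong_space_failed = False
except IndexError:
    wrong_space_failed = True

# The predicate remapped to partition index space keeps only the null partition.
kept = keep_partitions(partition_pred, [(None,), ("x",)])
```

In this toy the mismatch surfaces as an IndexError; with wider partition rows it could instead silently test the wrong field, which is why a scan-level test that exercises the partition-only path is the stronger check.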

2 participants