[python] Add multimodal blob store API by JingsongLi · Pull Request #8432 · apache/paimon

JingsongLi · 2026-07-02T09:37:24Z

Summary

Adds an S3-like object facade for PyPaimon multimodal BLOB columns. The API supports object-style writes, reads, listing, deletion, byte ranges, table-column projection, and Paimon-native descriptor/reference storage.

Changes

Add BlobStore with put_object, put_objects, get_object, head_object, list_objects, and delete APIs.
Support raw body writes, PyPaimon Blob inputs, external URI/descriptor inputs, managed blob copies, and blob-descriptor-field reference preservation.
Expose MultimodalTable.blobs() and export blob store result types from pypaimon.multimodal.
Document the blob APIs after the search APIs, including S3-style ranges, column projection, and external S3 credential behavior.
Add multimodal table tests for body writes, descriptor-backed streaming, reference preservation, managed URI copies, overwrite/delete, and column projection.

Testing

python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py
python -m compileall -q paimon-python/pypaimon/multimodal/blob_store.py paimon-python/pypaimon/tests/multimodal_table_test.py
git diff --check

leaves12138

Thanks for adding the blob store facade. I found one blocker around the overwrite contract and one smaller limit edge case.

The blocker is that put_object / put_objects implement overwrite as delete_by_predicate(keys) plus an append in the same commit. When two writers concurrently create the same previously-absent key, both delete phases see no existing row, both commits can succeed, and the table is left with two rows for the same object key. After that, head_object / get_object fail with ValueError: Multiple rows found for blob key ..., so a normal concurrent create can corrupt the object namespace.

I reproduced this on the PR head by preparing two batch write builders for the same absent key, each doing delete_by_predicate(store._keys_predicate(["k"])) and writing a replacement row, then committing both. Both commits succeeded and table.scan().to_list() returned two rows for k; store.get_object("k") then failed with the duplicate-key error. Please add conflict/retry/unique-key protection for same-key concurrent puts (or otherwise make this limitation explicit and recoverable), and add a regression test for the absent-key race.

Smaller issue: list_objects(limit=0) currently returns the first object because the limit check happens after appending, and negative limits behave the same. Please validate limit >= 0 and return an empty list for limit == 0.

XiaoHongbo-Hope · 2026-07-02T09:56:22Z

What a wonderful idea! The S3-like object API makes BLOB columns much easier to use

JingsongLi · 2026-07-02T09:59:53Z

Addressed the review comments in a5f38d8686:

Added post-commit uniqueness verification for blob overwrites. If a stale delete + append race leaves duplicate rows for the same key, put_object / put_objects retry the overwrite by deleting all matching keys and appending the intended rows again.
Added a regression test that prepares a stale absent-key writer, injects it after the first put commit, and verifies the store recovers to a single row.
Validated list_objects(limit) so limit=0 returns an empty list and negative limits raise ValueError.

Re-ran python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py (45 passed), compileall, and git diff --check.

JingsongLi · 2026-07-02T10:08:16Z

Updated in f8e24e907e to use TableUpdate.upsert_by_arrow_with_key(...) as the primary write path for blob puts.

The implementation still keeps the post-commit uniqueness check because upsert_by_key updates all existing duplicate rows but does not collapse duplicate rows into one row. So the normal path is now upsert-by-key, and only if an absent-key race leaves duplicate rows do we run the replace cleanup path to restore the one-object-key/one-row invariant.

Re-ran python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py (45 passed), compileall, and git diff --check.

XiaoHongbo-Hope · 2026-07-02T10:24:25Z

+            end = int(end_text)
+            if end < start:
+                raise ValueError("Range end must be greater than or equal to start.")
+            length = end - start + 1


This boundary check should also clamp explicit bytes=start-end ranges: when total_length >= 0, end must
be at most total_length - 1; otherwise the ranged descriptor may read past this blob boundary into the
next blob.

Fixed in 674e73354e: explicit bytes=start-end ranges now reject start >= total_length and clamp end to total_length - 1 when the blob length is known, so the ranged descriptor cannot read into the next blob. Added coverage for bytes=10-999 returning only the tail of the current blob and for start-past-end rejection. Re-ran python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py (45 passed), compileall, and git diff --check.

JingsongLi · 2026-07-02T10:34:21Z

We should modify this PR based on #8435

JingsongLi · 2026-07-02T12:01:16Z

Rebased this branch on the latest origin/master (2844adf2e8).

Also added a stronger streaming regression test in 6d22a1a793: put_object now uses a fake URI reader whose only successful path is new_input_stream(). The test fails if the input Blob is materialized through to_data() or if the original Blob.new_input_stream() is used directly; it passes only when the descriptor is handed to the blob writer and the writer streams through the URI reader.

The write path remains upsert_by_arrow_with_key(...) for normal blob puts, with the duplicate cleanup fallback only for restoring the one-key/one-row invariant after an absent-key race.

Re-ran python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py (45 passed), compileall, and git diff --check.

JingsongLi · 2026-07-02T12:14:13Z

Fixed the lingering Arrow upsert path in e7edfac. BlobStore now calls TableUpdate.upsert_by_key(...) with GenericRow values and keeps blob fields as Blob/BlobData, so put_object no longer routes through upsert_by_arrow_with_key or Arrow materialization. The duplicate cleanup fallback also writes rows via write_row. The only Arrow write left around this area is in the race regression test, where it intentionally builds a synthetic stale writer to reproduce the old absent-key race.\n\nRe-ran:\n- python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py -k 'blob_store' -vv\n- python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py -vv\n- python -m compileall -q paimon-python/pypaimon/multimodal/blob_store.py paimon-python/pypaimon/tests/multimodal_table_test.py\n- git diff --check

JingsongLi · 2026-07-02T12:15:42Z

Follow-up fixed in 5c8ed19: removed the remaining Arrow write from the blob race regression test as well. The test now uses GenericRow + BlobData + write_row, and rg '_rows_to_arrow|write_arrow\(|upsert_by_arrow_with_key|PyarrowFieldParser' paimon-python/pypaimon/multimodal/blob_store.py paimon-python/pypaimon/tests/multimodal_table_test.py returns no matches.\n\nRe-ran the same checks:\n- python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py -k 'blob_store' -vv\n- python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py -vv\n- python -m compileall -q paimon-python/pypaimon/multimodal/blob_store.py paimon-python/pypaimon/tests/multimodal_table_test.py\n- git diff --check

leaves12138

Thanks for the update. The previous limit handling and absent-key duplicate cleanup are much better now, and the focused local checks pass. I still found one remaining race in the result-read path that can surface transient duplicate rows to callers, so I think this needs one more fix before merge.

leaves12138 · 2026-07-02T12:27:19Z

+        return all(counts.get(key, 0) == 1 for key in keys)
+
+    def _put_result(self, key, snapshot_id: Optional[int]) -> PutObjectResult:
+        info = self.head_object(key)


_put_result re-reads the latest table after _overwrite_rows has already returned. There is still a race window between _has_unique_rows(keys) and this head_object call: another concurrent absent-key put can append the same key in that window, so this call can raise ValueError("Multiple rows found..."). I could reproduce this with two concurrent put_object("payloads/1", ...) calls; one future intermittently fails from this line even though the other call later repairs the table. Could we keep the result read inside the same retry/replace loop, or catch duplicate/NoSuchKey here and run the replace retry before returning, so callers do not observe the transient duplicate?

JingsongLi · 2026-07-02T12:31:52Z

Added column update APIs in d6771c3: update_object_columns(key, columns) and update_objects_columns([...]). This maps to the S3 metadata-update use case, but keeps the API Paimon-native: it updates non-key/non-BLOB table columns by row id and does not rewrite or copy blob data. Missing keys raise NoSuchKey; empty columns are rejected; key/blob/partition columns are rejected for updates.\n\nImplementation detail: the batch API reads current target column values and row ids, overlays the requested columns, then performs one row-id column update for the union of updated columns. This preserves unspecified columns and avoids appending missing-key rows.\n\nRe-ran:\n- python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py -k 'blob_store' -vv\n- python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py -vv\n- python -m compileall -q paimon-python/pypaimon/multimodal/blob_store.py paimon-python/pypaimon/tests/multimodal_table_test.py\n- git diff --check

JingsongLi · 2026-07-02T12:38:40Z

Addressed the remaining result-read race in 79c2a97. put_objects now returns the result from inside _overwrite_rows: after each upsert/replace attempt it reads the put results and verifies every requested key has exactly one row. If that result read sees duplicate or missing rows, it stays in the same retry loop and runs the replace cleanup before returning, so callers no longer observe Multiple rows found from the post-overwrite result read.\n\nAdded test_blob_store_put_object_recovers_result_read_race, which injects a stale concurrent commit just before the first result read and verifies put_object still returns successfully and leaves one final row.\n\nRe-ran:\n- python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py -k 'blob_store' -vv\n- python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py -vv\n- python -m compileall -q paimon-python/pypaimon/multimodal/blob_store.py paimon-python/pypaimon/tests/multimodal_table_test.py\n- git diff --check

JingsongLi · 2026-07-02T12:49:35Z

Added blob-as-descriptor=true to multimodal table default options in 6cdb3f2. The default-options test now asserts it, and the multimodal API docs list it with the other defaults.\n\nRe-ran:\n- python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py -k 'create_table_defaults_data_evolution_options or blob_store' -vv\n- python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py -vv\n- python -m compileall -q paimon-python/pypaimon/multimodal/connection.py paimon-python/pypaimon/tests/multimodal_table_test.py\n- git diff --check

leaves12138

Reviewed the latest update under the intended semantics that BlobStore does not enforce object-key uniqueness yet, and table-level/secondary unique key support can be introduced later. The result path no longer re-reads the latest snapshot after put, metadata-only updates are covered, and the focused local checks passed. No further blockers from my side.

leaves12138 requested changes Jul 2, 2026

View reviewed changes

XiaoHongbo-Hope reviewed Jul 2, 2026

View reviewed changes

JingsongLi added 4 commits July 2, 2026 19:59

[python] Add multimodal blob store API

81303f0

[python] Fix blob store overwrite conflicts

9114af4

[python] Use upsert for blob store puts

c9ff3a0

[python] Verify blob put streams descriptor input

6d22a1a

JingsongLi force-pushed the codex/paimon-python-blob-store branch from f8e24e9 to 6d22a1a Compare July 2, 2026 12:01

JingsongLi added 2 commits July 2, 2026 20:02

[python] Clamp blob range end offsets

674e733

[python] Use row-based upsert for blob puts

e7edfac

[python] Remove Arrow write from blob race test

5c8ed19

leaves12138 requested changes Jul 2, 2026

View reviewed changes

[python] Add blob object column updates

d6771c3

[python] Retry blob put result reads

79c2a97

[python] Add blob descriptor default for multimodal tables

6cdb3f2

JingsongLi added 5 commits July 2, 2026 22:40

[python] Drop blob duplicate key recovery

bb0cfb9

[python] Apply last-write-wins in blob batch puts

03fecf5

[python] Simplify blob object result metadata

9451d7c

[python] Add batch blob delete coverage

03e95d0

[docs] Document blob delete APIs

376db14

leaves12138 approved these changes Jul 2, 2026

View reviewed changes

JingsongLi merged commit d7bae9b into apache:master Jul 2, 2026
7 checks passed

Uh oh!

Conversation

JingsongLi commented Jul 2, 2026

Summary

Changes

Testing

Uh oh!

leaves12138 left a comment

Choose a reason for hiding this comment

Uh oh!

XiaoHongbo-Hope commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JingsongLi commented Jul 2, 2026

Uh oh!

JingsongLi commented Jul 2, 2026

Uh oh!

XiaoHongbo-Hope Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

JingsongLi Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

JingsongLi commented Jul 2, 2026

Uh oh!

JingsongLi commented Jul 2, 2026

Uh oh!

JingsongLi commented Jul 2, 2026

Uh oh!

JingsongLi commented Jul 2, 2026

Uh oh!

leaves12138 left a comment

Choose a reason for hiding this comment

Uh oh!

leaves12138 Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

JingsongLi commented Jul 2, 2026

Uh oh!

JingsongLi commented Jul 2, 2026

Uh oh!

JingsongLi commented Jul 2, 2026

Uh oh!

leaves12138 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

XiaoHongbo-Hope commented Jul 2, 2026 •

edited

Loading