Skip to content

[python] Add multimodal blob store API#8432

Merged
JingsongLi merged 15 commits into
apache:masterfrom
JingsongLi:codex/paimon-python-blob-store
Jul 2, 2026
Merged

[python] Add multimodal blob store API#8432
JingsongLi merged 15 commits into
apache:masterfrom
JingsongLi:codex/paimon-python-blob-store

Conversation

@JingsongLi

Copy link
Copy Markdown
Contributor

Summary

Adds an S3-like object facade for PyPaimon multimodal BLOB columns. The API supports object-style writes, reads, listing, deletion, byte ranges, table-column projection, and Paimon-native descriptor/reference storage.

Changes

  • Add BlobStore with put_object, put_objects, get_object, head_object, list_objects, and delete APIs.
  • Support raw body writes, PyPaimon Blob inputs, external URI/descriptor inputs, managed blob copies, and blob-descriptor-field reference preservation.
  • Expose MultimodalTable.blobs() and export blob store result types from pypaimon.multimodal.
  • Document the blob APIs after the search APIs, including S3-style ranges, column projection, and external S3 credential behavior.
  • Add multimodal table tests for body writes, descriptor-backed streaming, reference preservation, managed URI copies, overwrite/delete, and column projection.

Testing

  • python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py
  • python -m compileall -q paimon-python/pypaimon/multimodal/blob_store.py paimon-python/pypaimon/tests/multimodal_table_test.py
  • git diff --check

@leaves12138 leaves12138 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the blob store facade. I found one blocker around the overwrite contract and one smaller limit edge case.

The blocker is that put_object / put_objects implement overwrite as delete_by_predicate(keys) plus an append in the same commit. When two writers concurrently create the same previously-absent key, both delete phases see no existing row, both commits can succeed, and the table is left with two rows for the same object key. After that, head_object / get_object fail with ValueError: Multiple rows found for blob key ..., so a normal concurrent create can corrupt the object namespace.

I reproduced this on the PR head by preparing two batch write builders for the same absent key, each doing delete_by_predicate(store._keys_predicate(["k"])) and writing a replacement row, then committing both. Both commits succeeded and table.scan().to_list() returned two rows for k; store.get_object("k") then failed with the duplicate-key error. Please add conflict/retry/unique-key protection for same-key concurrent puts (or otherwise make this limitation explicit and recoverable), and add a regression test for the absent-key race.

Smaller issue: list_objects(limit=0) currently returns the first object because the limit check happens after appending, and negative limits behave the same. Please validate limit >= 0 and return an empty list for limit == 0.

@XiaoHongbo-Hope

XiaoHongbo-Hope commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

What a wonderful idea! The S3-like object API makes BLOB columns much easier to use

@JingsongLi

Copy link
Copy Markdown
Contributor Author

Addressed the review comments in a5f38d8686:

  • Added post-commit uniqueness verification for blob overwrites. If a stale delete + append race leaves duplicate rows for the same key, put_object / put_objects retry the overwrite by deleting all matching keys and appending the intended rows again.
  • Added a regression test that prepares a stale absent-key writer, injects it after the first put commit, and verifies the store recovers to a single row.
  • Validated list_objects(limit) so limit=0 returns an empty list and negative limits raise ValueError.

Re-ran python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py (45 passed), compileall, and git diff --check.

@JingsongLi

Copy link
Copy Markdown
Contributor Author

Updated in f8e24e907e to use TableUpdate.upsert_by_arrow_with_key(...) as the primary write path for blob puts.

The implementation still keeps the post-commit uniqueness check because upsert_by_key updates all existing duplicate rows but does not collapse duplicate rows into one row. So the normal path is now upsert-by-key, and only if an absent-key race leaves duplicate rows do we run the replace cleanup path to restore the one-object-key/one-row invariant.

Re-ran python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py (45 passed), compileall, and git diff --check.

end = int(end_text)
if end < start:
raise ValueError("Range end must be greater than or equal to start.")
length = end - start + 1

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This boundary check should also clamp explicit bytes=start-end ranges: when total_length >= 0, end must
be at most total_length - 1; otherwise the ranged descriptor may read past this blob boundary into the
next blob.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 674e73354e: explicit bytes=start-end ranges now reject start >= total_length and clamp end to total_length - 1 when the blob length is known, so the ranged descriptor cannot read into the next blob. Added coverage for bytes=10-999 returning only the tail of the current blob and for start-past-end rejection. Re-ran python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py (45 passed), compileall, and git diff --check.

@JingsongLi

Copy link
Copy Markdown
Contributor Author

We should modify this PR based on #8435

@JingsongLi JingsongLi force-pushed the codex/paimon-python-blob-store branch from f8e24e9 to 6d22a1a Compare July 2, 2026 12:01
@JingsongLi

Copy link
Copy Markdown
Contributor Author

Rebased this branch on the latest origin/master (2844adf2e8).

Also added a stronger streaming regression test in 6d22a1a793: put_object now uses a fake URI reader whose only successful path is new_input_stream(). The test fails if the input Blob is materialized through to_data() or if the original Blob.new_input_stream() is used directly; it passes only when the descriptor is handed to the blob writer and the writer streams through the URI reader.

The write path remains upsert_by_arrow_with_key(...) for normal blob puts, with the duplicate cleanup fallback only for restoring the one-key/one-row invariant after an absent-key race.

Re-ran python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py (45 passed), compileall, and git diff --check.

@JingsongLi

Copy link
Copy Markdown
Contributor Author

Fixed the lingering Arrow upsert path in e7edfac. BlobStore now calls TableUpdate.upsert_by_key(...) with GenericRow values and keeps blob fields as Blob/BlobData, so put_object no longer routes through upsert_by_arrow_with_key or Arrow materialization. The duplicate cleanup fallback also writes rows via write_row. The only Arrow write left around this area is in the race regression test, where it intentionally builds a synthetic stale writer to reproduce the old absent-key race.\n\nRe-ran:\n- python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py -k 'blob_store' -vv\n- python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py -vv\n- python -m compileall -q paimon-python/pypaimon/multimodal/blob_store.py paimon-python/pypaimon/tests/multimodal_table_test.py\n- git diff --check

@JingsongLi

Copy link
Copy Markdown
Contributor Author

Follow-up fixed in 5c8ed19: removed the remaining Arrow write from the blob race regression test as well. The test now uses GenericRow + BlobData + write_row, and rg '_rows_to_arrow|write_arrow\(|upsert_by_arrow_with_key|PyarrowFieldParser' paimon-python/pypaimon/multimodal/blob_store.py paimon-python/pypaimon/tests/multimodal_table_test.py returns no matches.\n\nRe-ran the same checks:\n- python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py -k 'blob_store' -vv\n- python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py -vv\n- python -m compileall -q paimon-python/pypaimon/multimodal/blob_store.py paimon-python/pypaimon/tests/multimodal_table_test.py\n- git diff --check

@leaves12138 leaves12138 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. The previous limit handling and absent-key duplicate cleanup are much better now, and the focused local checks pass. I still found one remaining race in the result-read path that can surface transient duplicate rows to callers, so I think this needs one more fix before merge.

return all(counts.get(key, 0) == 1 for key in keys)

def _put_result(self, key, snapshot_id: Optional[int]) -> PutObjectResult:
info = self.head_object(key)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_put_result re-reads the latest table after _overwrite_rows has already returned. There is still a race window between _has_unique_rows(keys) and this head_object call: another concurrent absent-key put can append the same key in that window, so this call can raise ValueError("Multiple rows found..."). I could reproduce this with two concurrent put_object("payloads/1", ...) calls; one future intermittently fails from this line even though the other call later repairs the table. Could we keep the result read inside the same retry/replace loop, or catch duplicate/NoSuchKey here and run the replace retry before returning, so callers do not observe the transient duplicate?

@JingsongLi

Copy link
Copy Markdown
Contributor Author

Added column update APIs in d6771c3: update_object_columns(key, columns) and update_objects_columns([...]). This maps to the S3 metadata-update use case, but keeps the API Paimon-native: it updates non-key/non-BLOB table columns by row id and does not rewrite or copy blob data. Missing keys raise NoSuchKey; empty columns are rejected; key/blob/partition columns are rejected for updates.\n\nImplementation detail: the batch API reads current target column values and row ids, overlays the requested columns, then performs one row-id column update for the union of updated columns. This preserves unspecified columns and avoids appending missing-key rows.\n\nRe-ran:\n- python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py -k 'blob_store' -vv\n- python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py -vv\n- python -m compileall -q paimon-python/pypaimon/multimodal/blob_store.py paimon-python/pypaimon/tests/multimodal_table_test.py\n- git diff --check

@JingsongLi

Copy link
Copy Markdown
Contributor Author

Addressed the remaining result-read race in 79c2a97. put_objects now returns the result from inside _overwrite_rows: after each upsert/replace attempt it reads the put results and verifies every requested key has exactly one row. If that result read sees duplicate or missing rows, it stays in the same retry loop and runs the replace cleanup before returning, so callers no longer observe Multiple rows found from the post-overwrite result read.\n\nAdded test_blob_store_put_object_recovers_result_read_race, which injects a stale concurrent commit just before the first result read and verifies put_object still returns successfully and leaves one final row.\n\nRe-ran:\n- python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py -k 'blob_store' -vv\n- python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py -vv\n- python -m compileall -q paimon-python/pypaimon/multimodal/blob_store.py paimon-python/pypaimon/tests/multimodal_table_test.py\n- git diff --check

@JingsongLi

Copy link
Copy Markdown
Contributor Author

Added blob-as-descriptor=true to multimodal table default options in 6cdb3f2. The default-options test now asserts it, and the multimodal API docs list it with the other defaults.\n\nRe-ran:\n- python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py -k 'create_table_defaults_data_evolution_options or blob_store' -vv\n- python -m pytest paimon-python/pypaimon/tests/multimodal_table_test.py -vv\n- python -m compileall -q paimon-python/pypaimon/multimodal/connection.py paimon-python/pypaimon/tests/multimodal_table_test.py\n- git diff --check

@leaves12138 leaves12138 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the latest update under the intended semantics that BlobStore does not enforce object-key uniqueness yet, and table-level/secondary unique key support can be introduced later. The result path no longer re-reads the latest snapshot after put, metadata-only updates are covered, and the focused local checks passed. No further blockers from my side.

@JingsongLi JingsongLi merged commit d7bae9b into apache:master Jul 2, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants