[Core][Flink] Add blob-write-null-on-fetch-failure for BLOB descriptor writes. #8412
Open
wwj6591812 wants to merge 1 commit into
Summary
Add `blob-write-null-on-fetch-failure` for Flink BLOB descriptor writes, mirroring the scope of `blob-write-null-on-missing-file`.

Background
Paimon supports writing BLOB descriptor columns that reference external resources (e.g. image URLs). During a Flink write, Paimon resolves each descriptor and fetches the referenced bytes before persisting them.
The existing `blob-write-null-on-missing-file` option (#8219) helps when the resource does not exist (missing local file or HTTP 404): the BLOB column can be written as NULL instead of failing the job. That option is scoped to the Flink write path only.

This is not enough for our production workloads. Image fetch failures are not limited to 404: we regularly see other, non-404 fetch errors as well.
With both options disabled, a single bad image URL can fail the entire Flink write. In our use case, dropping failed images is acceptable: we want to write NULL for that column and continue the batch rather than fail fast.
This PR introduces `blob-write-null-on-fetch-failure`, with the same scope as `blob-write-null-on-missing-file` (Flink only). The two options, `blob-write-null-on-missing-file` and `blob-write-null-on-fetch-failure`, are complementary.

404 remains handled by `blob-write-null-on-missing-file` only. `blob-write-null-on-fetch-failure` does not treat 404 as a fetch failure, so the semantics stay clear. The options can be enabled independently or together to cover "missing resource + other fetch failures we can ignore."

Follow-up: After this PR is merged, we plan to submit a separate metrics PR to expose blob fetch success/failure counters (including HTTP status breakdown) on the Flink writer path, so operators can monitor how often NULLs are written due to fetch issues.
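The split between the two options described above can be sketched as a small decision function. This is only an illustration under stated assumptions, not the code in this PR: the class `BlobNullPolicySketch`, the `Action` enum, and the `onFetchError` method are hypothetical names, and `resourceMissing` stands for "missing local file or HTTP 404".

```java
/** Hypothetical sketch of the option semantics; not Paimon's actual implementation. */
final class BlobNullPolicySketch {

    enum Action { WRITE_NULL, FAIL }

    /**
     * Decides what to do when fetching a BLOB descriptor's bytes fails.
     *
     * @param resourceMissing    true for a missing local file or HTTP 404 (assumed classification)
     * @param nullOnMissingFile  value of blob-write-null-on-missing-file
     * @param nullOnFetchFailure value of blob-write-null-on-fetch-failure
     */
    static Action onFetchError(
            boolean resourceMissing, boolean nullOnMissingFile, boolean nullOnFetchFailure) {
        if (resourceMissing) {
            // 404 / missing file is governed by blob-write-null-on-missing-file only.
            return nullOnMissingFile ? Action.WRITE_NULL : Action.FAIL;
        }
        // Every other fetch failure is governed by blob-write-null-on-fetch-failure only.
        return nullOnFetchFailure ? Action.WRITE_NULL : Action.FAIL;
    }

    public static void main(String[] args) {
        // Only the fetch-failure option is enabled.
        System.out.println(onFetchError(true, false, true));   // missing resource
        System.out.println(onFetchError(false, false, true));  // non-404 failure
        // Both options disabled: any failure fails the write.
        System.out.println(onFetchError(false, false, false));
    }
}
```

Keeping the two branches separate means that enabling one option never changes how the other class of error is handled, which is the "semantics stay clear" property the description asks for.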
What changes
- Add `blob-write-null-on-fetch-failure` to `CoreOptions` (default `false`, Flink writes only).
- `HttpClientUtils`: shared helpers `isNotFoundError`, `getHttpStatusCode`, `isInvalidUriException`.
- `FlinkSinkBuilder` → `FlinkRowWrapper`: defer non-404 exists-check failures to the writer fetch path when enabled.
- Writer path (`BlobFileContext` → `MultipleBlobFileWriter` → `BlobFormatWriter`): write NULL when opening/fetching the blob fails and the error is not 404.

Example
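As a rough illustration of enabling both options together, the sketch below builds a plain map of option keys to string values. The option keys come from this PR; the class `BlobOptionsSketch` and its methods are hypothetical, the `false` default for `blob-write-null-on-missing-file` is an assumption, and how the map is handed to a table is deliberately not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; it does not use any Paimon API. */
final class BlobOptionsSketch {

    static final String NULL_ON_MISSING_FILE = "blob-write-null-on-missing-file";
    static final String NULL_ON_FETCH_FAILURE = "blob-write-null-on-fetch-failure";

    /** Options that tolerate both missing resources (404) and other fetch failures. */
    static Map<String, String> tolerantBlobOptions() {
        Map<String, String> options = new LinkedHashMap<>();
        options.put(NULL_ON_MISSING_FILE, "true");
        options.put(NULL_ON_FETCH_FAILURE, "true");
        return options;
    }

    /** Reads a boolean option, treating an absent key as false (assumed default). */
    static boolean isEnabled(Map<String, String> options, String key) {
        return Boolean.parseBoolean(options.getOrDefault(key, "false"));
    }

    public static void main(String[] args) {
        Map<String, String> options = tolerantBlobOptions();
        options.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Setting only one of the two keys keeps the other class of error fail-fast, since each option is read independently.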
Test plan
- `HttpClientUtilsTest`: error classification helpers (`isNotFoundError`, `getHttpStatusCode`, `isInvalidUriException`)
- `BlobFormatWriterTest`: NULL on non-404 fetch failure; 404 still uses `blob-write-null-on-missing-file`; fail-fast when option disabled
- `FlinkRowWrapperTest`: defer non-404 exists-check failures to the writer fetch path when option enabled
- `BlobTableITCase`: Flink SQL E2E:
  - `blob-write-null-on-fetch-failure`
  - `blob-write-null-on-fetch-failure`
  - `blob-write-null-on-missing-file` for 404 vs non-404 coverage

    mvn test -pl paimon-api,paimon-format,paimon-core,paimon-flink/paimon-flink-common -am \
      -Dtest=HttpClientUtilsTest,BlobFormatWriterTest,FlinkRowWrapperTest,BlobTableITCase