
[Core][Flink] Add blob-write-null-on-fetch-failure for BLOB descriptor writes.#8412

Open
wwj6591812 wants to merge 1 commit into apache:master from wwj6591812:add_blob-write-null-on-fetch-failure_0701

wwj6591812 commented Jul 1, 2026


Summary

Add blob-write-null-on-fetch-failure for Flink BLOB descriptor writes, mirroring the scope of blob-write-null-on-missing-file.

Background

Paimon supports writing BLOB descriptor columns that reference external resources (e.g. image URLs). During a Flink write, Paimon resolves each descriptor and fetches the referenced bytes before persisting them.

The existing blob-write-null-on-missing-file option (#8219) helps when the resource does not exist (missing local file or HTTP 404): the BLOB column can be written as NULL instead of failing the job. That option is scoped to the Flink write path only.

However, this is not enough for our production workloads: image fetch failures are not limited to 404. We regularly see other errors, such as:

  • HTTP 4xx/5xx errors other than 404, e.g. 400 (Bad Request), 403 (Forbidden), 429 (Too Many Requests), 503 (Service Unavailable)
  • Invalid or malformed URIs, network timeouts, and other fetch/protocol errors

Without a way to tolerate these errors, a single bad image URL can fail the entire Flink write, even when blob-write-null-on-missing-file is enabled. In our use case, dropping images that fail to load is acceptable: we want to write NULL for that column and keep writing the batch rather than fail fast.

This PR introduces blob-write-null-on-fetch-failure, with the same scope as blob-write-null-on-missing-file (Flink only). The two options are complementary:

| Option | Typical cases | Behavior when enabled |
|---|---|---|
| blob-write-null-on-missing-file | Missing file, HTTP 404 | Write NULL |
| blob-write-null-on-fetch-failure | HTTP 400 / 503 and other non-404 errors, invalid URI, network/protocol errors, etc. | Write NULL |

HTTP 404 remains handled only by blob-write-null-on-missing-file: blob-write-null-on-fetch-failure does not treat 404 as a fetch failure, so the two options never overlap. They can be enabled independently, or together to cover both missing resources and other fetch failures that can safely be ignored.
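The per-row decision implied by the table above can be sketched as a small function of the failure kind and the two option values. This is a hypothetical illustration, not Paimon code: BlobNullPolicySketch, FailureKind, Action and onFailure are made-up names, and it assumes that a failure not covered by an enabled option keeps the existing fail-fast behavior.

```java
/**
 * Hypothetical sketch of how the two options could combine per row.
 * None of these names are Paimon APIs; they only illustrate the semantics.
 */
public class BlobNullPolicySketch {

    /** Coarse classification of a failed blob open/fetch. */
    enum FailureKind {
        NOT_FOUND,          // missing local file or HTTP 404
        OTHER_FETCH_FAILURE // non-404 HTTP error, invalid URI, network/protocol error
    }

    /** What happens to the BLOB column of the current row. */
    enum Action {
        WRITE_NULL,
        FAIL
    }

    static Action onFailure(FailureKind kind, boolean nullOnMissingFile, boolean nullOnFetchFailure) {
        if (kind == FailureKind.NOT_FOUND) {
            // 404 / missing file is governed only by blob-write-null-on-missing-file.
            return nullOnMissingFile ? Action.WRITE_NULL : Action.FAIL;
        }
        // Every other failure is governed only by blob-write-null-on-fetch-failure.
        return nullOnFetchFailure ? Action.WRITE_NULL : Action.FAIL;
    }

    public static void main(String[] args) {
        boolean[] flags = {false, true};
        // Print the full 2x2 option matrix for both failure kinds.
        for (boolean onMissing : flags) {
            for (boolean onFetch : flags) {
                System.out.printf("missing-file=%b fetch-failure=%b -> 404: %s, other: %s%n",
                        onMissing, onFetch,
                        onFailure(FailureKind.NOT_FOUND, onMissing, onFetch),
                        onFailure(FailureKind.OTHER_FETCH_FAILURE, onMissing, onFetch));
            }
        }
    }
}
```

Keeping the two branches disjoint is what makes the options independent: enabling blob-write-null-on-fetch-failure alone never turns a 404 into NULL in this sketch.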

Follow-up: after this PR is merged, we plan to submit a separate PR that exposes blob fetch success/failure counters (including an HTTP status breakdown) on the Flink writer path, so operators can monitor how often NULLs are written due to fetch issues.

What changes

  • Add blob-write-null-on-fetch-failure to CoreOptions (default false, Flink writes only).
  • Extend HttpClientUtils with shared helpers: isNotFoundError, getHttpStatusCode, isInvalidUriException.
  • Wire the option through FlinkSinkBuilder → FlinkRowWrapper (defer non-404 exists-check failures to the writer fetch path when enabled).
  • Apply NULL-on-fetch-failure in the descriptor write path (BlobFileContext → MultipleBlobFileWriter → BlobFormatWriter) when opening/fetching the blob fails and the error is not 404.
  • No metrics in this PR.
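The kind of error classification the new HttpClientUtils helpers (isNotFoundError, getHttpStatusCode, isInvalidUriException) perform, together with the FlinkRowWrapper deferral, can be sketched roughly as follows. This is a hypothetical, self-contained illustration: the PR does not show the helpers' signatures, so HttpStatusException, httpStatusCode, isNotFound, isInvalidUri and deferToWriter are stand-ins, and the sketch assumes the HTTP status can be recovered from the exception cause chain.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.NoSuchFileException;
import java.util.OptionalInt;

/** Hypothetical stand-ins for the classification helpers; not Paimon's HttpClientUtils. */
public class BlobFetchErrorSketch {

    /** Hypothetical exception type carrying an HTTP status code. */
    static class HttpStatusException extends IOException {
        final int statusCode;

        HttpStatusException(int statusCode) {
            super("HTTP status " + statusCode);
            this.statusCode = statusCode;
        }
    }

    /** Returns the first HTTP status found in the cause chain, if any. */
    static OptionalInt httpStatusCode(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof HttpStatusException) {
                return OptionalInt.of(((HttpStatusException) c).statusCode);
            }
        }
        return OptionalInt.empty();
    }

    /** Missing local file, or HTTP 404, anywhere in the cause chain. */
    static boolean isNotFound(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof FileNotFoundException || c instanceof NoSuchFileException) {
                return true;
            }
        }
        OptionalInt status = httpStatusCode(t);
        return status.isPresent() && status.getAsInt() == 404;
    }

    /** Malformed URI, including the IllegalArgumentException that URI.create wraps it in. */
    static boolean isInvalidUri(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof URISyntaxException) {
                return true;
            }
        }
        return false;
    }

    /**
     * Row-wrapper side: with blob-write-null-on-fetch-failure enabled, a non-404 failure of the
     * exists check is not thrown immediately but left for the writer's fetch path to handle.
     */
    static boolean deferToWriter(Throwable existsCheckFailure, boolean nullOnFetchFailure) {
        return nullOnFetchFailure && !isNotFound(existsCheckFailure);
    }

    /** Returns the exception thrown while parsing the URI, or null if it parses. */
    static Throwable parseFailure(String uri) {
        try {
            URI.create(uri);
            return null;
        } catch (IllegalArgumentException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        Throwable notFound = new IOException("open failed", new HttpStatusException(404));
        Throwable throttled = new IOException("open failed", new HttpStatusException(429));
        Throwable badUri = parseFailure("http://exa mple.com/a.png"); // space is illegal in a URI

        System.out.println(isNotFound(notFound));           // true
        System.out.println(httpStatusCode(throttled));      // OptionalInt[429]
        System.out.println(isInvalidUri(badUri));           // true
        System.out.println(deferToWriter(notFound, true));  // false
        System.out.println(deferToWriter(throttled, true)); // true
    }
}
```

Walking the cause chain matters because URI.create reports a malformed URI as an IllegalArgumentException whose cause is the URISyntaxException; a check on the top-level exception alone would miss it.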

Example

CREATE TABLE t (
  id INT,
  picture BYTES
) WITH (
  'blob-field' = 'picture',
  'blob-as-descriptor' = 'true',
  'blob-write-null-on-fetch-failure' = 'true'
);

Test plan

  • HttpClientUtilsTest — error classification helpers (isNotFoundError, getHttpStatusCode, isInvalidUriException)
  • BlobFormatWriterTest — NULL on non-404 fetch failure; 404 still uses blob-write-null-on-missing-file; fail-fast when option disabled
  • FlinkRowWrapperTest — defer non-404 exists-check failures to the writer fetch path when option enabled
  • BlobTableITCase — Flink SQL E2E:
    • invalid URI → NULL with blob-write-null-on-fetch-failure
    • HTTP 400 / 429 → NULL with blob-write-null-on-fetch-failure
    • combined with blob-write-null-on-missing-file for 404 vs non-404 coverage
mvn test -pl paimon-api,paimon-format,paimon-core,paimon-flink/paimon-flink-common -am \
  -Dtest=HttpClientUtilsTest,BlobFormatWriterTest,FlinkRowWrapperTest,BlobTableITCase

wwj6591812 force-pushed the add_blob-write-null-on-fetch-failure_0701 branch from 511c5e6 to f9f29f6 on July 1, 2026 11:38
wwj6591812 (author) commented:

@JingsongLi Hi, please CC. Thanks!
