Skip to content

[ExecuTorch][WebGPU] Request shader-f16 device feature (enablement, fail-open)#20729

Open
JCNTH wants to merge 4 commits into
gh/JCNTH/3/basefrom
gh/JCNTH/3/head
Open

[ExecuTorch][WebGPU] Request shader-f16 device feature (enablement, fail-open)#20729
JCNTH wants to merge 4 commits into
gh/JCNTH/3/basefrom
gh/JCNTH/3/head

Conversation

@JCNTH

@JCNTH JCNTH commented Jul 5, 2026

Copy link
Copy Markdown

Stack from ghstack (oldest at bottom):

Enablement for a future fp16 compute path: detect and request WGPUFeatureName_ShaderF16 at device creation when the adapter advertises it (fail-open — absence leaves the fp32 path unchanged), and expose WebGPUContext::shader_f16_supported for fp16 kernels to branch on. No fp16 hot-kernel consumer yet.

Key changes:

  • WebGPUDevice.h — add bool shader_f16_supported to WebGPUContext (outside the profiling #ifdef).
  • WebGPUDevice.cpp — hoist required_features out of the profiling #ifdef; push WGPUFeatureName_ShaderF16 when the adapter has it; assign requiredFeatures/requiredFeatureCount only when non-empty; move #include <vector> to the unconditional include block.
    @exported-using-ghexport

Differential Revision: D110660013

Differential Revision: D110660013

[ghstack-poisoned]
@pytorch-bot

pytorch-bot Bot commented Jul 5, 2026

Copy link
Copy Markdown

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20729

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 21 Pending

As of commit f6ff099 with merge base 4af91c3 (image):

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@linux-foundation-easycla

linux-foundation-easycla Bot commented Jul 5, 2026

Copy link
Copy Markdown

CLA Signed
The committers listed above are authorized under a signed CLA.

@github-actions

github-actions Bot commented Jul 5, 2026

Copy link
Copy Markdown

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@JCNTH

JCNTH commented Jul 5, 2026

Copy link
Copy Markdown
Author

@claude throoughly review, check for overengineering, correctness, and areaas for modularization

@claude

claude Bot commented Jul 5, 2026

Copy link
Copy Markdown

Claude finished @JCNTH's task in 51s —— View job


Code Review: shader-f16 device feature enablement

  • Read PR diff and both changed files
  • Review correctness, overengineering, and modularization

Overall this is a clean, well-scoped enablement change. The lifetime handling is correct, the fail-open semantics match the existing TimestampQuery path, and the diff is appropriately minimal. A few observations below.

Correctness ✅

  • Vector lifetime is sound. device_desc.requiredFeatures is assigned required_features.data() and the vector stays in scope through wgpuAdapterRequestDevice at WebGPUDevice.cpp:168. The inline comment (WebGPUDevice.cpp:146) documenting this is genuinely useful — exactly the kind of non-obvious context worth a comment.
  • #include <vector> unconditional (WebGPUDevice.cpp:16) is now correct — required_features is used regardless of WGPU_BACKEND_ENABLE_PROFILING. Previously guarding it under the profiling #ifdef would have broken the non-profiling build once the vector was hoisted, so this move is necessary, not incidental.
  • Non-empty guard (WebGPUDevice.cpp:159) is fine. Setting requiredFeatureCount = 0 with a null/empty data() would also be valid per the WebGPU descriptor contract, so the guard is mildly redundant — but it reads clearly and avoids handing a possibly-null data() pointer to the descriptor, so I'd keep it.

One semantic nuance worth flagging ⚠️

The comment says "fail-open (absence just disables the corresponding fast path)", but ShaderF16 is pushed into requiredFeatures — a required device feature. It's only fail-open with respect to the adapter not advertising the feature (the wgpuAdapterHasFeature gate). If an adapter advertises ShaderF16 but device creation then rejects it, the whole context creation throws at WebGPUDevice.cpp:175 rather than falling back to fp32.

In practice this mirrors the existing TimestampQuery pattern and the WebGPU contract says an adapter-advertised feature should be grantable, so this is acceptable and consistent. Just noting the "fail-open" wording is slightly broader than the actual guarantee — the fallback only covers the unsupported-by-adapter case, not a device-rejected case.

Overengineering — none 👍

The change resists scope creep well: it doesn't add a fp16 kernel consumer, doesn't introduce config plumbing, and keeps the flag as a plain bool. This is the right altitude for an enablement PR.

Modularization — optional, not required

The two feature blocks now share an identical shape:

if (wgpuAdapterHasFeature(ctx.adapter, WGPUFeatureName_X)) {
  required_features.push_back(WGPUFeatureName_X);
  ctx.<flag> = true;
}

A helper like try_enable_feature(ctx.adapter, required_features, WGPUFeatureName_ShaderF16, ctx.shader_f16_supported) would dedupe it. I'd lean against it here — per this repo's CLAUDE.md ("no trivial single-use helpers unless significantly improving readability"), and because one of the two call sites is #ifdef-gated (which a helper wouldn't cleanly absorb). Two occurrences of a 3-line pattern isn't enough duplication to justify the indirection. Worth revisiting only if a third optional feature lands.

Minor


Verdict: Correct, appropriately minimal, no blocking issues. The "fail-open" comment wording is the only thing I'd tweak for precision.
· branch gh/JCNTH/3/head

[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant