[ExecuTorch][WebGPU] Request shader-f16 device feature (enablement, fail-open) by JCNTH · Pull Request #20729 · pytorch/executorch

JCNTH · 2026-07-05T01:19:56Z

Stack from ghstack (oldest at bottom):

Enablement for a future fp16 compute path: detect and request WGPUFeatureName_ShaderF16 at device creation when the adapter advertises it (fail-open — absence leaves the fp32 path unchanged), and expose WebGPUContext::shader_f16_supported for fp16 kernels to branch on. No fp16 hot-kernel consumer yet.

Key changes:

WebGPUDevice.h — add bool shader_f16_supported to WebGPUContext (outside the profiling #ifdef).
WebGPUDevice.cpp — hoist required_features out of the profiling #ifdef; push WGPUFeatureName_ShaderF16 when the adapter has it; assign requiredFeatures/requiredFeatureCount only when non-empty; move #include <vector> to the unconditional include block.
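As a rough illustration of the shape described above, here is a minimal sketch. It does not use the real WebGPU headers: `Feature`, `MockAdapter`, `MockDeviceDescriptor`, `MockContext`, `KernelVariant`, `build_descriptor`, and `pick_kernel_variant` are hypothetical stand-ins invented for this example, and only `shader_f16_supported`, `requiredFeatures`, and `requiredFeatureCount` are names taken from the PR text. It is not the PR's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-ins so the sketch compiles without the WebGPU headers.
// The PR itself works with WGPUFeatureName_ShaderF16 and WebGPUContext.
enum class Feature { TimestampQuery, ShaderF16 };
enum class KernelVariant { F32, F16 };

struct MockAdapter {
  std::set<Feature> advertised;  // features the adapter reports
};

struct MockDeviceDescriptor {
  const Feature* requiredFeatures = nullptr;
  std::size_t requiredFeatureCount = 0;
};

struct MockContext {
  bool shader_f16_supported = false;  // mirrors the flag added by the PR
};

// Requests ShaderF16 only when the adapter advertises it. The caller owns
// `required` and must keep it alive until the device request is issued.
inline MockDeviceDescriptor build_descriptor(const MockAdapter& adapter,
                                             MockContext& ctx,
                                             std::vector<Feature>& required) {
  if (adapter.advertised.count(Feature::ShaderF16) != 0) {
    required.push_back(Feature::ShaderF16);
    ctx.shader_f16_supported = true;
  }
  MockDeviceDescriptor desc;
  if (!required.empty()) {  // leave the descriptor fields at their defaults otherwise
    desc.requiredFeatures = required.data();
    desc.requiredFeatureCount = required.size();
  }
  return desc;
}

// A future fp16 kernel would branch on the flag; fp32 stays the default.
inline KernelVariant pick_kernel_variant(const MockContext& ctx) {
  return ctx.shader_f16_supported ? KernelVariant::F16 : KernelVariant::F32;
}

// Usage example: an adapter that advertises ShaderF16.
inline void example() {
  MockAdapter adapter{{Feature::ShaderF16}};
  MockContext ctx;
  std::vector<Feature> required;
  MockDeviceDescriptor desc = build_descriptor(adapter, ctx, required);
  assert(desc.requiredFeatureCount == 1);
  assert(pick_kernel_variant(ctx) == KernelVariant::F16);
}
```

When the mock adapter does not advertise the feature, the descriptor keeps its default null pointer and zero count, and the fp32 variant is selected.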
@exported-using-ghexport

Differential Revision: D110660013

[ghstack-poisoned]

pytorch-bot · 2026-07-05T01:19:59Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20729

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 21 Pending

As of commit f6ff099 with merge base 4af91c3:

NEW FAILURE - The following job has failed:

pull / test-binary-size-linux-gcc / linux-job (gh)
RuntimeError: Command docker exec -t c99b1f7814b94b992c039b03cd3ecef632e024dea643f086df7809326a20d1ee /exec failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla · 2026-07-05T01:20:01Z

The committers listed above are authorized under a signed CLA.

✅ login: JCNTH / name: Julian Ng-Thow-Hing (0dfaa10, aa8f4a1)

github-actions · 2026-07-05T01:21:07Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

JCNTH · 2026-07-05T01:22:55Z

@claude thoroughly review, check for overengineering, correctness, and areas for modularization

claude · 2026-07-05T01:23:43Z

Claude finished @JCNTH's task in 51s

Code Review: shader-f16 device feature enablement

Read PR diff and both changed files
Review correctness, overengineering, and modularization

Overall this is a clean, well-scoped enablement change. The lifetime handling is correct, the fail-open semantics match the existing TimestampQuery path, and the diff is appropriately minimal. A few observations below.

Correctness ✅

Vector lifetime is sound. device_desc.requiredFeatures is assigned required_features.data() and the vector stays in scope through wgpuAdapterRequestDevice at WebGPUDevice.cpp:168. The inline comment (WebGPUDevice.cpp:146) documenting this is genuinely useful — exactly the kind of non-obvious context worth a comment.
#include <vector> unconditional (WebGPUDevice.cpp:16) is now correct — required_features is used regardless of WGPU_BACKEND_ENABLE_PROFILING. Previously guarding it under the profiling #ifdef would have broken the non-profiling build once the vector was hoisted, so this move is necessary, not incidental.
Non-empty guard (WebGPUDevice.cpp:159) is fine. Setting requiredFeatureCount = 0 with a null/empty data() would also be valid per the WebGPU descriptor contract, so the guard is mildly redundant — but it reads clearly and avoids handing a possibly-null data() pointer to the descriptor, so I'd keep it.
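The lifetime and guard points above can be sketched as follows. This is an illustrative model only: `Feature`, `MockDeviceDescriptor`, `mock_request_device`, and `create_device` are hypothetical stand-ins, and the real call is `wgpuAdapterRequestDevice`. The point is that the descriptor stores a raw pointer into the vector, so the vector must stay in scope (and must not grow) between taking `data()` and issuing the request.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Feature { TimestampQuery, ShaderF16 };  // hypothetical stand-in

struct MockDeviceDescriptor {
  const Feature* requiredFeatures = nullptr;
  std::size_t requiredFeatureCount = 0;
};

// Stand-in for the device request. It reads the features through the raw
// pointer, so the backing storage must still be alive when this runs.
inline std::vector<Feature> mock_request_device(const MockDeviceDescriptor& desc) {
  std::vector<Feature> granted;
  for (std::size_t i = 0; i < desc.requiredFeatureCount; ++i) {
    granted.push_back(desc.requiredFeatures[i]);
  }
  return granted;
}

inline std::vector<Feature> create_device(bool adapter_has_f16) {
  // Declared in the same scope as the request, so it outlives the call.
  std::vector<Feature> required_features;
  if (adapter_has_f16) {
    required_features.push_back(Feature::ShaderF16);
  }
  // All push_back calls happen before data() is taken, because a later
  // push_back may reallocate and invalidate the stored pointer.
  MockDeviceDescriptor desc;
  if (!required_features.empty()) {
    desc.requiredFeatures = required_features.data();
    desc.requiredFeatureCount = required_features.size();
  }
  return mock_request_device(desc);  // required_features is still in scope here
}

// Usage example covering both the populated and the empty case.
inline void example() {
  assert(create_device(true).size() == 1);
  assert(create_device(false).empty());
}
```

With the guard in place, the empty case never hands the result of `data()` on an empty vector to the descriptor, which is the readability benefit the review mentions.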

One semantic nuance worth flagging ⚠️

The comment says "fail-open (absence just disables the corresponding fast path)", but ShaderF16 is pushed into requiredFeatures — a required device feature. It's only fail-open with respect to the adapter not advertising the feature (the wgpuAdapterHasFeature gate). If an adapter advertises ShaderF16 but device creation then rejects it, the whole context creation throws at WebGPUDevice.cpp:175 rather than falling back to fp32.

In practice this mirrors the existing TimestampQuery pattern and the WebGPU contract says an adapter-advertised feature should be grantable, so this is acceptable and consistent. Just noting the "fail-open" wording is slightly broader than the actual guarantee — the fallback only covers the unsupported-by-adapter case, not a device-rejected case.
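To make the distinction concrete, the sketch below contrasts the single-attempt behavior described above with a hypothetical stricter fail-open that retries without optional features. Nothing here is proposed by the PR or the review: `MockAdapter`, `Result`, `create_strict`, and `create_with_fallback` are invented for illustration, and the mock reports a failed request as `created == false` where the real code throws.

```cpp
#include <cassert>
#include <vector>

enum class Feature { TimestampQuery, ShaderF16 };  // hypothetical stand-in

struct MockAdapter {
  bool advertises_f16;
  bool grants_f16;  // false models an adapter that advertises but then rejects
  // Stand-in for the device request: fails if ShaderF16 is required but not granted.
  bool request_device(const std::vector<Feature>& required) const {
    for (Feature f : required) {
      if (f == Feature::ShaderF16 && !grants_f16) return false;
    }
    return true;
  }
};

struct Result {
  bool created;
  bool shader_f16_supported;
};

// Single attempt: a rejected required feature means no device at all.
inline Result create_strict(const MockAdapter& a) {
  std::vector<Feature> required;
  if (a.advertises_f16) required.push_back(Feature::ShaderF16);
  const bool ok = a.request_device(required);
  return {ok, ok && a.advertises_f16};
}

// Hypothetical variant: retry with no optional features if the first
// request is rejected, and report the fp16 path as unavailable.
inline Result create_with_fallback(const MockAdapter& a) {
  std::vector<Feature> required;
  if (a.advertises_f16) required.push_back(Feature::ShaderF16);
  if (a.request_device(required)) return {true, a.advertises_f16};
  return {a.request_device({}), false};
}

// Usage example: an adapter that advertises ShaderF16 but rejects it.
inline void example() {
  MockAdapter flaky{true, false};
  assert(!create_strict(flaky).created);
  Result r = create_with_fallback(flaky);
  assert(r.created && !r.shader_f16_supported);
}
```

Whether the extra retry is worth the complexity is a separate question; the review treats the single-attempt behavior as acceptable because an advertised feature is expected to be grantable.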

Overengineering — none 👍

The change resists scope creep well: it doesn't add a fp16 kernel consumer, doesn't introduce config plumbing, and keeps the flag as a plain bool. This is the right altitude for an enablement PR.

Modularization — optional, not required

The two feature blocks now share an identical shape:

if (wgpuAdapterHasFeature(ctx.adapter, WGPUFeatureName_X)) {
  required_features.push_back(WGPUFeatureName_X);
  ctx.<flag> = true;
}

A helper like try_enable_feature(ctx.adapter, required_features, WGPUFeatureName_ShaderF16, ctx.shader_f16_supported) would dedupe it. I'd lean against it here — per this repo's CLAUDE.md ("no trivial single-use helpers unless significantly improving readability"), and because one of the two call sites is #ifdef-gated (which a helper wouldn't cleanly absorb). Two occurrences of a 3-line pattern isn't enough duplication to justify the indirection. Worth revisiting only if a third optional feature lands.
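For reference, the helper mentioned above might look roughly like this. It is a sketch under assumptions: the parameter order follows the call shown in the review, but `Feature`, `MockAdapter`, and `adapter_has_feature` are hypothetical stand-ins for the WebGPU types and `wgpuAdapterHasFeature`, and `timestamp_enabled` is an invented flag name.

```cpp
#include <cassert>
#include <set>
#include <vector>

enum class Feature { TimestampQuery, ShaderF16 };  // hypothetical stand-in

struct MockAdapter {
  std::set<Feature> advertised;
};

// Stand-in for wgpuAdapterHasFeature.
inline bool adapter_has_feature(const MockAdapter& adapter, Feature feature) {
  return adapter.advertised.count(feature) != 0;
}

// Pushes `feature` and sets `flag` only when the adapter advertises it;
// otherwise both `required` and `flag` are left untouched.
inline void try_enable_feature(const MockAdapter& adapter,
                               std::vector<Feature>& required,
                               Feature feature,
                               bool& flag) {
  if (!adapter_has_feature(adapter, feature)) return;
  required.push_back(feature);
  flag = true;
}

// Usage example: only ShaderF16 is advertised, so only its flag is set.
inline void example() {
  MockAdapter adapter{{Feature::ShaderF16}};
  std::vector<Feature> required;
  bool shader_f16_supported = false;
  bool timestamp_enabled = false;  // hypothetical flag name
  try_enable_feature(adapter, required, Feature::ShaderF16, shader_f16_supported);
  try_enable_feature(adapter, required, Feature::TimestampQuery, timestamp_enabled);
  assert(shader_f16_supported && !timestamp_enabled);
  assert(required.size() == 1);
}
```

The sketch also shows why the review leans against it: the helper saves two lines per call site, and an `#ifdef` around one of the calls would still be needed.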

Minor

The header comment (WebGPUDevice.h:28-30) describes the fp16 storage/compute path "in hot kernels," which don't exist yet per the PR description. That's fine as forward-looking documentation, but if the consumer PRs ([ExecuTorch][WebGPU] Add 256-thread "steel" q4gsw prefill GEMM #20730/[ExecuTorch][WebGPU] Steel q4gsw prefill GEMM — test coverage #20731) slip, consider softening to "reserved for a future fp16 path" so the comment doesn't over-promise current behavior.

Verdict: Correct, appropriately minimal, no blocking issues. The "fail-open" comment wording is the only thing I'd tweak for precision.
· branch gh/JCNTH/3/head

[ghstack-poisoned]

Update

0dfaa10

[ghstack-poisoned]

JCNTH temporarily deployed to cadence July 5, 2026 01:20 — with GitHub Actions Inactive

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jul 5, 2026

JCNTH temporarily deployed to cadence July 5, 2026 01:48 — with GitHub Actions Inactive

Update

aa8f4a1

[ghstack-poisoned]

JCNTH temporarily deployed to cadence July 5, 2026 02:27 — with GitHub Actions Inactive

meta-codesync Bot added the meta-exported label Jul 5, 2026

JCNTH temporarily deployed to cadence July 5, 2026 02:55 — with GitHub Actions Inactive

Update

ebb5a7e

[ghstack-poisoned]

JCNTH temporarily deployed to cadence July 5, 2026 04:31 — with GitHub Actions Inactive

JCNTH temporarily deployed to cadence July 5, 2026 04:58 — with GitHub Actions Inactive

Update

f6ff099

[ghstack-poisoned]

JCNTH temporarily deployed to cadence July 5, 2026 06:18 — with GitHub Actions Inactive

JCNTH deployed to cadence July 5, 2026 06:46 — with GitHub Actions Active

JCNTH wants to merge 4 commits into gh/JCNTH/3/base from gh/JCNTH/3/head