Use provider C# reference map with pre-write internalization#10976
Use provider C# reference map with pre-write internalization#10976live1206 wants to merge 7 commits into
Conversation
commit: |
|
No changes needing a change description found. |
|
You can try these changes here
|
|
|
||
| internal void PreserveXmlDocs() | ||
| { | ||
| PreserveTypeXmlDocs = true; |
There was a problem hiding this comment.
why not set this in the ctor instead ?
There was a problem hiding this comment.
I kept this as an explicit post-analysis marker rather than setting it in the ctor. The ctor does not know whether a provider will be internalized; that depends on the full provider graph plus custom/API roots computed later in ApplyPreWriteAccessibility. Defaulting this in the ctor would preserve docs for types that do not need the pre-write internalization compatibility path.
| private const string SerializationFormatFileName = "SerializationFormat.cs"; | ||
| private const string TypeFormattersFileName = "TypeFormatters.cs"; | ||
|
|
||
| private static readonly string[] _filesToKeep = |
There was a problem hiding this comment.
can we somehow instead rely on _additionalRootTypes to prevent certain types / files from being deleted? My concern is that we will have to maintain an additional list and these generated file names could change. I think this could also help solve the other comment above.
There was a problem hiding this comment.
Addressed in cc036f3. CSharpGen now builds the delete-preserve list from CodeModelGenerator.Instance.AdditionalRootTypes and NonRootTypes, converting registered type names to generated file names, so language-specific generators can opt in via AddTypeToKeep instead of MTG maintaining helper filenames.
JoshLove-msft
left a comment
There was a problem hiding this comment.
Review — provider C# reference map with pre-write internalization
Overall: Ambitious and well-motivated performance work (~21% faster full-gen, ~62% on the reference-map path), with strong Azure SDK validation and no reported output diffs. The dual-path design (fall back to Roslyn when no precomputed result matches) is a sensible safety net. My concerns are about architecture layering, correctness edge cases, and maintainability, not the goal.
🔴 High — Emitter-specific knowledge hardcoded into the generic base layer
Microsoft.TypeSpec.Generator is the emitter-agnostic core, yet this PR bakes ClientModel/SCM-specific names into it:
CSharpGen.cs:RawRequestUriBuilderExtensions.cs,SerializationFormat.cs,TypeFormatters.csin_filesToKeep; plus a.SetDelimited(/RequestHeaderExtensions.cstextual skip.ProviderReferenceMapAnalyzer.cs:RemoveUnusedRequestHeaderExtensionsRoot,HasCustomRequestHeaderExtensionsReference("SetDelimited","RequestHeaderExtensions"),IsGeneratedHelperSimpleName(ChangeTrackingDictionary/List), and PersistableModelProxy handling.
Root cause: these helpers are emitted as raw source files, not TypeProviders, so the provider graph can't see them — forcing name-based special cases. The old Roslyn map was fully generic. Consider an extensibility hook (e.g. emitters register "always-keep" files / helper roots) rather than literal strings in the core.
🔴 High — ~120 lines of near-duplicated candidate logic
Analyze and GetPreWriteAccessibilityCandidates reimplement the same intricate multi-pass reachability / internalize / publicize computation with subtle differences. This is the most complex logic in the generator, duplicated — every future bug fix must be applied in two places and they will drift. Please extract a shared helper.
🟡 Medium — Possible correctness gap: RequestHeaderExtensions.cs skip ignores custom code
In CSharpGen, usesRequestHeaderExtensions scans only generatedFiles (from GetGeneratedFilesAsync) for .SetDelimited(. If custom (non-generated) code calls SetDelimited / RequestHeaderExtensions but no generated file does, the file is dropped → custom code breaks. Note the analyzer's HasCustomRequestHeaderExtensionsReference does check custom docs, so the two mechanisms are inconsistent. Worth a test with custom-only usage.
🟡 Medium — Process-wide static mutable state
ProviderReferenceMapAnalyzer holds static _latestResult, _preWriteModelFactory, PreWriteAccessibilityApplied, coupling PostProcessor/CSharpGen via globals rather than passing the result through. The ProjectId guard and Reset mitigate staleness, and tests currently aren't [Parallelizable] — but this is fragile: enabling test parallelism later would introduce races. Prefer threading the result explicitly (or storing it on the GeneratedCodeWorkspace/Configuration instance).
🟡 Medium — AddSampleSymbols only on the fallback path
The Roslyn RemoveAsync path calls AddSampleSymbols to protect sample-referenced types; the new provider path does not. Please confirm the provider graph roots samples equivalently, or unreferenced-except-by-samples types could be removed.
🔵 Minor / questions
- Sync-over-async (
.GetAwaiter().GetResult()) on Roslyn calls throughout the analyzer — fine here (no sync context) but inconsistent with the async codebase. IsSerializationProvider/IsGeneratedDocumentrely on file-path string suffixes (.Serialization.cs,/Generated/) — brittle but pre-existing style.- Some unrelated formatting-only churn (brace additions) is mixed in, enlarging the diff.
- Benchmark artifacts are all local
/tmp/...paths; a CI-captured number would be more reproducible for reviewers, though the Azure SDK PR validation is convincing.
✅ Strengths
- Real, verified performance win with end-to-end Azure regen showing no code/API diffs.
- Keeps Roslyn as a fallback and for custom-code analysis — good risk management.
- Solid new test coverage (
ClientBodyDependencyPostProcessingTests, PostProcessor / ModelFactory tests) locking in accessibility behavior. - XML-doc preservation and model-factory-method pruning for pre-write internalization are handled thoughtfully.
Recommendation: approve-with-changes. The performance value is real, but I'd want (1) the duplicated candidate logic factored out and (2) the custom-code SetDelimited skip gap resolved/tested before merge; the layering hardcoding is the strategic follow-up to track even if not blocking.
--generated by Copilot
|
Addressed the duplicated accessibility-candidate logic in 2b2c999. |
|
Followed up on the remaining latest review concerns in 69cb3d2:
|
|
Follow-up pushed in 762a02b for the provider/file-name concern:
I also scanned the analyzer afterward; it no longer has |
|
Follow-up pushed in dc5b534 for helper dependencies:
I also scanned the touched files afterward; the old helper dependency string API and hardcoded helper dependency string returns are gone. |
JoshLove-msft
left a comment
There was a problem hiding this comment.
Follow-up review (commits cc036f3f…762a02b2)
Thanks for the fast turnaround — this round addresses most of my previous feedback well. I re-reviewed the four new commits and built the generator locally (Microsoft.TypeSpec.Generator.ClientModel, 0 warnings / 0 errors, StyleCop clean).
✅ Resolved
- Layering (was 🔴): The biggest improvement. Emitter-specific string checks are largely gone:
_filesToKeepno longer hardcodes helper file names;GetFilesToKeep()derives them generically fromAddTypeToKeep(AdditionalRootTypes+NonRootTypes). And sinceTypeFormatters/SerializationFormat/RawRequestUriBuilderExtensionsare actualTypeProviders, the provider graph already governs them — so the hardcoding was correctly unnecessary.- New virtual markers
IsClientProvider/IncludeGeneratedBodyReferences(overridden inClientProvider/ClientUriBuilderDefinition) replace the brittleRelativeFilePath.EndsWith("Client.cs"),"/Internal/ClientUriBuilder.cs",PackageName.StartsWith("Azure."), and"ModelFactory.cs"/".Serialization.cs"path checks. This is exactly the polymorphic hook I was hoping for.
- Duplication (was 🔴):
AnalyzeandGetPreWriteAccessibilityCandidatesnow shareGetAccessibilityCandidates(...). The ~60-line copy is gone. 👍 - Custom-code
SetDelimitedgap (was 🟡):CSharpGennow also consultsHasCustomRequestHeaderExtensionsReference(), and there's a new test. Good.
🟡 Remaining (minor)
RequestHeaderExtensions/SetDelimitedis the last name-based special-case in the MTG core (CSharpGen.IsRequestHeadersExtensionsFile, analyzerIsRequestHeadersExtensionsRoot/HasCustomRequestHeaderExtensionsReference). Understandable, since this helper is emitted as a raw static file rather than aTypeProvider, so there's no provider to mark. Not blocking, but worth a// TODOnoting it's the one helper the provider graph can't see, so future readers know why it's special.- Static mutable state (
_latestResult,_preWriteModelFactory,PreWriteAccessibilityApplied) is unchanged. Fine today (tests aren't[Parallelizable], and theProjectIdguard/Resetmitigate), but it's still hidden global coupling that would break under test parallelism. Low priority.
🔵 Questions / verify
- Samples on the provider remove path:
AddSampleSymbolsstill only runs on the Roslyn fallback branch; the provider branch derivesreferencedSet = DeclaredSymbols − RemoveCandidates. Since sample files are generated (not custom) and aren't provider nodes, types referenced only by samples wouldn't be rooted. Your full Azure data-plane regen (no diffs) strongly suggests this is fine in practice — just confirming it's intentional and not silently relying on samples never being provider candidates. GetFileNameForTypeassumes a keep-type's file is exactly<SimpleName>.cs. Holds for the current helpers, but it's an implicit filename↔type coupling that would silently miss a keep-type emitted to a differently-named file. A one-line comment noting the assumption would help.ShouldUseGeneratedSourceReferencesbroadened from Azure-only +Client.csto anyIsClientProviderwith noBodyDependencyTypes. This looks like a deliberate (good) generalization to unbranded clients — flagging only so it's a conscious behavior change covered by the regen runs.
Nice work — the polymorphic marker approach materially improves the design. From my side these are all non-blocking; happy to see it merge once (3) is confirmed.
--generated by Copilot
|
Correction to my follow-up review (point #1). @live1206 rightly pointed out that The real reason the textual So the same marker mechanism you introduced for --generated by Copilot |
|
Addressed in 0a170d3. |
|
Follow-up on the remaining samples question from the latest review: the provider remove path only computes removal candidates from generated provider graph nodes, not from arbitrary project symbols. Sample files are generated documents but they are not provider nodes, so sample classes themselves are not selected as remove candidates by the provider graph. For generated types referenced only by samples, the full DPG regen/no-diff validation is the practical coverage; I do not see an additional code change needed here. The remaining static-state concern is still a known low-priority follow-up rather than a blocker for this PR. |
Design suggestion: derive body references from expression trees rather than declared deps + source re-parsingFollowing up on the body-reference discussion — the current approach builds graph edges from provider metadata (signatures, properties, fields, attributes, etc.) and then compensates for body-level references three ways: manually-declared The concern is soundness: a pure-metadata graph is an under-approximation of reachability, because a C# type reference can appear anywhere a method body can ( Suggestion: walk the body expression trees instead. The bodies already exist as structured
The fact that the PR re-parses generated source for the client/serialization candidates (rather than walking expression trees) suggests the tree walk wasn't feasible for all bodies — presumably where bodies are raw Not a merge blocker, but I think it's the change that would turn this from "correct-by-testing" into "correct-by-construction." Would also be worth attributing how much of the ~21% comes from skipping body analysis vs. from pre-write internalization (skipping the Roslyn --generated by Copilot |
|
|
||
| foreach (var outputType in output.TypeProviders) | ||
| { | ||
| if (outputType is ModelFactoryProvider && outputType.Methods.Count == 0) |
There was a problem hiding this comment.
why is this only needed for model factories ?
There was a problem hiding this comment.
Kept this one intentionally: model factory methods can be pruned before writing, and if all methods are removed we should avoid writing an empty factory. Other providers are skipped through ShouldWriteProvider rather than method pruning.
| var generatedPublicReachable = GetReachableTypes(internalizeRoots, internalizeReferences); | ||
| AddDerivedModelReferences(providers, publicGraph.Nodes, internalizeReferences, generatedPublicReachable, generatedDiscriminatorBaseNames); | ||
| internalizeRoots.UnionWith(customPublicRoots); | ||
| var internalizeReachableWithoutHelpers = GetReachableTypes(internalizeRoots, internalizeReferences); | ||
| AddDerivedModelReferences(providers, publicGraph.Nodes, internalizeReferences, internalizeReachableWithoutHelpers, generatedDiscriminatorBaseNames); | ||
| internalizeReachableWithoutHelpers = GetReachableTypes(internalizeRoots, internalizeReferences); | ||
| var publicizeRoots = internalizeRoots.ToHashSet(StringComparer.Ordinal); | ||
| var internalizeHelperRoots = GetHelperRootNames(generatedProviders, graph.Nodes, internalizeReachableWithoutHelpers); | ||
| internalizeRoots.UnionWith(internalizeHelperRoots); | ||
| var internalizeReachable = GetReachableTypes(internalizeRoots, internalizeReferences); | ||
| var internalizeDeclaredNodes = GetPostProcessorDeclaredNodes(generatedProviders, graph.Nodes, publicOnly: true); | ||
| var customInternalBoundaryNodes = graph.Nodes | ||
| .Where(name => publicGraph.References.TryGetValue(name, out var references) && references.Overlaps(customInternalDeclarations)) | ||
| .ToHashSet(StringComparer.Ordinal); | ||
| var publicizeDeclaredNodes = GetPostProcessorDeclaredNodes(generatedProviders, graph.Nodes, publicOnly: false) | ||
| .Except(internalizeDeclaredNodes, StringComparer.Ordinal); | ||
| var generatedImplementationInternalDeclarations = GetGeneratedImplementationInternalTypeDeclarations(generatedInternalDeclarations).ToHashSet(StringComparer.Ordinal); | ||
| var publicApiTraversalNodes = internalizeDeclaredNodes | ||
| .Except(generatedInternalDeclarations, StringComparer.Ordinal) | ||
| .Concat(publicizeDeclaredNodes) | ||
| .Except(generatedImplementationInternalDeclarations, StringComparer.Ordinal) | ||
| .ToHashSet(StringComparer.Ordinal); |
There was a problem hiding this comment.
I will be honest and say this is quite difficult to follow
There was a problem hiding this comment.
Agreed, this needs to be made more readable
There was a problem hiding this comment.
Refactored Analyze in f5d83b7 so it now delegates to named accessibility/removal candidate builders instead of keeping both flows inline.
There was a problem hiding this comment.
Refactored this in f5d83b7 by splitting the dense Analyze flow into named accessibility and removal candidate phases.
There was a problem hiding this comment.
This file is still massive, I will refactor it into smaller structured pieces with less responsibility each.
9951f4b to
22adf62
Compare
|
Re-reviewed the latest rewrite ( 1.
|
22adf62 to
ac2c0c1
Compare
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ac2c0c1 to
d3e60ef
Compare
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Replaces the C# generator's Roslyn reference-map/internalization path with a provider/expression-based reference map and pre-write provider accessibility updates while preserving generated-output correctness.
Latest changes
TypeProvidermetadata and provider expression/body dependency traversal instead of constructing the generated reference map through Roslyn.InternalizeAsyncpass when pre-write accessibility has already been applied; removal and reduce still run through the existing post-processing pipeline.NamedTypeSymbolProviders, using lightweight metadata identity and syntax dependency extraction so arbitrary custom Roslyn symbols do not need fullCSharpType/member materialization.How provider reference-map construction works
This PR no longer uses Roslyn to construct the generated-code reference map. Instead, generated type reachability is derived from the in-memory provider model before files are written.
TypeProviders first, then run visitors, customized-member filtering, and back-compat processing so the analyzer sees the final provider shape.InternalizeAsyncis skipped because the generated files already have the final accessibility.Roslyn is still used by the existing post-processing pipeline for removal/reduce work and for reading custom-code symbols/syntax, but not for generated reference-map construction or applying internalization.
Latest benchmark data
Latest local full-generation benchmark compares Roslyn as the baseline against the current provider implementation with pre-write internalization and the latest custom-code/state-cleanup fixes. The values below are aggregated means across 3 BenchmarkDotNet runs with profiling disabled.
Per-run results:
Benchmark artifacts:
/tmp/typespec-provider-map-latest-20260701-0828-run1/tmp/typespec-provider-map-latest-20260701-0828-run2/tmp/typespec-provider-map-latest-20260701-0828-run3/tmp/typespec-provider-map-no-profile-20260630-1547-run1/tmp/typespec-provider-map-no-profile-20260630-1547-run2/tmp/typespec-provider-map-no-profile-20260630-1547-run3/tmp/typespec-hybrid-benchmark-reruns-20260629-065344/tmp/typespec-prewrite-internalize-benchmark-20260629-073203Azure SDK local project-reference benchmark
Measured Azure SDK for .NET management-plane regeneration using the same local project-reference setup in both runs:
/workspaces/azure-sdk-for-netwas wired to the sibling/workspaces/typespeccheckout via .NET project references, and only the TypeSpec checkout changed betweenmainand this PR branch.Validation
RegenPreview.ps1 -Azurescript regenerated 39 Azure-branded data-plane libraries with no SDK code diffs and no public API diffs.Azure.Data.AppConfiguration(*.nuget.g.propsalready existed`).Azure.Data.AppConfigurationalone with the same local generator passed, leaving no SDK/API diffs.Azure.AI.Vision.ImageAnalysispassed after the custom-code stack-overflow fix.System/Azureusings are removed from the model factory, and internal error models are generated/marked buildable in the MRW context.npm run formatpasses.pnpm formatpasses; unrelated formatter changes outsidehttp-client-csharpwere discarded.Microsoft.TypeSpec.Generator.ClientModel.Testspass (1451tests).npm run build -- --no-restorepasses.Microsoft.TypeSpec.Generator.Testscustomization/post-processing tests pass (ClientCustomizationTests,PostProcessorTests.RemovesInvalidUsings).npm run coppasses.