Skip to content

Use provider C# reference map with pre-write internalization#10976

Draft
live1206 wants to merge 7 commits into
microsoft:mainfrom
live1206:mtg-hybrid-reference-map
Draft

Use provider C# reference map with pre-write internalization#10976
live1206 wants to merge 7 commits into
microsoft:mainfrom
live1206:mtg-hybrid-reference-map

Conversation

@live1206

@live1206 live1206 commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Summary

Replaces the C# generator's Roslyn reference-map/internalization path with a provider/expression-based reference map and pre-write provider accessibility updates while preserving generated-output correctness.

Latest changes

  • Builds generated-code reachability from TypeProvider metadata and provider expression/body dependency traversal instead of constructing the generated reference map through Roslyn.
  • Fully removes the generated-source Roslyn reference-map fallback for generated code; generated dependencies now come from provider metadata, expression trees, helper/body/signature dependency lists, and explicit provider relationships.
  • Applies internal/public accessibility before writing generated files by updating provider modifiers directly.
  • Skips the later Roslyn InternalizeAsync pass when pre-write accessibility has already been applied; removal and reduce still run through the existing post-processing pipeline.
  • Preserves XML docs for providers that are made internal before writing.
  • Removes model-factory methods for models that are internalized pre-write.
  • Handles custom-code roots through provider custom views plus customization-wide NamedTypeSymbolProviders, using lightweight metadata identity and syntax dependency extraction so arbitrary custom Roslyn symbols do not need full CSharpType/member materialization.
  • Keeps request-header extension helpers reachable from both generated expressions and custom-code syntax references.

How provider reference-map construction works

This PR no longer uses Roslyn to construct the generated-code reference map. Instead, generated type reachability is derived from the in-memory provider model before files are written.

  1. Build all TypeProviders first, then run visitors, customized-member filtering, and back-compat processing so the analyzer sees the final provider shape.
  2. Flatten generated providers into graph nodes, including top-level providers, nested providers, and serialization providers. Nodes are keyed by fully-qualified generated type name.
  3. Add graph edges from provider metadata: provider type, base type, declaring type, implemented interfaces, nested/serialization providers, properties, fields, constructors, method signatures, attributes, generic arguments, arrays, collection definitions, helper/body/signature dependencies, and structured provider expression/body references.
  4. Build both a public-signature graph and a full generated graph. The public graph decides what must remain public; the full graph decides what generated files/helpers are reachable for removal.
  5. Seed reachability from known roots: public clients, additional configured roots, API-baseline generated types, custom-code references, generated public declarations, discriminator/derived model relationships, union variants when needed, and helper roots discovered after an initial reachability pass.
  6. Compute candidates from reachability: unreachable public declarations become internalize candidates, internal/generated declarations needed by public API become publicize candidates, and unreachable full-graph declarations become remove candidates.
  7. Apply accessibility before writing by updating provider modifiers directly. Model-factory methods for internalized/removed models are filtered for output, XML docs are preserved for types made internal, and Roslyn InternalizeAsync is skipped because the generated files already have the final accessibility.

Roslyn is still used by the existing post-processing pipeline for removal/reduce work and for reading custom-code symbols/syntax, but not for generated reference-map construction or applying internalization.

Latest benchmark data

Latest local full-generation benchmark compares Roslyn as the baseline against the current provider implementation with pre-write internalization and the latest custom-code/state-cleanup fixes. The values below are aggregated means across 3 BenchmarkDotNet runs with profiling disabled.

Benchmark Roslyn baseline Provider reference map Improvement vs Roslyn
Full generation mean 855.0 ms / 68.43 MB 558.9 ms / 44.57 MB 34.6% faster, 34.9% less allocation

Per-run results:

Run Roslyn baseline Provider reference map Improvement
Run 1 835.5 ms / 68.52 MB 582.3 ms / 44.56 MB 30.3% faster, 35.0% less allocation
Run 2 883.2 ms / 68.16 MB 542.4 ms / 44.49 MB 38.6% faster, 34.7% less allocation
Run 3 846.4 ms / 68.60 MB 552.0 ms / 44.67 MB 34.8% faster, 34.9% less allocation

Benchmark artifacts:

Run Artifact
Latest no-profile full-generation run 1 /tmp/typespec-provider-map-latest-20260701-0828-run1
Latest no-profile full-generation run 2 /tmp/typespec-provider-map-latest-20260701-0828-run2
Latest no-profile full-generation run 3 /tmp/typespec-provider-map-latest-20260701-0828-run3
Previous no-profile full-generation run 1 /tmp/typespec-provider-map-no-profile-20260630-1547-run1
Previous no-profile full-generation run 2 /tmp/typespec-provider-map-no-profile-20260630-1547-run2
Previous no-profile full-generation run 3 /tmp/typespec-provider-map-no-profile-20260630-1547-run3
Previous provider reference-map reruns /tmp/typespec-hybrid-benchmark-reruns-20260629-065344
Pre-write internalization /tmp/typespec-prewrite-internalize-benchmark-20260629-073203

Azure SDK local project-reference benchmark

Measured Azure SDK for .NET management-plane regeneration using the same local project-reference setup in both runs: /workspaces/azure-sdk-for-net was wired to the sibling /workspaces/typespec checkout via .NET project references, and only the TypeSpec checkout changed between main and this PR branch.

SDK package TypeSpec main baseline This PR branch Improvement
Azure.ResourceManager.Network 00:37:01.9 00:07:12.1 80.6% faster / 5.14x speedup
Azure.ResourceManager.DataFactory 00:13:43.4 00:03:33.9 74.0% faster / 3.85x speedup
Azure.ResourceManager.AppService 00:10:46.9 00:03:06.5 71.2% faster / 3.47x speedup

Validation

  • Latest Azure SDK full regen PR: [DO NOT MERGE] Preview Generator Version 1.0.0-alpha.20260625.1 (Azure data plane) Azure/azure-sdk-for-net#60254.
  • Local full Azure data-plane regen with the MTG RegenPreview.ps1 -Azure script regenerated 39 Azure-branded data-plane libraries with no SDK code diffs and no public API diffs.
    • The full parallel run passed 38/39 libraries and hit a transient NuGet restore race in Azure.Data.AppConfiguration (*.nuget.g.props already existed`).
    • Rerunning Azure.Data.AppConfiguration alone with the same local generator passed, leaving no SDK/API diffs.
  • Targeted local Azure data-plane regen for Azure.AI.Vision.ImageAnalysis passed after the custom-code stack-overflow fix.
    • The remaining SDK changes are expected generated cleanup/reachability diffs: unused System/Azure usings are removed from the model factory, and internal error models are generated/marked buildable in the MRW context.
  • C# emitter npm run format passes.
  • Repo-root pnpm format passes; unrelated formatter changes outside http-client-csharp were discarded.
  • Full Microsoft.TypeSpec.Generator.ClientModel.Tests pass (1451 tests).
  • C# emitter npm run build -- --no-restore passes.
  • Focused Microsoft.TypeSpec.Generator.Tests customization/post-processing tests pass (ClientCustomizationTests, PostProcessorTests.RemovesInvalidUsings).
  • C# emitter npm run cop passes.

@microsoft-github-policy-service microsoft-github-policy-service Bot added the emitter:client:csharp Issue for the C# client emitter: @typespec/http-client-csharp label Jun 12, 2026
@pkg-pr-new

pkg-pr-new Bot commented Jun 12, 2026

Copy link
Copy Markdown

Open in StackBlitz

npm i https://pkg.pr.new/@typespec/http-client-csharp@10976

commit: 1c1c0b7

@github-actions

github-actions Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

No changes needing a change description found.

@azure-sdk-automation

azure-sdk-automation Bot commented Jun 19, 2026

Copy link
Copy Markdown

You can try these changes here

🛝 Playground 🌐 Website 🛝 VSCode Extension

@live1206 live1206 marked this pull request as ready for review June 25, 2026 08:58
@live1206 live1206 changed the title [Draft] Use hybrid C# reference map for post-processing Use hybrid C# reference map for post-processing Jun 25, 2026
@live1206 live1206 changed the title Use hybrid C# reference map for post-processing Use hybrid C# reference map with pre-write internalization Jun 29, 2026
@live1206 live1206 changed the title Use hybrid C# reference map with pre-write internalization Use provider C# reference map with pre-write internalization Jun 29, 2026

internal void PreserveXmlDocs()
{
PreserveTypeXmlDocs = true;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not set this in the ctor instead ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept this as an explicit post-analysis marker rather than setting it in the ctor. The ctor does not know whether a provider will be internalized; that depends on the full provider graph plus custom/API roots computed later in ApplyPreWriteAccessibility. Defaulting this in the ctor would preserve docs for types that do not need the pre-write internalization compatibility path.

private const string SerializationFormatFileName = "SerializationFormat.cs";
private const string TypeFormattersFileName = "TypeFormatters.cs";

private static readonly string[] _filesToKeep =

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we somehow instead rely on _additionalRootTypes to prevent certain types / files from being deleted? My concern is that we will have to maintain an additional list and these generated file names could change. I think this could also help solve the other comment above.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in cc036f3. CSharpGen now builds the delete-preserve list from CodeModelGenerator.Instance.AdditionalRootTypes and NonRootTypes, converting registered type names to generated file names, so language-specific generators can opt in via AddTypeToKeep instead of MTG maintaining helper filenames.

@JoshLove-msft JoshLove-msft left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — provider C# reference map with pre-write internalization

Overall: Ambitious and well-motivated performance work (~21% faster full-gen, ~62% on the reference-map path), with strong Azure SDK validation and no reported output diffs. The dual-path design (fall back to Roslyn when no precomputed result matches) is a sensible safety net. My concerns are about architecture layering, correctness edge cases, and maintainability, not the goal.

🔴 High — Emitter-specific knowledge hardcoded into the generic base layer

Microsoft.TypeSpec.Generator is the emitter-agnostic core, yet this PR bakes ClientModel/SCM-specific names into it:

  • CSharpGen.cs: RawRequestUriBuilderExtensions.cs, SerializationFormat.cs, TypeFormatters.cs in _filesToKeep; plus a .SetDelimited( / RequestHeaderExtensions.cs textual skip.
  • ProviderReferenceMapAnalyzer.cs: RemoveUnusedRequestHeaderExtensionsRoot, HasCustomRequestHeaderExtensionsReference ("SetDelimited", "RequestHeaderExtensions"), IsGeneratedHelperSimpleName (ChangeTrackingDictionary/List), and PersistableModelProxy handling.

Root cause: these helpers are emitted as raw source files, not TypeProviders, so the provider graph can't see them — forcing name-based special cases. The old Roslyn map was fully generic. Consider an extensibility hook (e.g. emitters register "always-keep" files / helper roots) rather than literal strings in the core.

🔴 High — ~120 lines of near-duplicated candidate logic

Analyze and GetPreWriteAccessibilityCandidates reimplement the same intricate multi-pass reachability / internalize / publicize computation with subtle differences. This is the most complex logic in the generator, duplicated — every future bug fix must be applied in two places and they will drift. Please extract a shared helper.

🟡 Medium — Possible correctness gap: RequestHeaderExtensions.cs skip ignores custom code

In CSharpGen, usesRequestHeaderExtensions scans only generatedFiles (from GetGeneratedFilesAsync) for .SetDelimited(. If custom (non-generated) code calls SetDelimited / RequestHeaderExtensions but no generated file does, the file is dropped → custom code breaks. Note the analyzer's HasCustomRequestHeaderExtensionsReference does check custom docs, so the two mechanisms are inconsistent. Worth a test with custom-only usage.

🟡 Medium — Process-wide static mutable state

ProviderReferenceMapAnalyzer holds static _latestResult, _preWriteModelFactory, PreWriteAccessibilityApplied, coupling PostProcessor/CSharpGen via globals rather than passing the result through. The ProjectId guard and Reset mitigate staleness, and tests currently aren't [Parallelizable] — but this is fragile: enabling test parallelism later would introduce races. Prefer threading the result explicitly (or storing it on the GeneratedCodeWorkspace/Configuration instance).

🟡 Medium — AddSampleSymbols only on the fallback path

The Roslyn RemoveAsync path calls AddSampleSymbols to protect sample-referenced types; the new provider path does not. Please confirm the provider graph roots samples equivalently, or unreferenced-except-by-samples types could be removed.

🔵 Minor / questions

  • Sync-over-async (.GetAwaiter().GetResult()) on Roslyn calls throughout the analyzer — fine here (no sync context) but inconsistent with the async codebase.
  • IsSerializationProvider / IsGeneratedDocument rely on file-path string suffixes (.Serialization.cs, /Generated/) — brittle but pre-existing style.
  • Some unrelated formatting-only churn (brace additions) is mixed in, enlarging the diff.
  • Benchmark artifacts are all local /tmp/... paths; a CI-captured number would be more reproducible for reviewers, though the Azure SDK PR validation is convincing.

✅ Strengths

  • Real, verified performance win with end-to-end Azure regen showing no code/API diffs.
  • Keeps Roslyn as a fallback and for custom-code analysis — good risk management.
  • Solid new test coverage (ClientBodyDependencyPostProcessingTests, PostProcessor / ModelFactory tests) locking in accessibility behavior.
  • XML-doc preservation and model-factory-method pruning for pre-write internalization are handled thoughtfully.

Recommendation: approve-with-changes. The performance value is real, but I'd want (1) the duplicated candidate logic factored out and (2) the custom-code SetDelimited skip gap resolved/tested before merge; the layering hardcoding is the strategic follow-up to track even if not blocking.

--generated by Copilot

@live1206

live1206 commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

Addressed the duplicated accessibility-candidate logic in 2b2c999. Analyze and GetPreWriteAccessibilityCandidates now share GetAccessibilityCandidates(...); the two callers only differ in how they collect custom/API roots before invoking the shared reachability/publicize/internalize computation.

@live1206

live1206 commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

Followed up on the remaining latest review concerns in 69cb3d2:

  • Moved the inline PostProcessorTests source to the existing TestData convention.
  • Fixed the custom-only SetDelimited path by having CSharpGen consider custom-code references before skipping PipelineRequestHeadersExtensions.cs, and added coverage for that case.
  • The sample-symbol concern should be covered by the provider-map path because remove candidates are derived only from generated provider graph nodes; sample classes are not provider nodes and therefore are not selected for removal by GetSymbolsByName. The old AddSampleSymbols fallback is still needed only for the Roslyn all-symbol traversal.
  • I have not replaced the process-wide static handoff in this PR; that is a broader plumbing change between CSharpGen, GeneratedCodeWorkspace, and PostProcessor. The current path still resets before generation and validates the ProjectId before consuming the result.

@live1206

live1206 commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

Follow-up pushed in 762a02b for the provider/file-name concern:

  • ProviderReferenceMapAnalyzer now uses TypeProvider.IsClientProvider instead of RelativeFilePath.EndsWith("Client.cs") for client roots/source-reference scanning.
  • Serialization providers are identified from the SerializationProviders relationship instead of .Serialization.cs filename suffixes.
  • Model factories now rely on provider is ModelFactoryProvider only; removed the ModelFactory.cs filename fallback.
  • The ClientUriBuilder.cs special case is now a generic TypeProvider.IncludeGeneratedBodyReferences hook overridden by ClientUriBuilderDefinition.

I also scanned the analyzer afterward; it no longer has RelativeFilePath/.cs suffix checks for these provider categories.

@live1206

live1206 commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

Follow-up pushed in dc5b534 for helper dependencies:

  • Replaced HelperDependencyNames/BuildHelperDependencyNames string lists with HelperDependencyTypes/BuildHelperDependencyTypes returning CSharpType.
  • Converted Client/RestClient/MRW/multipart helper dependency overrides to return provider types instead of string names.
  • Updated ProviderReferenceMapAnalyzer to add helper roots through AddTypeReference rather than simple-name matching for those dependencies.

I also scanned the touched files afterward; the old helper dependency string API and hardcoded helper dependency string returns are gone.

@JoshLove-msft JoshLove-msft left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (commits cc036f3f762a02b2)

Thanks for the fast turnaround — this round addresses most of my previous feedback well. I re-reviewed the four new commits and built the generator locally (Microsoft.TypeSpec.Generator.ClientModel, 0 warnings / 0 errors, StyleCop clean).

✅ Resolved

  • Layering (was 🔴): The biggest improvement. Emitter-specific string checks are largely gone:
    • _filesToKeep no longer hardcodes helper file names; GetFilesToKeep() derives them generically from AddTypeToKeep (AdditionalRootTypes + NonRootTypes). And since TypeFormatters/SerializationFormat/RawRequestUriBuilderExtensions are actual TypeProviders, the provider graph already governs them — so the hardcoding was correctly unnecessary.
    • New virtual markers IsClientProvider / IncludeGeneratedBodyReferences (overridden in ClientProvider / ClientUriBuilderDefinition) replace the brittle RelativeFilePath.EndsWith("Client.cs"), "/Internal/ClientUriBuilder.cs", PackageName.StartsWith("Azure."), and "ModelFactory.cs" / ".Serialization.cs" path checks. This is exactly the polymorphic hook I was hoping for.
  • Duplication (was 🔴): Analyze and GetPreWriteAccessibilityCandidates now share GetAccessibilityCandidates(...). The ~60-line copy is gone. 👍
  • Custom-code SetDelimited gap (was 🟡): CSharpGen now also consults HasCustomRequestHeaderExtensionsReference(), and there's a new test. Good.

🟡 Remaining (minor)

  1. RequestHeaderExtensions / SetDelimited is the last name-based special-case in the MTG core (CSharpGen.IsRequestHeadersExtensionsFile, analyzer IsRequestHeadersExtensionsRoot / HasCustomRequestHeaderExtensionsReference). Understandable, since this helper is emitted as a raw static file rather than a TypeProvider, so there's no provider to mark. Not blocking, but worth a // TODO noting it's the one helper the provider graph can't see, so future readers know why it's special.
  2. Static mutable state (_latestResult, _preWriteModelFactory, PreWriteAccessibilityApplied) is unchanged. Fine today (tests aren't [Parallelizable], and the ProjectId guard/Reset mitigate), but it's still hidden global coupling that would break under test parallelism. Low priority.

🔵 Questions / verify

  1. Samples on the provider remove path: AddSampleSymbols still only runs on the Roslyn fallback branch; the provider branch derives referencedSet = DeclaredSymbols − RemoveCandidates. Since sample files are generated (not custom) and aren't provider nodes, types referenced only by samples wouldn't be rooted. Your full Azure data-plane regen (no diffs) strongly suggests this is fine in practice — just confirming it's intentional and not silently relying on samples never being provider candidates.
  2. GetFileNameForType assumes a keep-type's file is exactly <SimpleName>.cs. Holds for the current helpers, but it's an implicit filename↔type coupling that would silently miss a keep-type emitted to a differently-named file. A one-line comment noting the assumption would help.
  3. ShouldUseGeneratedSourceReferences broadened from Azure-only + Client.cs to any IsClientProvider with no BodyDependencyTypes. This looks like a deliberate (good) generalization to unbranded clients — flagging only so it's a conscious behavior change covered by the regen runs.

Nice work — the polymorphic marker approach materially improves the design. From my side these are all non-blocking; happy to see it merge once (3) is confirmed.

--generated by Copilot

@JoshLove-msft

Copy link
Copy Markdown
Contributor

Correction to my follow-up review (point #1). @live1206 rightly pointed out that RequestHeaderExtensions is a TypeProviderPipelineRequestHeadersExtensionsDefinition (ScmOutputLibrary.cs:90) — not a raw static file. My "there's no provider to mark" reasoning was wrong; please disregard it.

The real reason the textual .SetDelimited( scan / IsRequestHeadersExtensionsFile special-case is needed is different: the only reference to this provider is the request.Headers.SetDelimited(...) call emitted inside RestClientProvider method bodies, and the provider reference graph builds edges from provider metadata (signatures, properties, fields, attributes, declared helper/body deps) — not from arbitrary method-body expressions. RestClientProvider doesn't declare PipelineRequestHeadersExtensions via BuildHelperDependencyNames / BuildBodyDependencyTypes / IncludeGeneratedBodyReferences, so the graph can't see the body-level usage and the file falls back to the string scan.

So the same marker mechanism you introduced for ClientUriBuilder (IncludeGeneratedBodyReferences) could likely subsume this last special-case: have RestClientProvider declare PipelineRequestHeadersExtensions as a helper/body dependency — ideally conditionally, only when a create-request method actually emits SetDelimited — so the graph roots it precisely and the .SetDelimited( scan + HasCustomRequestHeaderExtensionsReference handling can be dropped from the MTG core. Not blocking, just noting it's the same class of body-reference gap the rest of the PR already handles via markers, so it'd be nice to make it consistent rather than a textual exception.

--generated by Copilot

@live1206

live1206 commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

Addressed in 0a170d3. RestClientProvider now declares PipelineRequestHeadersExtensionsDefinition as a typed helper dependency only for generated collection header parameters that emit request.Headers.SetDelimited(...) (excluding the prefixed dictionary path that uses Add). With that provider edge in place, I removed the MTG-core textual .SetDelimited( scan, the request-header extension file-name filter, and the custom text-scan helper in ProviderReferenceMapAnalyzer. The existing custom-only coverage still passes through the reference analysis path.

@live1206

live1206 commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

Follow-up on the remaining samples question from the latest review: the provider remove path only computes removal candidates from generated provider graph nodes, not from arbitrary project symbols. Sample files are generated documents but they are not provider nodes, so sample classes themselves are not selected as remove candidates by the provider graph. For generated types referenced only by samples, the full DPG regen/no-diff validation is the practical coverage; I do not see an additional code change needed here. The remaining static-state concern is still a known low-priority follow-up rather than a blocker for this PR.

@JoshLove-msft

Copy link
Copy Markdown
Contributor

Design suggestion: derive body references from expression trees rather than declared deps + source re-parsing

Following up on the body-reference discussion — the current approach builds graph edges from provider metadata (signatures, properties, fields, attributes, etc.) and then compensates for body-level references three ways: manually-declared HelperDependencyNames / BodyDependencyTypes / IncludeGeneratedBodyReferences, AddGeneratedBodyReferences re-parsing generated source with Roslyn for a curated set of "candidate" providers, and hardcoded fallbacks (the SetDelimited scan, ChangeTracking helper roots, etc.).

The concern is soundness: a pure-metadata graph is an under-approximation of reachability, because a C# type reference can appear anywhere a method body can (new Helper(), SomeStatic.Call(), extension calls like request.Headers.SetDelimited(...), casts, typeof). For a removal/internalization pass, under-approximating is the dangerous direction — you conclude "nothing references X," internalize/remove it, and the break surfaces later as a downstream compile error or a silent public-API regression. Correctness then rests on provider authors remembering to declare every body dependency (a manually-maintained parallel list that will drift) plus the hand-curated candidate list.

Suggestion: walk the body expression trees instead. The bodies already exist as structured MethodBodyStatement / ValueExpression trees before they're rendered to source. Traversing those to collect CSharpType references (constructor targets, static member owners, type literals, generic args, cast targets) would be:

  • sound by construction for anything the generator itself emits — it closes the gap generically instead of per-provider declaration, and would subsume the SetDelimited special-case and most BodyDependencyTypes/HelperDependencyNames overrides;
  • likely cheaper than AddGeneratedBodyReferences re-parsing generated C# through a Roslyn semantic model.

The fact that the PR re-parses generated source for the client/serialization candidates (rather than walking expression trees) suggests the tree walk wasn't feasible for all bodies — presumably where bodies are raw TypeProvider source or interpolated string literals whose contents aren't structured ValueExpressions. If so, that's the honest residual limitation to document: expression-tree walking would handle the structured majority soundly, with a narrow, explicit fallback (or a CI superset-check against the existing full Roslyn map) for the unstructured remainder — instead of today's broader reliance on manual declarations + name-based special-cases.

Not a merge blocker, but I think it's the change that would turn this from "correct-by-testing" into "correct-by-construction." Would also be worth attributing how much of the ~21% comes from skipping body analysis vs. from pre-write internalization (skipping the Roslyn InternalizeAsync/reduce passes) — if it's mostly the latter, a sound full-body map may retain most of the win.

--generated by Copilot

@live1206 live1206 requested a review from bterlson as a code owner July 1, 2026 06:18
Comment thread .chronus/changes/mtg-hybrid-reference-map-2026-07-01-02-43-39.md Outdated
Comment thread .chronus/config.yaml Outdated

foreach (var outputType in output.TypeProviders)
{
if (outputType is ModelFactoryProvider && outputType.Methods.Count == 0)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this only needed for model factories ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kept this one intentionally: model factory methods can be pruned before writing, and if all methods are removed we should avoid writing an empty factory. Other providers are skipped through ShouldWriteProvider rather than method pruning.

Comment on lines +101 to +122
var generatedPublicReachable = GetReachableTypes(internalizeRoots, internalizeReferences);
AddDerivedModelReferences(providers, publicGraph.Nodes, internalizeReferences, generatedPublicReachable, generatedDiscriminatorBaseNames);
internalizeRoots.UnionWith(customPublicRoots);
var internalizeReachableWithoutHelpers = GetReachableTypes(internalizeRoots, internalizeReferences);
AddDerivedModelReferences(providers, publicGraph.Nodes, internalizeReferences, internalizeReachableWithoutHelpers, generatedDiscriminatorBaseNames);
internalizeReachableWithoutHelpers = GetReachableTypes(internalizeRoots, internalizeReferences);
var publicizeRoots = internalizeRoots.ToHashSet(StringComparer.Ordinal);
var internalizeHelperRoots = GetHelperRootNames(generatedProviders, graph.Nodes, internalizeReachableWithoutHelpers);
internalizeRoots.UnionWith(internalizeHelperRoots);
var internalizeReachable = GetReachableTypes(internalizeRoots, internalizeReferences);
var internalizeDeclaredNodes = GetPostProcessorDeclaredNodes(generatedProviders, graph.Nodes, publicOnly: true);
var customInternalBoundaryNodes = graph.Nodes
.Where(name => publicGraph.References.TryGetValue(name, out var references) && references.Overlaps(customInternalDeclarations))
.ToHashSet(StringComparer.Ordinal);
var publicizeDeclaredNodes = GetPostProcessorDeclaredNodes(generatedProviders, graph.Nodes, publicOnly: false)
.Except(internalizeDeclaredNodes, StringComparer.Ordinal);
var generatedImplementationInternalDeclarations = GetGeneratedImplementationInternalTypeDeclarations(generatedInternalDeclarations).ToHashSet(StringComparer.Ordinal);
var publicApiTraversalNodes = internalizeDeclaredNodes
.Except(generatedInternalDeclarations, StringComparer.Ordinal)
.Concat(publicizeDeclaredNodes)
.Except(generatedImplementationInternalDeclarations, StringComparer.Ordinal)
.ToHashSet(StringComparer.Ordinal);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will be honest and say this is quite difficult to follow ☹️

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this needs to be made more readable

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored Analyze in f5d83b7 so it now delegates to named accessibility/removal candidate builders instead of keeping both flows inline.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored this in f5d83b7 by splitting the dense Analyze flow into named accessibility and removal candidate phases.

@live1206 live1206 Jul 2, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is still massive, I will refactor it into smaller structured pieces with less responsibility each.

@live1206 live1206 force-pushed the mtg-hybrid-reference-map branch from 9951f4b to 22adf62 Compare July 2, 2026 03:08
@JoshLove-msft

Copy link
Copy Markdown
Contributor

Re-reviewed the latest rewrite (22adf62fb) — nice improvements: CSharpGen is fully clean now, and modeling the helper/body/signature dependencies through providers (with NamedTypeSymbolProvider walking custom syntax) is the direction I was hoping for. Two follow-ups:

1. RequestHeaderExtensions is still a by-name special-case

It's more structured than the old text scan, but it's still matched by string name across several places — HasCustomRequestHeaderExtensionsReference, IsRequestHeadersExtensionsRoot, HasRequestHeaderExtensionsDependency (+ method/property/field variants), and CreateUnresolvedDependencyType("SetDelimited"). It's the one helper that isn't a real type edge in the graph, because SetDelimited is an extension-method call that doesn't resolve to a concrete generated CSharpType via provider metadata.

Not blocking, but since it's the residual of the body-reference-soundness gap (the graph is built from metadata + a candidate-gated source scan, so a body-level extension call has no natural edge), a short // TODO at HasCustomRequestHeaderExtensionsReference / CreateUnresolvedDependencyType explaining why this one helper is name-matched would save the next reader a lot of time. Longer term it'd be nice to resolve SetDelimited to its declaring PipelineRequestHeadersExtensions type so it becomes an ordinary graph edge like the other helpers.

2. Likely redundant block in AddCustomCodeViewRoots

private static void AddCustomCodeViewRoots(HashSet<string> roots, TypeProvider customCodeView, HashSet<string> generatedTypeNames, bool publicOnly)
{
    if (customCodeView is NamedTypeSymbolProvider)
    {
        AddProviderBodyDependencyTypes(roots, customCodeView.SignatureDependencyTypes, generatedTypeNames, includeSimpleNameReferences: true);
    }

    AddTypeReference(roots, customCodeView.BaseType, generatedTypeNames);
    AddProviderBodyDependencyTypes(roots, customCodeView.SignatureDependencyTypes, generatedTypeNames, includeSimpleNameReferences: true);
    ...

The if (customCodeView is NamedTypeSymbolProvider) branch calls AddProviderBodyDependencyTypes(..., SignatureDependencyTypes, ...) with exactly the same arguments as the unconditional call two lines below it, so for a NamedTypeSymbolProvider these deps are added twice (harmless into a HashSet, but dead/confusing). Unless the intent was for the conditional call to use a different source (e.g. BodyDependencyTypes), the if block looks like a refactoring leftover and can just be removed.

--generated by Copilot

@live1206 live1206 force-pushed the mtg-hybrid-reference-map branch from 22adf62 to ac2c0c1 Compare July 2, 2026 05:37
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@live1206 live1206 force-pushed the mtg-hybrid-reference-map branch from ac2c0c1 to d3e60ef Compare July 2, 2026 05:39
live1206 and others added 6 commits July 2, 2026 06:26
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

emitter:client:csharp Issue for the C# client emitter: @typespec/http-client-csharp

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants