Skip to content

Stop parsing from mutating the Constants it reads#222

Merged
derek73 merged 5 commits into
masterfrom
fix/no-config-mutation-during-parse
Jul 5, 2026
Merged

Stop parsing from mutating the Constants it reads#222
derek73 merged 5 commits into
masterfrom
fix/no-config-mutation-during-parse

Conversation

@derek73

@derek73 derek73 commented Jul 5, 2026

Copy link
Copy Markdown
Owner

Problem

parse_pieces() and join_on_conjunctions() wrote back into the config while parsing a name: period-joined titles/suffixes ("Lt.Gov."C.titles, "JD.CPA"C.suffix_not_acronyms) and conjunction-joined pieces ("of the"C.conjunctions, "Mr. and Mrs."C.titles, "von und zu"C.prefixes). Since self.C defaults to the shared module-level CONSTANTS singleton, parsing one name permanently changed how every subsequent name in the process parsed:

  • parse results depended on input order and weren't reproducible across runs,
  • concurrent parsing raced on the shared sets,
  • every such name invalidated the carefully managed suffixes_prefixes_titles cache,
  • the test suite's conftest.py had to snapshot/restore all CONSTANTS collections around every test just to stay order-independent.

Fix

The derivations are load-bearing only within the parse that produced them (registered in parse_pieces, consumed by later is_title()/is_suffix()/… checks in the same parse_full_name() run). So they now live in per-instance _derived_titles / _derived_suffixes / _derived_conjunctions / _derived_prefixes sets that the lookup predicates (is_title, is_suffix, is_conjunction, is_prefix, is_rootname) consult alongside self.C, reset at the start of each parse and backfilled in __setstate__ so pickles from older versions keep working. self.C is now read-only during parsing.

Parse results are unchanged — each parse re-derives its own entries — verified against master for all trigger inputs, including the comma-format paths.

Tests

New ParsingDoesNotMutateConfigTests (TDD: written first, failed on the leak assertions):

  • one no-mutation test per historical leak path, each also asserting the parse result stays correct so the fix can't pass by deleting the derivation logic,
  • the snapshot is derived structurally from Constants.__getstate__(), so collections added to Constants in the future are watched automatically (no list to maintain),
  • instance-owned Constants (constants=None) isn't polluted either, and re-assigning full_name re-derives from a clean slate.

Review-driven regression tests (both mutation-verified — the guarded code was deletable with the full suite passing before these):

  • is_rootname must exclude derived titles from the rootname count ("Lt.Gov. juan e garcia" — overlay-blind counting empties the first name),
  • unpickling pre-fix HumanName state without the _derived_* attributes must not crash capitalize().

Docs

  • customize.rst: the shared-config section now states the new guarantee — parsing never modifies the configuration, so concurrent parsing against the shared instance is safe.
  • AGENTS.md: parse-flow notes updated (they described the old mutate-the-config behavior) and a new gotcha documents the invariant plus the _derived_* recipe for future derived categories.
  • Release log entry added.

Full suite: 1340 passed. ruff, mypy, sphinx doctests, and README doctest all clean.

🤖 Generated with Claude Code

parse_pieces() and join_on_conjunctions() derived new lookup entries
while parsing (period-joined titles/suffixes like "Lt.Gov.",
conjunction-joined pieces like "Mr. and Mrs." and "von und zu") and
add()ed them directly to self.C — by default the shared module-level
CONSTANTS singleton. Parsing one name therefore permanently changed how
every later name in the process parsed: results depended on input
order, were not reproducible across runs, and concurrent parsing raced
on the shared sets. It also churned the suffixes_prefixes_titles cache,
which invalidated on every such add().

The derivations are only needed within the parse that produced them, so
track them in per-instance _derived_* sets that the is_title /
is_suffix / is_conjunction / is_prefix / is_rootname predicates consult
alongside self.C, and reset them at the start of each parse_full_name()
run. Parse results are unchanged (each parse re-derives its own
entries); the config is now read-only during parsing.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@codecov-commenter

codecov-commenter commented Jul 5, 2026

Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.30%. Comparing base (016873c) to head (f7e679d).
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #222      +/-   ##
==========================================
+ Coverage   97.25%   97.30%   +0.04%     
==========================================
  Files          13       13              
  Lines         801      815      +14     
==========================================
+ Hits          779      793      +14     
  Misses         22       22              

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@derek73 derek73 self-assigned this Jul 5, 2026
@derek73 derek73 added the bug label Jul 5, 2026
- Fix stale parse_pieces docstring that still described adding derived
  parts "to the constant"
- Document the derived-set branch in the is_prefix / is_suffix /
  is_conjunction docstrings, and reword is_leading_title's now-vacuous
  "does not mutate C.titles" contrast into the narrower load-bearing
  fact (the regex match is not registered anywhere, even per-parse)
- Remove the redundant _derived_suffixes check from is_suffix_lenient:
  it falls through to is_suffix(), which already consults the overlay,
  and derived suffixes always contain an interior period so they can
  never trip the is_an_initial() veto that "lenient" exists to bypass
- Add mutation-verified regression tests for the two coverage gaps:
  is_rootname's derived-title exclusion (its overlay checks were
  deletable without any test failing, yet dropping them misparses
  "Lt.Gov. juan e garcia" into an empty first name) and the
  __setstate__ backfill for pickles predating the derived sets (whose
  absence crashes capitalize() with AttributeError)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@derek73 derek73 added this to the v1.3.0 milestone Jul 5, 2026
derek73 and others added 3 commits July 4, 2026 23:42
The shared-configuration warning in customize.rst now has a precise
boundary worth stating: after the parse-time mutation fix, instances
can only affect each other through explicit add/remove calls, so
concurrent parsing against the shared CONSTANTS is safe.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Replace the hand-enumerated _WATCHED_ATTRS list with a structural
snapshot derived from Constants.__getstate__(), the canonical listing
of an instance's own config. A collection added to Constants later is
watched automatically — there is no list to remember to update (unlike
conftest's _COLLECTION_CONFIG_ATTRS, which remains the only manual
one). A sanity assert fails loud if the discovery ever stops covering
the five historically-mutated sets, and the snapshot now also covers
TupleManager collections and scalar overrides.

Update AGENTS.md: fix the two Architecture lines that still described
parse_pieces()/join_on_conjunctions() as writing into the constants,
and add a gotcha documenting the parsing-never-writes-config invariant
and the _derived_* overlay recipe for future derived categories.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The existing self-staleness reminders live inside two extension-pattern
recipes and the release checklist, so a behavior change that fits
neither pattern reaches PR time with no check on this file — exactly
how the parse-flow lines describing the old mutate-the-config behavior
survived the PR #222 review round. The nudge states why grep is the
wrong tool for it: AGENTS.md paraphrases behavior in its own words, so
stale text rarely matches the diff's phrasing.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@derek73 derek73 merged commit 9514ba3 into master Jul 5, 2026
8 checks passed
@derek73 derek73 deleted the fix/no-config-mutation-during-parse branch July 5, 2026 19:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants