Warn when configured LLM is not frontier-recommended #586

Open
devin-ai-integration[bot] wants to merge 3 commits into main from devin/1782479089-llm-model-warning
devin-ai-integration bot commented Jun 26, 2026

Summary

Adds a non-blocking startup MODEL QUALITY WARNING in warm_up_llm() when STRIX_LLM is set to a model outside Strix's recommended/frontier set:

is_recommended_or_frontier_model("openai/gpt-4.1")  # False -> warn
is_recommended_or_frontier_model("anthropic/claude-sonnet-4-6")  # True
is_recommended_or_frontier_model("custom-ollama/gpt-5-mini-local")  # False -> warn

The warning runs after settings are loaded and before the LLM warmup call; it prints guidance and recommended model names but does not block the scan.

Expands the frontier coverage from a small OpenAI/Anthropic/Gemini set to current SOTA families across OpenAI GPT-5.x, Anthropic Claude Opus/Sonnet 4.x, Gemini 3.x, Grok 4.x, DeepSeek V4/R1, Qwen3.x, Moonshot Kimi K2.x, and Mistral/Magistral. The matcher now groups provider markers with model prefixes so routed names like litellm/openai/gpt-5.4-pro, bedrock_mantle/openai.gpt-5.5, vertex_ai/claude-sonnet-4-6@default, and openrouter/google/gemini-3.1-pro-preview are accepted without allowing arbitrary custom providers to bypass the warning.
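
The grouping described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `FRONTIER_GROUPS`, `_split_markers`, and `is_frontier_sketch` are hypothetical names, the provider lists are assumptions, and only three of the listed model families are included. A prefix counts only when a provider marker from the same group appears somewhere in the routed name.

```python
# Hypothetical sketch: each group pairs provider markers with the model
# prefixes that are considered frontier for those providers.
FRONTIER_GROUPS = (
    (("openai",), ("gpt-5",)),
    (("anthropic", "vertex_ai"), ("claude-opus-4", "claude-sonnet-4")),
    (("google", "gemini", "vertex_ai"), ("gemini-3",)),
)
ALL_MARKERS = frozenset(m for markers, _ in FRONTIER_GROUPS for m in markers)


def _split_markers(model_name: str) -> tuple[list[str], str]:
    """Split a routed name into its provider markers and the bare model name."""
    *markers, bare = model_name.strip().lower().split("/")
    # Handle dotted routing such as "openai.gpt-5.5" without breaking
    # version dots like "gpt-5.4-pro": peel the head only if it is a marker.
    head, dot, tail = bare.partition(".")
    if dot and head in ALL_MARKERS:
        markers.append(head)
        bare = tail
    return markers, bare


def is_frontier_sketch(model_name: str) -> bool:
    markers, bare = _split_markers(model_name)
    for providers, prefixes in FRONTIER_GROUPS:
        # Both conditions must hold within the same group.
        if any(m in providers for m in markers) and bare.startswith(prefixes):
            return True
    return False
```

Under these assumptions, router segments such as `litellm` or `openrouter` are simply ignored, while an unknown provider such as `custom-ollama` never satisfies a group and therefore still triggers the warning. A name with no provider segment at all is treated as non-frontier in this sketch; the real function may handle that case differently.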

Also updates the pre-commit mypy hook to a current mypy version and fixes existing hook-only lint/typecheck issues so local hooks pass.

Link to Devin session: https://app.devin.ai/sessions/0c9d3d7e49ce4f6d84113bcce25c4456
Requested by: @0xallam

0xallam self-assigned this Jun 26, 2026
@devin-ai-integration

Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment, CI, and merge conflict monitoring

@greptile-apps

greptile-apps Bot commented Jun 26, 2026

Contributor

Greptile Summary

This PR adds a startup MODEL QUALITY WARNING panel that fires when STRIX_LLM is set to a model that is neither explicitly recommended nor part of a recognized frontier family. It also fixes a mutable class-level list default in StrixDockerSandboxClient, tightens mypy type: ignore suppressions, and bumps the pre-commit mypy hook to v1.19.1.

  • strix/config/models.py: Introduces RECOMMENDED_MODEL_NAMES, FRONTIER_MODEL_PREFIXES, and is_recommended_or_frontier_model with two private helpers that normalize provider prefixes (litellm/, any-llm/) and extract the bare model name for prefix matching.
  • strix/interface/main.py: Wires the new check into warm_up_llm, printing a yellow warning panel when the model isn't recognized; the scan proceeds rather than aborting.
  • strix/runtime/docker_client.py: Replaces the shared mutable [] class default with None, eliminating the cross-instance mutation footgun; the consuming code already handles None via a truthiness guard.
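
For readers unfamiliar with the mutable class-level default mentioned in the last bullet, here is a small sketch of the problem and the `None`-based fix. The class and attribute names are hypothetical stand-ins, not the real `StrixDockerSandboxClient` fields.

```python
from typing import Optional


class SharedDefaultClient:
    """Hypothetical stand-in for the footgun: one list shared by all instances."""

    extra_mounts: list[str] = []


class NoneDefaultClient:
    """Hypothetical stand-in for the fixed shape: None means nothing configured."""

    extra_mounts: Optional[list[str]] = None

    def mount_args(self) -> list[str]:
        args: list[str] = []
        if self.extra_mounts:  # truthiness guard: None and [] both skip the loop
            for mount in self.extra_mounts:
                args += ["-v", mount]
        return args


first, second = SharedDefaultClient(), SharedDefaultClient()
first.extra_mounts.append("/host:/container")  # mutates the class-level list
leaked = second.extra_mounts  # the second instance sees the first one's mount

configured, untouched = NoneDefaultClient(), NoneDefaultClient()
configured.extra_mounts = ["/host:/container"]  # direct per-instance assignment
```

With the `None` default, assigning the attribute on one instance creates an instance attribute and leaves other instances untouched, and the truthiness guard means callers do not need to distinguish `None` from an empty list.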

Confidence Score: 4/5

Safe to merge; the warning is non-blocking and the normalization logic is verified by the new test suite.

The frontier-detection logic is correct for all documented inputs and the tests cover the important cases. The two observations are about implicit coupling between the lowercased candidate strings and the case of RECOMMENDED_MODEL_NAMES constants, and about the bare-name extraction allowing frontier-prefix matches from any provider. Neither breaks current behavior, but both could cause silent failures or missed warnings as the constant lists grow.

strix/config/models.py — the _is_recommended_or_frontier_candidate function has an implicit invariant that RECOMMENDED_MODEL_NAMES entries are always lowercase, and the bare-name prefix matching has no provider guard.

Important Files Changed

| Filename | Overview |
| --- | --- |
| strix/config/models.py | Adds RECOMMENDED_MODEL_NAMES / FRONTIER_MODEL_PREFIXES constants and three new helper functions for frontier-model detection. Logic is correct for all documented cases; minor implicit invariant on constant casing noted. |
| strix/interface/main.py | Adds a MODEL QUALITY WARNING panel in warm_up_llm when the configured model is not recommended/frontier. Guard condition is correct (respects empty model, runs after the existing bare-name exit check). |
| strix/runtime/docker_client.py | Fixes mutable class-level list default ([] → None). Safe because backends.py assigns the field directly (bind_mounts or []) and the consuming code uses a truthiness guard, so None and [] are both handled correctly. |
| tests/test_models.py | New test file with parameterized coverage of recommended, frontier-prefixed, and non-frontier models. Covers litellm/any-llm prefixes and empty-string edge case. |
| strix/interface/tui/app.py | Narrows three type: ignore[misc] suppressions to the more specific type: ignore[untyped-decorator] to satisfy the updated mypy version. |
| strix/core/hooks.py | Reformats a long if-condition to comply with the line-length linter; no functional change. |
| .pre-commit-config.yaml | Bumps mypy pre-commit mirror from v1.16.0 to v1.19.1 and adds the types-Pygments stub dependency. |

Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
strix/config/models.py:191-194
**Implicit case-sensitivity invariant on `RECOMMENDED_MODEL_NAMES`**

`_is_recommended_or_frontier_candidate` receives already-lowercased strings from `_normalized_model_candidates`, but the `in RECOMMENDED_MODEL_NAMES` lookup relies on `RECOMMENDED_MODEL_NAMES` values also being lowercase — this is an implicit coupling. All three current entries happen to be lowercase, so it works today, but adding a mixed-case entry (e.g. `"Vertex_AI/Gemini-3-Pro-Preview"`) would silently break the exact-match path while the `startswith` frontier-prefix path would still pass. A `.lower()` call on each constant in the set (or converting `RECOMMENDED_MODEL_NAMES` to a `frozenset` of lowercased strings) would make this invariant explicit and safe.

### Issue 2 of 2
strix/config/models.py:68-73
**Frontier prefix matching can produce false negatives via bare-name extraction**

`_normalized_model_candidates` generates a second candidate by splitting on the last `/` and taking the right-hand side. Any model whose bare name starts with a frontier prefix will be treated as approved — regardless of provider. For instance, a self-hosted or proxy model named `"custom-ollama/gpt-5-mini-local"` yields bare candidate `"gpt-5-mini-local"`, which starts with `"gpt-5"` and silently suppresses the quality warning. This may be an acceptable trade-off, but worth documenting explicitly (and possibly adding a provider check alongside the prefix check).

Reviews (1): Last reviewed commit: "Warn for non-frontier LLM selections" | Re-trigger Greptile

Comment thread strix/config/models.py Outdated
Comment on lines +191 to +194
def _is_recommended_or_frontier_candidate(model_name: str) -> bool:
    if model_name in RECOMMENDED_MODEL_NAMES:
        return True
    return any(model_name.startswith(prefix) for prefix in FRONTIER_MODEL_PREFIXES)

Contributor

P2 Implicit case-sensitivity invariant on RECOMMENDED_MODEL_NAMES

_is_recommended_or_frontier_candidate receives already-lowercased strings from _normalized_model_candidates, but the in RECOMMENDED_MODEL_NAMES lookup relies on RECOMMENDED_MODEL_NAMES values also being lowercase — this is an implicit coupling. All three current entries happen to be lowercase, so it works today, but adding a mixed-case entry (e.g. "Vertex_AI/Gemini-3-Pro-Preview") would silently break the exact-match path while the startswith frontier-prefix path would still pass. A .lower() call on each constant in the set (or converting RECOMMENDED_MODEL_NAMES to a frozenset of lowercased strings) would make this invariant explicit and safe.

Author

Addressed in 8ca83e4 by normalizing recommended-model constants into _RECOMMENDED_MODEL_NAME_SET for case-insensitive exact matches.
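
A minimal sketch of what such a normalization might look like, assuming the set is built once at import time. `_RECOMMENDED_MODEL_NAME_SET` is the name given in the reply; the entries and the `is_recommended` helper are illustrative assumptions, not the repository's actual list or function.

```python
# Illustrative entries only; the real RECOMMENDED_MODEL_NAMES may differ.
RECOMMENDED_MODEL_NAMES = (
    "anthropic/claude-sonnet-4-6",
    "Vertex_AI/Gemini-3-Pro-Preview",  # mixed case on purpose
)

# Lowercase once so the exact-match path no longer depends on how
# each constant happens to be spelled.
_RECOMMENDED_MODEL_NAME_SET = frozenset(
    name.lower() for name in RECOMMENDED_MODEL_NAMES
)


def is_recommended(model_name: str) -> bool:
    """Hypothetical helper: case-insensitive exact match."""
    return model_name.strip().lower() in _RECOMMENDED_MODEL_NAME_SET
```

This makes the lowercase invariant explicit: a mixed-case constant added later still matches a lowercased candidate.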

Comment thread strix/config/models.py Outdated
Comment on lines +68 to +73
FRONTIER_MODEL_PREFIXES = (
    "gpt-5",
    "claude-opus-4",
    "claude-sonnet-4",
    "gemini-3",
)

Contributor

P2 Frontier prefix matching can produce false negatives via bare-name extraction

_normalized_model_candidates generates a second candidate by splitting on the last / and taking the right-hand side. Any model whose bare name starts with a frontier prefix will be treated as approved — regardless of provider. For instance, a self-hosted or proxy model named "custom-ollama/gpt-5-mini-local" yields bare candidate "gpt-5-mini-local", which starts with "gpt-5" and silently suppresses the quality warning. This may be an acceptable trade-off, but worth documenting explicitly (and possibly adding a provider check alongside the prefix check).

Author

Addressed in 8ca83e4 by only applying frontier bare-name prefix matching when the provider is known frontier (openai, anthropic, vertex_ai, gemini, google). Added a regression test for custom-ollama/gpt-5-mini-local.
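
The provider guard described in this reply could look something like the sketch below. The provider list and prefixes are taken from the thread; `KNOWN_FRONTIER_PROVIDERS` and `bare_name_matches_frontier` are hypothetical names, and using the segment immediately before the bare name as the provider is an assumption about how routed names are handled.

```python
FRONTIER_MODEL_PREFIXES = ("gpt-5", "claude-opus-4", "claude-sonnet-4", "gemini-3")
KNOWN_FRONTIER_PROVIDERS = frozenset(
    {"openai", "anthropic", "vertex_ai", "gemini", "google"}
)


def bare_name_matches_frontier(model_name: str) -> bool:
    """Hypothetical helper: prefix-match the bare name only for known providers."""
    route, sep, bare = model_name.strip().lower().rpartition("/")
    if not sep:
        return False  # no provider segment, so the guard cannot be satisfied
    # For routed names like "litellm/openai/...", use the last route segment.
    provider = route.rsplit("/", 1)[-1]
    return provider in KNOWN_FRONTIER_PROVIDERS and bare.startswith(
        FRONTIER_MODEL_PREFIXES
    )
```

With this guard, `custom-ollama/gpt-5-mini-local` no longer passes on the strength of its bare name alone, which is the case the regression test covers.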
