ArXiv Deep Research Map for Agent Cortex

Purpose: deepen this repo with a practical, operator-first arXiv map across all major categories.

How to use this map:

  • Start with the "Top 3 must-read" papers in each category.
  • Use "Expansion" to go deeper once you have a baseline.
  • Re-check benchmark claims against current leaderboards before publishing hard numbers.

1) Agent Frameworks and Reasoning Loops

Top 3 must-read (and why):

Expansion:

2) Coding Agents

Top 3 must-read (and why):

Expansion:

3) MCP, Tool Use, and Agent Reliability

Top 3 must-read (and why):

Expansion:

4) Web and Computer-Use Agents

Top 3 must-read (and why):

Expansion:

5) Context Engineering and Memory

Top 3 must-read (and why):

  • RAG (2020) - Retrieval architecture baseline for external memory.
  • MemGPT (2023) - Memory tiering and virtual-context perspective.
  • Self-RAG (2023) - Retrieval with self-critique control loop.

Expansion:

6) Prompt and Programmatic Prompt Engineering

Top 3 must-read (and why):

Expansion:

7) Security and Robustness

Top 3 must-read (and why):

Expansion:

8) Voice and Multimodal Agents

Top 3 must-read (and why):

9) Evaluation Science and LLM-as-a-Judge

Top 3 must-read (and why):

10) Quant and Trading Agents

Top 3 must-read (and why):

Expansion:

11) Blockchain Identity, Payments, and DeFi-Adjacent Research

Top 3 must-read (and why):

Expansion:


Recent ArXiv Watchlist (last ~90 days, as of 2026-03-12)

Note: this watchlist is intentionally short and high-signal; re-check whether each paper still merits inclusion once it has settled past early preprint revisions.

Monthly Refresh Workflow (template)

Use this in the first week of each month:

  1. Pull candidates
  • Query arXiv for each category using category keywords and a date filter for the last 45 days.
  • Keep a raw scratch list (20-40 papers total).
  2. Apply inclusion gate
  • Keep papers that pass at least 3 of 5:
    • Clear method/benchmark contribution
    • Reproducibility artifacts (code/data/eval details)
    • Strong operator relevance (how to build, test, secure, deploy)
    • Non-trivial novelty versus existing map
    • Cross-category leverage (useful outside one niche)
  3. Curate final set
  • Promote at most 1-3 papers per category per month.
  • Move low-signal or superseded papers to an archive list.
  4. Update repo docs
  • Update this map.
  • Update README only if a paper changes the practical narrative (for example, reliability ceilings, new benchmark standard).
  5. Log rationale
  • For each promoted paper, add a one-line "why it matters" note in the commit message or PR body.
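
The pull, gate, and curate steps above could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not this repo's tooling: the criterion names, the `Candidate` class, and the ranking by number of criteria passed are all hypothetical, and the `submittedDate:[... TO ...]` range filter is assumed to behave as described in the arXiv API user manual. The sketch only builds the query URL; it does not fetch or parse results.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from urllib.parse import urlencode

# Hypothetical short names for the five inclusion-gate criteria.
CRITERIA = (
    "method_or_benchmark",   # clear method/benchmark contribution
    "reproducibility",       # code/data/eval details available
    "operator_relevance",    # how to build, test, secure, deploy
    "novelty",               # non-trivial versus the existing map
    "cross_category",        # useful outside one niche
)


@dataclass
class Candidate:
    title: str
    category: str
    passed: set = field(default_factory=set)  # subset of CRITERIA


def arxiv_query_url(keywords: str, today: date, days: int = 45,
                    max_results: int = 40) -> str:
    """Build an arXiv API query URL restricted to the last `days` days."""
    start = today - timedelta(days=days)
    # Assumed range syntax: submittedDate:[YYYYMMDDHHMM TO YYYYMMDDHHMM]
    window = f"submittedDate:[{start:%Y%m%d}0000 TO {today:%Y%m%d}2359]"
    params = {
        "search_query": f"all:{keywords} AND {window}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return "http://export.arxiv.org/api/query?" + urlencode(params)


def passes_gate(c: Candidate, threshold: int = 3) -> bool:
    """A paper passes when it meets at least `threshold` of the 5 criteria."""
    return len(c.passed & set(CRITERIA)) >= threshold


def curate(candidates, per_category_cap: int = 3):
    """Split candidates into (promoted, held_back), capping each category.

    Ranking by number of criteria passed is an assumption; the workflow
    itself does not say how to break ties between passing papers.
    """
    promoted, held_back, counts = [], [], {}
    ranked = sorted(candidates, key=lambda c: len(c.passed), reverse=True)
    for c in ranked:
        if passes_gate(c) and counts.get(c.category, 0) < per_category_cap:
            promoted.append(c)
            counts[c.category] = counts.get(c.category, 0) + 1
        else:
            held_back.append(c)
    return promoted, held_back
```

For example, `arxiv_query_url('"coding agents"', date(2026, 3, 12))` yields a URL whose date window starts 45 days earlier, and `curate(...)` returns the promoted papers alongside those held back for the archive list or a later month.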

Suggested commit format:

  • docs(research): monthly arxiv refresh YYYY-MM
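
As a small illustration of the commit format combined with the one-line rationale step, a helper along these lines could assemble the message. The function name and the bullet layout of the body are assumptions; only the subject line format comes from this map.

```python
from datetime import date


def refresh_commit_message(today: date, rationales: dict) -> str:
    """Build a commit message: formatted subject, blank line, one line per paper."""
    subject = f"docs(research): monthly arxiv refresh {today:%Y-%m}"
    # One "why it matters" line per promoted paper (layout is an assumption).
    body = [f"- {title}: {why}" for title, why in rationales.items()]
    return "\n".join([subject, "", *body])


if __name__ == "__main__":
    print(refresh_commit_message(
        date(2026, 3, 12),
        {"Self-RAG": "Retrieval with self-critique control loop."},
    ))
```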