LLM Agents on Jamie's Blog

Designing Context Compression for Production Agents: A Deep Dive into Hermes

Sun, 24 May 2026 00:00:00 +0000

Staff-engineer-level notes on agent/context_compressor.py: how Hermes preserves task continuity when a long-running agent outgrows the model context window, and what the implementation teaches about summarization, compression, and failure-tolerant agent design.

Executive TL;DR

Hermes context compression is not “summarize the chat when it gets long.” It is a transcript rewrite algorithm with strict invariants:

Head / middle / tail partitioning: keep the system prompt and first turns intact, summarize the middle, and protect the recent tail by token budget.

Active task anchoring: the latest user message must stay outside the summary. A summarized “pending ask” is reference material, not a live user turn.

Tool-aware compaction: old tool outputs are deduplicated, summarized, and pruned before any LLM call; tool call/result pairs are sanitized afterward so providers never receive invalid message history.

Iterative summaries: second and later compactions update the existing handoff instead of recursively summarizing summaries as ordinary turns.

Multimodal budgeting: images are charged a fixed token estimate so image sessions do not accidentally preserve far more context than the model can fit.

Failure visibility: if the summary model fails, Hermes inserts an explicit fallback marker and records dropped-turn metadata instead of silently losing context.
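To make the first two invariants concrete, here is a minimal sketch of head / middle / tail partitioning with a token-budgeted tail, active task anchoring, and a flat per-image charge. It is not the code in agent/context_compressor.py: the names `partition` and `estimate_tokens`, the `IMAGE_TOKEN_ESTIMATE` value, the 4-characters-per-token heuristic, and the message shape are all assumptions chosen for illustration.

```python
from typing import Any, Dict, List, Tuple

Message = Dict[str, Any]

# Hypothetical constants: the real values in Hermes are not given in these notes.
IMAGE_TOKEN_ESTIMATE = 1_000  # flat charge per image part
CHARS_PER_TOKEN = 4           # rough heuristic, not a real tokenizer


def estimate_tokens(message: Message) -> int:
    """Cheap token estimate: text by length, images by a fixed charge."""
    content = message.get("content", "")
    if isinstance(content, str):
        return len(content) // CHARS_PER_TOKEN + 1
    total = 0
    for part in content or []:
        if part.get("type") == "image":
            total += IMAGE_TOKEN_ESTIMATE
        else:
            total += len(str(part.get("text", ""))) // CHARS_PER_TOKEN + 1
    return total


def partition(
    messages: List[Message], head_turns: int = 2, tail_budget: int = 2_000
) -> Tuple[List[Message], List[Message], List[Message]]:
    """Split a transcript into (head, middle, tail); only middle is summarized."""
    # Head: the system prompt (if any) plus the first few turns.
    head_end = 1 if messages and messages[0].get("role") == "system" else 0
    head_end = min(len(messages), head_end + head_turns)

    # Tail: walk backward until the budget is spent, always keeping
    # at least the final message even if it alone exceeds the budget.
    tail_start = len(messages)
    used = 0
    while tail_start > head_end:
        cost = estimate_tokens(messages[tail_start - 1])
        if used + cost > tail_budget and tail_start < len(messages):
            break
        used += cost
        tail_start -= 1

    # Active task anchoring: if the latest user message would land in the
    # middle, extend the tail so it stays a live turn outside the summary.
    user_indexes = [i for i, m in enumerate(messages) if m.get("role") == "user"]
    if user_indexes and head_end <= user_indexes[-1] < tail_start:
        tail_start = user_indexes[-1]

    return messages[:head_end], messages[head_end:tail_start], messages[tail_start:]
```

Note that anchoring deliberately lets the tail exceed its budget: in this sketch, keeping the live user turn intact takes priority over the budget, which matches the invariant described above but is a design choice of the sketch rather than a confirmed detail of Hermes.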

How to Use This Deep Dive

Read this document in four passes:

Hermes Agent — Deep Dive Learning Notes

Thu, 21 May 2026 00:00:00 +0000

Staff-engineer-level notes for senior AI engineers designing and implementing production agents. Written after reading run_agent.py, model_tools.py, toolsets.py, agent/, and tools/ in full.

1. High-Level Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                         Entry Points                                │
│  cli.py (HermesCLI)  │  gateway/run.py  │  batch_runner.py         │
│  tui_gateway/server  │  acp_adapter/    │  run_agent.py __main__    │
└──────────────────────┬──────────────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────────────┐
│                      AIAgent  (run_agent.py)                        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────────┐  │
│  │ Conversation │  │  Tool Loop   │  │  Provider / Transport    │  │
│  │   History    │  │  Orchestrator│  │  (Anthropic / OpenAI /   │  │
│  │  (messages)  │  │              │  │   Bedrock / Codex / ACP) │  │
│  └──────────────┘  └──────────────┘  └──────────────────────────┘  │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────────┐  │
│  │  ContextComp │  │  MemoryMgr   │  │  CredentialPool          │  │
│  │  -ressor     │  │  (builtin +  │  │  (multi-key failover)    │  │
│  │              │  │   plugins)   │  │                          │  │
│  └──────────────┘  └──────────────┘  └──────────────────────────┘  │
└──────────────────────┬──────────────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    model_tools.py                                   │
│  get_tool_definitions()  │  handle_function_call()                  │
│  _run_async()            │  _should_parallelize_tool_batch()        │
└──────────────────────┬──────────────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    tools/registry.py  (singleton)                   │
│  ToolRegistry.register()  │  .dispatch()  │  .get_definitions()     │
└──────────────────────┬──────────────────────────────────────────────┘
                       │
          ┌────────────┴────────────┐
          ▼                         ▼
┌──────────────────┐     ┌──────────────────────────────────────────┐
│  tools/*.py      │     │  plugins/<name>/__init__.py              │
│  (built-in tools)│     │  (user / pip-installed plugins)          │
└──────────────────┘     └──────────────────────────────────────────┘

Key insight: The architecture is a strict layered DAG. tools/registry.py has zero imports from any other Hermes module — it is the root. Every tool file imports from it. model_tools.py imports from the registry and triggers discovery. run_agent.py imports from model_tools.py. This prevents circular imports and makes the tool system independently testable.

Inside Claude Code: The Architecture of a Production-Grade System Prompt

Thu, 07 May 2026 00:00:00 +0000

When we think of “prompt engineering,” we often imagine a single, monolithic block of text meticulously tweaked through trial and error. But for production-grade agentic systems like Claude Code, the system prompt is less of a static document and more of a dynamic, highly optimized operating system.

By examining the src/constants/prompts.ts and src/constants/systemPromptSections.ts files of the Claude Code repository, we can extract concrete patterns in modular prompt design, behavioral alignment, and token efficiency that apply to any agentic system.