<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>LLM Agents on Jamie&#39;s Blog</title>
    <link>http://akjamie.github.io/tags/llm-agents/</link>
    <description>Recent content in LLM Agents on Jamie&#39;s Blog</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <lastBuildDate>Sun, 24 May 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="http://akjamie.github.io/tags/llm-agents/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Designing Context Compression for Production Agents: A Deep Dive into Hermes</title>
      <link>http://akjamie.github.io/post/2026-05-24-context-compressor-deep-dive/</link>
      <pubDate>Sun, 24 May 2026 00:00:00 +0000</pubDate>
      <guid>http://akjamie.github.io/post/2026-05-24-context-compressor-deep-dive/</guid>
      <description>&lt;h1 id=&#34;designing-context-compression-for-production-agents-a-deep-dive-into-hermes&#34;&gt;Designing Context Compression for Production Agents: A Deep Dive into Hermes&lt;/h1&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;p&gt;Staff-engineer-level notes on &lt;code&gt;agent/context_compressor.py&lt;/code&gt;: how Hermes&#xA;preserves task continuity when a long-running agent outgrows the model context&#xA;window, and what the implementation teaches about summarization, compression,&#xA;and failure-tolerant agent design.&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;hr&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;h3 id=&#34;executive-tldr&#34;&gt;Executive TL;DR&lt;/h3&gt;&#xA;&lt;p&gt;Hermes context compression is not &amp;ldquo;summarize the chat when it gets long.&amp;rdquo; It is&#xA;a transcript rewrite algorithm with strict invariants:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;Head / middle / tail partitioning:&lt;/strong&gt; keep the system prompt and first turns&#xA;intact, summarize the middle, and protect the recent tail by token budget.&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Active task anchoring:&lt;/strong&gt; the latest user message must stay outside the&#xA;summary. A summarized &amp;ldquo;pending ask&amp;rdquo; is reference material, not a live user&#xA;turn.&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Tool-aware compaction:&lt;/strong&gt; old tool outputs are deduplicated, summarized, and&#xA;pruned before any LLM call; tool call/result pairs are sanitized afterward so&#xA;providers never receive invalid message history.&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Iterative summaries:&lt;/strong&gt; second and later compactions update the existing&#xA;handoff instead of recursively summarizing summaries as ordinary turns.&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Multimodal budgeting:&lt;/strong&gt; images are charged a fixed token estimate so image&#xA;sessions do not accidentally preserve far more context than the model can fit.&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Failure visibility:&lt;/strong&gt; if the summary model fails, Hermes inserts an explicit&#xA;fallback marker and records dropped-turn metadata instead of silently losing&#xA;context.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;how-to-use-this-deep-dive&#34;&gt;How to Use This Deep Dive&lt;/h2&gt;&#xA;&lt;p&gt;Read this document in four passes:&lt;/p&gt;</description>
    </item>
    <item>
      <title>Hermes Agent — Deep Dive Learning Notes</title>
      <link>http://akjamie.github.io/post/2026-05-21-hermes-agent-deep-dive/</link>
      <pubDate>Thu, 21 May 2026 00:00:00 +0000</pubDate>
      <guid>http://akjamie.github.io/post/2026-05-21-hermes-agent-deep-dive/</guid>
      <description>&lt;h1 id=&#34;hermes-agent--deep-dive-learning-notes&#34;&gt;Hermes Agent — Deep Dive Learning Notes&lt;/h1&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;p&gt;Staff-engineer-level notes for senior AI engineers designing and implementing production agents.&#xA;Written after reading &lt;code&gt;run_agent.py&lt;/code&gt;, &lt;code&gt;model_tools.py&lt;/code&gt;, &lt;code&gt;toolsets.py&lt;/code&gt;, &lt;code&gt;agent/&lt;/code&gt;, and &lt;code&gt;tools/&lt;/code&gt; in full.&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;1-high-level-architecture&#34;&gt;1. High-Level Architecture&lt;/h2&gt;&#xA;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────────────────┐&#xD;&#xA;│                         Entry Points                                │&#xD;&#xA;│  cli.py (HermesCLI)  │  gateway/run.py  │  batch_runner.py         │&#xD;&#xA;│  tui_gateway/server  │  acp_adapter/    │  run_agent.py __main__    │&#xD;&#xA;└──────────────────────┬──────────────────────────────────────────────┘&#xD;&#xA;                       │&#xD;&#xA;                       ▼&#xD;&#xA;┌─────────────────────────────────────────────────────────────────────┐&#xD;&#xA;│                      AIAgent  (run_agent.py)                        │&#xD;&#xA;│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────────┐  │&#xD;&#xA;│  │ Conversation │  │  Tool Loop   │  │  Provider / Transport    │  │&#xD;&#xA;│  │   History    │  │  Orchestrator│  │  (Anthropic / OpenAI /   │  │&#xD;&#xA;│  │  (messages)  │  │              │  │   Bedrock / Codex / ACP) │  │&#xD;&#xA;│  └──────────────┘  └──────────────┘  └──────────────────────────┘  │&#xD;&#xA;│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────────┐  │&#xD;&#xA;│  │  ContextComp │  │  MemoryMgr   │  │  CredentialPool          │  │&#xD;&#xA;│  │  -ressor     │  │  (builtin +  │  │  (multi-key failover)    │  │&#xD;&#xA;│  │              │  │   plugins)   │  │                          │  │&#xD;&#xA;│  └──────────────┘  └──────────────┘  └──────────────────────────┘  │&#xD;&#xA;└──────────────────────┬──────────────────────────────────────────────┘&#xD;&#xA;                       │&#xD;&#xA;                       ▼&#xD;&#xA;┌─────────────────────────────────────────────────────────────────────┐&#xD;&#xA;│                    model_tools.py                                   │&#xD;&#xA;│  get_tool_definitions()  │  handle_function_call()                  │&#xD;&#xA;│  _run_async()            │  _should_parallelize_tool_batch()        │&#xD;&#xA;└──────────────────────┬──────────────────────────────────────────────┘&#xD;&#xA;                       │&#xD;&#xA;                       ▼&#xD;&#xA;┌─────────────────────────────────────────────────────────────────────┐&#xD;&#xA;│                    tools/registry.py  (singleton)                   │&#xD;&#xA;│  ToolRegistry.register()  │  .dispatch()  │  .get_definitions()     │&#xD;&#xA;└──────────────────────┬──────────────────────────────────────────────┘&#xD;&#xA;                       │&#xD;&#xA;          ┌────────────┴────────────┐&#xD;&#xA;          ▼                         ▼&#xD;&#xA;┌──────────────────┐     ┌──────────────────────────────────────────┐&#xD;&#xA;│  tools/*.py      │     │  plugins/&amp;lt;name&amp;gt;/__init__.py              │&#xD;&#xA;│  (built-in tools)│     │  (user / pip-installed plugins)          │&#xD;&#xA;└──────────────────┘     └──────────────────────────────────────────┘&#xA;&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;strong&gt;Key insight:&lt;/strong&gt; The architecture is a strict layered DAG. &lt;code&gt;tools/registry.py&lt;/code&gt; has zero imports from any other Hermes module — it is the root. Every tool file imports from it. &lt;code&gt;model_tools.py&lt;/code&gt; imports from the registry and triggers discovery. &lt;code&gt;run_agent.py&lt;/code&gt; imports from &lt;code&gt;model_tools.py&lt;/code&gt;. This prevents circular imports and makes the tool system independently testable.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Inside Claude Code: The Architecture of a Production-Grade System Prompt</title>
      <link>http://akjamie.github.io/post/2026-05-16-mastering-agent-memory-prompt-design/</link>
      <pubDate>Thu, 07 May 2026 00:00:00 +0000</pubDate>
      <guid>http://akjamie.github.io/post/2026-05-16-mastering-agent-memory-prompt-design/</guid>
      <description>&lt;h1 id=&#34;inside-claude-code-the-architecture-of-a-production-grade-system-prompt&#34;&gt;Inside Claude Code: The Architecture of a Production-Grade System Prompt&lt;/h1&gt;&#xA;&lt;p&gt;When we think of &amp;ldquo;prompt engineering,&amp;rdquo; we often imagine a single, monolithic block of text meticulously tweaked through trial and error. But for production-grade agentic systems like Claude Code, the system prompt is less of a static document and more of a dynamic, highly optimized operating system.&lt;/p&gt;&#xA;&lt;p&gt;By examining the &lt;code&gt;src/constants/prompts.ts&lt;/code&gt; and &lt;code&gt;src/constants/systemPromptSections.ts&lt;/code&gt; files of the Claude Code repository, we can extract concrete patterns in modular prompt design, behavioral alignment, and token efficiency that apply to any agentic system.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
