How I Turned Claude Code into an Autonomous Operating System

Memory, automation, and multi-agent coordination

Hand-drawn sketch of layered system architecture with green-highlighted intelligent components

How I built a personal AI infrastructure that learns, remembers, and works while I sleep.


I've spent the past year building an "AI operating system" around Claude Code. What started as simple customizations has turned into a memory system, automation layer, and multi-agent coordinator. Here's how it works.

The Architecture at a Glance

LayerComponents
Memorydiary → patterns → rules → CLAUDE.md
HooksPreToolUse (safety), SessionStart (context), PreCompact (auto-diary)
MCP ServersThings 3, Strava, Calendar, Agent Mail, Firefly III, + 5 more
CLI Toolsbrowser-.js, job-.js, bd-idea.js, cchist.js, + 35 more
Pluginsdocument-skills, dev-browser, hookify, ralph-wiggum, + 12 more
Backgroundbd-automate (30m), ambient-watcher (always), + 8 launchd agents

The Memory System: Claude That Learns

My memory system is the foundation—a three-tier pipeline that turns session observations into actionable rules.

How It Works

Architecture diagram

Phase 1: Capture. At the end of meaningful sessions, I generate a diary entry that records what I worked on, decisions I made, and preferences Claude observed. 131 entries so far.

Phase 2: Extract. The /reflect command analyzes diary entries and extracts patterns with confidence scores:

JSON
{
  "id": "a545dd2b-f096-4cfa-b8b2-217aa7d35c96",
  "pattern": "use conventional commit format (feat:, fix:, style:)",
  "confidence": 0.85,
  "occurrences": 12,
  "category": "git",
  "source_diaries": ["2026-01-15-session-2.md", "2026-01-18-session-1.md"],
  "created_at": "2025-12-17T14:23:00Z",
  "last_seen": "2026-01-22T09:15:00Z"
}

Phase 3: Promote. Patterns above threshold (≥0.7 confidence, ≥3 occurrences) automatically sync to my CLAUDE.md. Claude now remembers that I prefer Things 3 for tasks, Apple Calendar as my source of truth, and that I hate when AI text uses em dashes or the rule of three.

The Self-Correcting Loop

When Claude violates a rule (like when it edited a stakeholder quote during an editorial pass), the reflection picks it up:

"Pattern violation detected: Modified user quote during editorial pass. Added explicit rule: NEVER modify user/stakeholder quotes."

Next session, Claude knows not to do that.


The Agent Tools: 40+ CLI Commands

I use a toolkit of CLI commands in ~/agent-tools/ (by Mario Zechner) that Claude can invoke.

Browser Automation (Chrome DevTools Protocol)

Terminal
browser-start.js --profile    # Launch Chrome with my cookies
browser-nav.js https://url    # Navigate
browser-screenshot.js         # Capture viewport → /tmp/screenshot-*.png
browser-eval.js 'code'        # Execute JS in page context
browser-pick.js "Select..."   # Visual element picker with Cmd+click
browser-content.js https://url # Extract as markdown via Readability

How it works: All tools connect to Chrome's remote debugging port (localhost:9222). browser-start.js launches Chrome with --remote-debugging-port=9222 and optionally syncs my user profile (cookies, logins, extensions) via rsync. The other tools use Puppeteer-core to connect to this instance.

The browser automation stack: Claude Code calls agent tools, which use Puppeteer-core to communicate with Chrome via CDP
The browser automation stack: Claude Code calls agent tools, which use Puppeteer-core to communicate with Chrome via CDP

This gives Claude eyes into authenticated web apps—I can say "check my calendar" and Claude screenshots Google Calendar with my actual events visible.

Job Search Pipeline

Terminal
job-scraper.js --scroll       # Scrape LinkedIn listings (multiple fallback selectors)
job-qualify.js --show-qualified  # Score by CMF algorithm

The qualifier uses weighted scoring:

FactorWeightExample Scores
Sweet Spot40%Strategy/Ops: 100, Founding PM: 90, Chief of Staff: 80
Company Tier25%Target: 100, Stretch: 80, Known: 60, Unknown: 40
Role Level20%Director+: 100, Senior: 80, Mid: 60
Freshness15%<24h: 100, <7d: 80, <30d: 50

Output is P0-P4 priority buckets. Cache lives at ~/.claude/state/job-listings-cache.json so I can re-score without re-scraping.

Idea Capture → Autonomous Implementation

Terminal
bd-idea "Build a CLI that does X" --priority high

Ideas get captured into my issue tracker (Beads) with an automate label. Every 30 minutes, a background processor (bd-automate) picks the highest-priority item and runs headless Claude to implement it.

One example: I dropped "Build a conversation history search tool" into the queue before bed. Woke up to a working cchist.js CLI that searches across all my Claude Code sessions with fuzzy matching, date filtering, and JSON output.

Personal Infrastructure CLIs

Beyond the agent-tools, I use a set of Go/Rust CLIs that integrate with my personal systems:

ToolCreatorPurposeExample
spogoPeter SteinbergerSpotify controlspogo play "focus playlist"
gmcliPeter SteinbergerGmail managementgmcli search "from:recruiter"
gdcliPeter SteinbergerGoogle Drivegdcli upload report.pdf
gccliPeter SteinbergerGoogle Calendargccli list --today
remindctlPeter SteinbergerApple Remindersremindctl add "Call back" --due 3pm
eightctlPeter SteinbergerEight Sleep bedeightctl temp --side left --level 2
imsgPeter SteinbergeriMessage/SMSimsg send "+1234567890" "omw"
brabblePeter SteinbergerVoice wake wordHands-free Claude via "Hey Brabble"

These complement the MCP servers—sometimes a quick CLI is faster than a tool call.

Metrics & Analytics

Terminal
claude-stats --days 7           # Session analytics
outreach.js --show-streak       # Job search streak counter
impact.js                       # Weekly impact dashboard
deliverables.js                 # What I shipped

The metrics tools analyze diary entries and track patterns across sessions—how much time on coding vs. research, quality scores, file churn.


MCP Servers: 10 Integrated Systems

The Model Context Protocol lets Claude talk to external services natively.

CategoryServerKey Tools
ProductivityThings 3get_today, add_todo, update_todo
Calendarlist_events, create_event, get_freebusy
PersonalStravaactivities, streams, zones
Firefly IIIaccounts, transactions, categories
KnowledgeReadwisesearch_highlights
Roamsearch_text, create_page, add_todo
CoordinationAgent Mailsend_msg, reserve_file, fetch_inbox
SystemApplemessages, reminders, notes

Agent Mail: Multi-Agent Coordination

When I spin up multiple Claude instances for parallel work, they need to coordinate. Agent Mail provides:

  • Message threading between agents (with Git-backed storage for audit trails)
  • File reservations to prevent edit conflicts (45-minute expiry, renewable)
  • Contact permissions for agent-to-agent communication
Architecture diagram


The Automation Layer: Claude Works While I Sleep

10 LaunchAgents (macOS Scheduled Tasks)

AgentFrequencyPurposeTimeout
bd-automateEvery 30 minProcess idea queue35 min
ambient-watcherAlways onMonitor file changes1 hour
process-cleanupEvery 2hKill zombie processes60s
chat-summarizerDaily 6 AMSummarize sessions20 min
bd-daemon-healthEvery 5 minMonitor background tasks60s
screenshot-classifierEvery 4hAuto-process screenshots15 min
weekly-events-summarySunday 11:59 PMWeekly recap5 min
claudio-reportDaily 6 AMGenerate reports60s
process-cleanerEvery hourOrphan cleanup60s

All agents have TimeoutSeconds set to prevent zombie accumulation—a rule I learned the hard way after finding 47 stale processes one morning.

The Ambient Context System

Architecture diagram

Claude starts each session knowing what changed—no explanation needed.

The Idea Processor (bd-automate)

Every 30 minutes:

  1. Scan all Beads workspaces for issues labeled automate
  2. Rank by priority using graph analysis (bv -robot-priority)
  3. Spawn headless Claude (claude -p) to implement
  4. Update issue with results
  5. Close on success

Lock file at ~/.beads/automate.lock prevents concurrent runs (45-minute stale lock detection).


Plugins: Extending Claude's Capabilities

I run 16 active plugins from 7 marketplaces:

Document Skills (16 sub-skills)

  • pdf: Extract, merge, split, fill forms, generate
  • xlsx: Spreadsheet operations with formulas
  • pptx: Presentations
  • docx: Documents with tracked changes

Developer Tools

  • dev-browser: Persistent Playwright server with stateful pages (14% faster, 39% cheaper than Playwright MCP)
  • hookify: Create behavioral hooks without editing JSON—just describe what should happen
  • ralph-wiggum: Self-referential loops where Claude feeds its own output back as input
  • agent-sdk-dev: Create and verify Agent SDK applications with verifiers for Python/TypeScript

OpenProse

A programming language for AI sessions by irl-dan:

PROSE
agent researcher:
  search for "topic"
  **until comprehensive understanding**

agent writer:
  draft based on researcher.findings

The **until X** syntax is a semantic loop condition—it keeps running until Claude judges the condition true.


The Hooks: Guardrails and Context

Safety Guard (PreToolUse)

A Python hook intercepts commands before execution:

Python
BLOCKED_PATTERNS = [
    r"git reset --hard",
    r"rm -rf",
    r"git clean -f",
    r"git push.*--force",
    r"git stash drop",
]

ALLOWED_PATTERNS = [
    r"git stash$",           # stash without drop is safe
    r"git checkout -b",      # creating branches is safe
    r"rm -rf /tmp/",         # tmp cleanup is fine
]

This saved me last week when I asked Claude to "clean up the branch" and it tried to run git reset --hard origin/main. The hook blocked it, and Claude rephrased to a safer approach.

Session Start Context

Every session begins with:

  • Current date/time (injected by hook)
  • Memory stats (20 rules, 28 patterns)
  • Ambient events (recent file changes)
  • Morning kickstart trigger (first session of day)

Status Line

prototype-01 (main) | opus | ████████░░ 78% | $2.34

Real-time visibility into directory, model, context usage, and session cost.


Configuration Philosophy

1. Everything is Git-Controlled

My ~/.claude/ directory is a git repo. Configuration changes are versioned and reversible.

2. Sensitive Data is Gitignored

GITIGNORE
history.jsonl    # Chat history (large, sensitive)
state/           # Machine-specific
memory/          # Kept in vault, synced separately
plugins/         # Auto-downloaded, regeneratable

3. Permissions are Whitelisted

Rather than deny-listing bad patterns, I explicitly allow good ones:

JSON
{
  "permissions": {
    "allow": [
      "Bash(export PREFERRED_PROVIDER=*)",
      "WebFetch(domain:github.com)",
      "mcp__things__get_today",
      "Skill(document-skills:pdf)"
    ],
    "deny": []
  }
}

46 explicit allows. Anything not listed requires approval.

4. Automation Never Auto-Executes Risky Actions

From my CLAUDE.md:

automation: queue for review, never auto-execute actions that create calendar events, tasks, or file changes

Claude proposes; I approve.


What Doesn't Work Well

Pattern conflicts. Sometimes two rules contradict each other. The memory system doesn't detect this automatically—I have to notice it in practice and manually resolve.

Automation failures at 3 AM. When bd-automate runs overnight and hits an error (rate limit, unclear requirements, missing dependency), it logs and moves on. I wake up to partial work and have to debug what went wrong.

MCP server startup time. Some servers (especially Python ones with heavy imports) add 2-3 seconds to session start. Noticeable when I'm just doing a quick task.

Memory system maintenance. The diary → patterns → rules pipeline requires manual triggering (/reflect, /curate). I've let it go stale for weeks before noticing Claude forgot preferences.

Cost creep. All this automation adds up. The background processors alone run ~$15/month in API calls, mostly from bd-automate implementations.


What I'd Do Differently

Start with the memory system. Basic pattern extraction fixed my biggest frustration: repeating myself every session.

Be aggressive with hooks. The safety guard blocked a git reset --hard that would have nuked an hour of uncommitted work. The upfront cost of writing guards pays for itself immediately.

Embrace MCP servers. 30 minutes setting up the Things MCP saved hours of "paste this into Things" workflows.


The Result

After a year of iteration, Claude Code isn't a CLI tool—it's a personalized AI infrastructure:

ComponentCount
Session diaries131
Extracted patterns28
Active rules20
CLI tools40+
MCP servers10
Background processors10
Plugins16

The system learns, remembers, and works autonomously. It's not perfect—sometimes patterns conflict, sometimes automation breaks—but it's compounding. Every session makes the next one better.


If you're building something similar, I'd love to hear about it. Find me at @samzoloth or sam@samzoloth.com.