Klaus Code: Writing Your Own AI Agent From Scratch in Go

TL;DR

  • There are two things that still give an experienced engineer a genuine “wow” moment in the age of AI. One is running an LLM locally on your own machine. The other - the subject of this post - is writing your own agent: a program that reasons, decides which tool to use, runs it, looks at the result, and keeps going until the task is done.
  • Once you build one, the magic of products like Claude Code or Cursor evaporates in the best possible way. You understand exactly what is happening under the hood. It is a loop.
  • I built a tiny proof-of-concept called Klaus Code (yes, a nod to a certain other coding agent): a ReAct agent in Go, with zero third-party dependencies - only the standard library.
  • The core idea is almost embarrassingly simple: the model writes Thought: and Action: tool(args) as plain text, your program stops generation before the model can hallucinate the result, runs the real tool, feeds back Observation: ..., and loops until the model writes Final Answer:.
  • The whole agent loop fits on one screen. The “intelligence” is in the model; the harness is just plumbing - but writing that plumbing yourself is what makes it click.

Two kinds of “wow”

I have been writing software for a long time, and very little surprises me anymore. Most “revolutionary” frameworks turn out to be the same ideas in new packaging. But in the last couple of years there have been exactly two moments where I leaned back from the keyboard and thought huh, that is genuinely magic.

The first was running a real LLM locally - on my own laptop, no cloud, no API key, the model just answering questions from a file on disk. I have written about adjacent things before. That is its own kind of wow, and it is not what this post is about.

The second was the first time I wrote my own agent and watched it solve a problem I had not hand-coded a path for. It read the task, reasoned about it in plain English, decided it needed a tool, called the tool, looked at the answer, and concluded. I did not write the if-statements that made that happen. The model did the deciding. I only built the loop around it.

If you have used Claude Code, Cursor, or any of the “agentic” tools and wondered what is actually happening behind the curtain - the answer is delightfully unglamorous. It is a loop. Once you build one yourself, those products stop being magic and start being understandable, which is a far more useful state to be in as an engineer (and as a CTO trying to reason about what these tools can and cannot do).

So I built one. I called it Klaus Code - a small homage to Claude Code, the tool that nudged me down this rabbit hole in the first place.

What an agent actually is

Strip away the marketing and an agent is three things glued together:

  1. A language model that is good at reasoning in text.
  2. A set of tools - functions the model is allowed to call (a calculator, a web search, a shell, a database query).
  3. A loop that lets the model alternate between thinking and acting until it has an answer.

The pattern that ties these together has a name and a paper: ReAct - Reasoning + Acting (Yao et al., 2022). The insight is that if you let a model interleave reasoning steps with tool calls, it solves multi-step problems far more reliably than if you ask it to answer in one shot. It can think, check its work against reality, and adjust.

The format is a conversation that looks like this:

Thought: I need to compute (12 * 9) + 3.
Action: calculate((12 * 9) + 3)
Observation: 111
Thought: I have found the answer.
Final Answer: 111

The crucial trick - and the thing that surprised me most - is the boundary between who writes what. The model writes the Thought: and Action: lines. The harness (your code) writes the Observation: lines, by actually running the tool. The model never computes 111 itself; it asks for it, and your program supplies the real answer.

How Klaus Code does it

Klaus Code is deliberately tiny and boring. It is written in Go, has no third-party dependencies, and is layered with plain interfaces for dependency injection - the OpenAI client and the tools both sit behind interfaces, so every layer is unit-testable without touching the network.

Package           Responsibility
internal/llm      The provider boundary: a Client interface and an OpenAI implementation over net/http.
internal/tools    The action boundary: a Tool interface, a Registry, and the calculate tool.
internal/agent    The ReAct loop, the system prompt, and the turn parser.
cmd/klauscode     The composition root - reads config, wires everything together, runs the task.
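
To make the "unit-testable without touching the network" claim concrete, here is a minimal sketch of what a test double for the provider boundary might look like. The Client and Message shapes are assumptions inferred from the Complete(ctx, messages, stop) call used in the loop, and scriptedClient is a hypothetical name, not something taken from the Klaus Code repository.

```go
package main

import (
	"context"
	"errors"
)

// Message is an assumed shape for llm.Message: a role plus text content.
type Message struct {
	Role    string
	Content string
}

// Client is an assumed shape for the provider boundary: one call that takes
// the conversation and the stop sequences and returns the model's text.
type Client interface {
	Complete(ctx context.Context, messages []Message, stop []string) (string, error)
}

// scriptedClient is a hypothetical test double. It replays canned replies in
// order and never touches the network.
type scriptedClient struct {
	replies []string
	calls   int
}

// Complete returns the next scripted reply, or an error once the script is
// exhausted. The arguments are ignored on purpose.
func (s *scriptedClient) Complete(ctx context.Context, messages []Message, stop []string) (string, error) {
	if s.calls >= len(s.replies) {
		return "", errors.New("scriptedClient: no more scripted replies")
	}
	reply := s.replies[s.calls]
	s.calls++
	return reply, nil
}
```

In a test, the agent would be handed a scriptedClient whose replies are an Action turn followed by a Final Answer turn, which exercises the whole loop deterministically.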

Running it looks like this:

export OPENAI_API_KEY=sk-...
go run ./cmd/klauscode "What is (12 * 9) + 3?"

The reasoning trace goes to stderr; the final answer goes to stdout, so you can grab just the answer:

go run ./cmd/klauscode "What is 7 * 6?" 2>/dev/null
# 42

The loop

Here is the heart of the whole thing - the agent loop. This is the part that, once you have written it, makes every agentic product suddenly legible:

for i := 0; i < a.maxSteps; i++ {
    output, err := a.client.Complete(ctx, messages, []string{observationStop})
    if err != nil {
        return "", fmt.Errorf("model call failed: %w", err)
    }
    messages = append(messages, llm.Message{Role: "assistant", Content: output})

    step := ParseStep(output)

    if step.HasFinal {
        return step.FinalAnswer, nil
    }

    if !step.HasAction {
        messages = append(messages, llm.Message{
            Role:    "user",
            Content: "Observation: No valid Action found. Respond with either an Action line or a Final Answer.",
        })
        continue
    }

    observation := a.runTool(step)
    messages = append(messages, llm.Message{
        Role:    "user",
        Content: "Observation: " + observation,
    })
}

That is the entire agent. Call the model, parse what it said, and either return the final answer or run a tool and feed the result back. Everything else is detail.
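
The loop leans on ParseStep, which is not shown in this post. A minimal sketch of what such a turn parser might look like follows. The Step field names come from the loop and from runTool; the parsing rules themselves (check for a final answer first, then take the first well-formed Action line) are my assumptions, not necessarily what Klaus Code does.

```go
package main

import "strings"

// Step is the parsed form of one model turn. Field names match those used by
// the loop; the struct layout itself is an assumption.
type Step struct {
	HasFinal    bool
	FinalAnswer string
	HasAction   bool
	ToolName    string
	ToolArgs    string
}

const (
	finalPrefix  = "Final Answer:"
	actionPrefix = "Action:"
)

// ParseStep extracts either a final answer or a tool call from raw model
// output. If it finds neither, it returns the zero Step.
func ParseStep(output string) Step {
	// A final answer wins: everything after the marker is the answer.
	if i := strings.Index(output, finalPrefix); i >= 0 {
		return Step{HasFinal: true, FinalAnswer: strings.TrimSpace(output[i+len(finalPrefix):])}
	}
	for _, line := range strings.Split(output, "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, actionPrefix) {
			continue
		}
		call := strings.TrimSpace(strings.TrimPrefix(line, actionPrefix))
		// The first "(" ends the tool name; the last ")" closes the call,
		// so nested parentheses inside the arguments survive intact.
		open := strings.Index(call, "(")
		closeIdx := strings.LastIndex(call, ")")
		if open <= 0 || closeIdx < open {
			continue // malformed Action line; keep looking
		}
		return Step{
			HasAction: true,
			ToolName:  strings.TrimSpace(call[:open]),
			ToolArgs:  strings.TrimSpace(call[open+1 : closeIdx]),
		}
	}
	return Step{}
}
```

Matching the first "(" against the last ")" is what lets Action: calculate((12 * 9) + 3) yield the arguments (12 * 9) + 3 rather than a truncated fragment.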

The one trick that makes it work

Look closely at this line:

output, err := a.client.Complete(ctx, messages, []string{observationStop})

That last argument is a stop sequence:

const observationStop = "Observation:"

We tell the model: generate text, but the moment you are about to write Observation:, stop. Without this, the model would happily continue past the action and invent the observation - it would write Action: calculate((12 * 9) + 3) and then cheerfully make up Observation: 111 (or worse, Observation: 105) without anything ever being computed.

The stop sequence hands control back to the harness at exactly the right moment. Our code runs the real tool, computes the real result, and supplies the observation. This is the seam where the model’s reasoning meets reality - and it is one line of code.
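
On the wire, the stop sequence is just one more field in the request. The sketch below assumes the OpenAI Chat Completions request shape, where stop accepts a list of strings. The type and function names are hypothetical, and the model name is whatever you configure. truncateAtStop is an extra guard of my own for a model or provider that writes past the marker anyway, not something the post says Klaus Code does.

```go
package main

import (
	"encoding/json"
	"strings"
)

// chatMessage and chatRequest model only the request fields this sketch
// needs; a real request may carry more.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
	Stop     []string      `json:"stop,omitempty"`
}

// buildChatRequest serializes the conversation together with the stop
// sequences that tell the provider where to cut generation.
func buildChatRequest(model string, messages []chatMessage, stop []string) ([]byte, error) {
	return json.Marshal(chatRequest{Model: model, Messages: messages, Stop: stop})
}

// truncateAtStop is a defensive fallback: if the marker still shows up in
// the output, drop it and everything after it.
func truncateAtStop(output, stop string) string {
	if i := strings.Index(output, stop); i >= 0 {
		return output[:i]
	}
	return output
}
```

With the guard in place, a hallucinated observation is discarded before the parser ever sees it, so the only Observation: lines in the conversation are the ones the harness wrote.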

Teaching the model the rules

The model only behaves this way because we tell it to, in the system prompt. Klaus Code builds the prompt at runtime from whatever tools are registered:

func BuildSystemPrompt(reg *tools.Registry) string {
    var b strings.Builder
    b.WriteString(promptHeader)
    for _, t := range reg.List() {
        b.WriteString("- ")
        b.WriteString(t.Description())
        b.WriteString("\n")
    }
    b.WriteString(promptFooter)
    return b.String()
}

The footer is the contract - the strict format the loop depends on:

CRITICAL FORMAT RULES:
You must strictly follow this exact format for every turn. Do not skip steps.

Thought: [Reason about what you need to do next]
Action: [tool_name]([arguments])
Observation: [Do not write this yourself. The system will provide this.]

When you have the final answer to the user's request, use this format:
Thought: I have found the answer.
Final Answer: [Your definitive response to the user]

Because the tool list is generated from the registry, adding a tool automatically updates what the model is told it can do. No prompt editing required.
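
For reference, here is a minimal sketch of a registry that would support both reg.List() in the prompt builder and Execute in the loop. Only those two method names appear in this post. NewRegistry, Register, the internal layout, and the echo tool are assumptions for illustration.

```go
package main

import "fmt"

// Tool repeats the interface shown in the next section so that this sketch
// compiles on its own.
type Tool interface {
	Name() string
	Description() string
	Call(args string) (string, error)
}

// Registry maps tool names to tools and remembers registration order, so the
// generated system prompt is stable from run to run.
type Registry struct {
	byName map[string]Tool
	order  []string
}

func NewRegistry() *Registry {
	return &Registry{byName: make(map[string]Tool)}
}

// Register adds a tool, replacing any earlier tool with the same name.
func (r *Registry) Register(t Tool) {
	if _, exists := r.byName[t.Name()]; !exists {
		r.order = append(r.order, t.Name())
	}
	r.byName[t.Name()] = t
}

// List returns the tools in registration order.
func (r *Registry) List() []Tool {
	out := make([]Tool, 0, len(r.order))
	for _, name := range r.order {
		out = append(out, r.byName[name])
	}
	return out
}

// Execute runs the named tool. An unknown name is an ordinary error, which
// the agent can hand back to the model as an observation.
func (r *Registry) Execute(name, args string) (string, error) {
	t, ok := r.byName[name]
	if !ok {
		return "", fmt.Errorf("unknown tool %q", name)
	}
	return t.Call(args)
}

// echoTool is a hypothetical stand-in used only to exercise the registry.
type echoTool struct{}

func (echoTool) Name() string                     { return "echo" }
func (echoTool) Description() string              { return "echo(text): returns text unchanged" }
func (echoTool) Call(args string) (string, error) { return args, nil }
```

Keeping the registration order, rather than ranging over the map directly, matters because Go randomizes map iteration and the prompt would otherwise change between runs.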

Tools are just an interface

A tool in Klaus Code is anything that satisfies one small interface:

type Tool interface {
    Name() string        // identifier used in the Action line
    Description() string // one line rendered into the system prompt
    Call(args string) (string, error) // args = raw text inside the parentheses
}

The proof-of-concept ships exactly one: calculate, backed by a hand-written recursive-descent arithmetic evaluator (also zero dependencies - it parses (12 * 9) + 3 itself, with proper precedence and parentheses). But the shape is the point. A web search, a SQL query, a file read, a shell command - they are all just Call(args string) (string, error). The agent loop does not care what a tool does; it only knows how to ask one to run.
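
To show how little is needed for the evaluator, here is a compact recursive-descent sketch in the same spirit. It is not the code from the repository: the names are mine, it handles only numbers, + - * / and parentheses (no unary minus), and it strips spaces up front as a shortcut.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

type parser struct {
	src string
	pos int // index of the next unread byte
}

// Evaluate computes an expression built from numbers, + - * / and parentheses.
// Simplification: spaces are removed up front, so "1 2" is read as 12.
func Evaluate(expr string) (float64, error) {
	p := &parser{src: strings.ReplaceAll(expr, " ", "")}
	v, err := p.expr()
	if err != nil {
		return 0, err
	}
	if p.pos != len(p.src) {
		return 0, fmt.Errorf("unexpected %q at position %d", p.src[p.pos], p.pos)
	}
	return v, nil
}

// expr := term (('+' | '-') term)*
func (p *parser) expr() (float64, error) {
	left, err := p.term()
	if err != nil {
		return 0, err
	}
	for p.pos < len(p.src) && (p.src[p.pos] == '+' || p.src[p.pos] == '-') {
		op := p.src[p.pos]
		p.pos++
		right, err := p.term()
		if err != nil {
			return 0, err
		}
		if op == '+' {
			left += right
		} else {
			left -= right
		}
	}
	return left, nil
}

// term := factor (('*' | '/') factor)*
func (p *parser) term() (float64, error) {
	left, err := p.factor()
	if err != nil {
		return 0, err
	}
	for p.pos < len(p.src) && (p.src[p.pos] == '*' || p.src[p.pos] == '/') {
		op := p.src[p.pos]
		p.pos++
		right, err := p.factor()
		if err != nil {
			return 0, err
		}
		if op == '*' {
			left *= right
		} else if right == 0 {
			return 0, errors.New("division by zero")
		} else {
			left /= right
		}
	}
	return left, nil
}

// factor := '(' expr ')' | number
func (p *parser) factor() (float64, error) {
	if p.pos < len(p.src) && p.src[p.pos] == '(' {
		p.pos++
		v, err := p.expr()
		if err != nil {
			return 0, err
		}
		if p.pos >= len(p.src) || p.src[p.pos] != ')' {
			return 0, errors.New("missing closing parenthesis")
		}
		p.pos++
		return v, nil
	}
	start := p.pos
	for p.pos < len(p.src) && (p.src[p.pos] == '.' || (p.src[p.pos] >= '0' && p.src[p.pos] <= '9')) {
		p.pos++
	}
	if start == p.pos {
		return 0, fmt.Errorf("expected a number at position %d", start)
	}
	return strconv.ParseFloat(p.src[start:p.pos], 64)
}
```

Precedence falls out of the call structure: expr calls term, term calls factor, so multiplication binds tighter than addition without any precedence table. A Tool wrapper around this would only need to format the float back into a string, for example with strconv.FormatFloat.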

The details that turn a demo into something usable

The naive loop works on a sunny day. Most of the interesting engineering is in handling the days that are not sunny - and this is where you learn how these systems really behave.

Tool errors become observations, not crashes. When a tool fails, Klaus Code does not abort the run. It hands the error back to the model as an observation:

func (a *Agent) runTool(step Step) string {
    result, err := a.tools.Execute(step.ToolName, step.ToolArgs)
    if err != nil {
        return "Error: " + err.Error()
    }
    return result
}

So if the model asks for calculate(1 / 0), it gets Observation: Error: division by zero and can reason its way to a sensible final answer instead of the whole program falling over. The model self-corrects. Watching that happen for the first time is its own small wow.

Malformed turns get a nudge, not a death sentence. If the model forgets the format and produces neither an action nor a final answer, the harness gently reminds it (Observation: No valid Action found...) and lets it try again rather than giving up.

The model loves to add quotes. Models routinely write calculate("12 * 9") because the tool signature reads like it takes a string. Those wrapping quotes are not part of the expression, so the parser strips a genuine enclosing pair before dispatch. Small thing, but exactly the kind of friction you only discover by actually running the loop against a real model.
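
A minimal sketch of that quote handling, assuming "a genuine enclosing pair" means the same quote character at both ends with no further occurrence of it in between. The function name is hypothetical and the exact rule in Klaus Code may differ.

```go
package main

import "strings"

// stripEnclosingQuotes removes one pair of matching quotes that wraps the
// whole argument string. Anything else is returned trimmed but otherwise
// untouched.
func stripEnclosingQuotes(args string) string {
	s := strings.TrimSpace(args)
	if len(s) < 2 {
		return s
	}
	first, last := s[0], s[len(s)-1]
	if first != last || (first != '"' && first != '\'') {
		return s
	}
	inner := s[1 : len(s)-1]
	// If the same quote appears inside, the outer characters do not form a
	// single enclosing pair (think "a" + "b"), so leave the input alone.
	if strings.IndexByte(inner, first) >= 0 {
		return s
	}
	return inner
}
```

The inner-quote check is the part that makes the pair "genuine": without it, an argument that merely starts and ends with quoted fragments would lose its first and last quote and become garbage.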

There is a step limit. The loop caps out after a fixed number of steps. If a model gets confused and loops, you want a backstop, not a runaway bill.
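
The backstop can be sketched in isolation. The loop excerpt earlier stops at the closing brace of the for, so what happens when the steps run out is not shown. The version below assumes it surfaces as a distinct error the caller can check for. ErrMaxSteps and runSteps are hypothetical names, and the step callback stands in for one call-parse-act iteration.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrMaxSteps is a hypothetical sentinel the caller can test with errors.Is.
var ErrMaxSteps = errors.New("agent: step limit reached without a final answer")

// runSteps invokes step at most maxSteps times. step reports the answer and
// whether the run is finished. Running out of steps is an error, not a hang.
func runSteps(maxSteps int, step func(i int) (answer string, done bool)) (string, error) {
	for i := 0; i < maxSteps; i++ {
		if answer, done := step(i); done {
			return answer, nil
		}
	}
	return "", fmt.Errorf("%w (maxSteps=%d)", ErrMaxSteps, maxSteps)
}
```

Returning a wrapped sentinel rather than an empty answer lets the command-line entry point distinguish "the model gave up" from "the model answered with an empty string".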

None of this is clever. All of it is necessary. And all of it is the kind of thing the big agentic products handle for you - which is precisely why building a small one yourself is so clarifying.

Why bother building this?

You will not replace Claude Code with 300 lines of Go. That is not the point.

The point is that understanding beats magic. As an engineer - and especially as a CTO trying to make sober decisions about where AI fits in a business - it matters enormously whether “agent” is a mysterious incantation or a thing you can reason about. Once you have written the loop, you understand exactly:

  • why agents sometimes hallucinate tool results (you forgot the stop sequence, or the model wrote past it),
  • why prompt format discipline matters so much (the parser is rigid),
  • why tool design is the real lever (the model is only as capable as the tools you give it),
  • and why “agentic” is mostly careful plumbing around a very capable text predictor.

That demystification is worth more than any framework. It is the difference between using a tool and understanding a tool - and in keeping with my general bias toward simplicity, the smallest possible version that actually works is usually the best teacher.

The code is small enough to read in one sitting - it is all on GitHub. If you have ever wondered what is really inside an AI agent, I would genuinely recommend writing your own. Start with the loop. Add the stop sequence. Give it one tool. Then watch it think.

That is the second wow. It is worth chasing.
