Architecture

How to build agentic systems

Where this is coming from

For the last few months, I have been working with agentic systems, both at work and on personal projects. Matato is one of those projects, and it’s where a lot of the ideas below were stress-tested. This post is a dump of what stuck.

The single most useful thing I want to share is the mental model I ended up with for the architecture: there are basically two shapes, or flavours, of agentic systems.

Two shapes: fat agents and microagents

On one side you have what I call fat agents (or big agents): a single, strong agent harness with access to tools, skills, MCPs, and the ability to plan, split work, and execute on its own. On the other side you have microagents: small, single-purpose LLM calls wired together in a deterministic chain.

They are not competing designs. They solve different problems. And I’ve come to the conclusion that it must be a design decision you make up front.

Fat agents

A fat agent is the kind of thing you already know from Claude Code, Codex, Pi, or DeepAgents. It is one big “brain” with access to tools and skills, capable of taking an ambiguous task, talking to the user, producing a plan, and acting on it. Some of these even nest other harnesses: Pi can drive Codex, for example.

What they are generally good at is working through complex systems or complex workflows. Think of an agent that needs to use tools, skills, and MCPs together to complete a piece of work. It can go through a series of steps — planning, orchestration, and a lot of back-and-forth to complete the task. On top of these, you can build custom solutions, mostly in plain English: instead of writing programmatic solutions, you write instructions. Examples of this are OpenClaw and Hermes, and you can take a look at Matato as well.

I do have to say, though, that the main downside is that they are really expensive. These kinds of agents or assistants consume a lot of tokens because they rely heavily on reasoning, and all those exchanges produce a lot of content the agent has to consume in turn. Another issue is latency — getting an answer can be pretty slow. One of the challenges I ran into with Matato is that I wanted quick feedback, fast turnarounds with the agent. Having all these tools and possible scenarios available made the agent slow, since it had to reason through every single output.

Microagents

Microagents are more like plain LLM calls: no tool calls, no skills, no memory, nothing beyond the main prompt. The trick is that when the workflow itself is concrete and deterministic, you can drop an LLM inside it without losing that determinism. LLMs are not deterministic machines, but the architecture around them can make them behave like one.

This is the approach I used for QuickNote. It’s a chain of small agents, each with a short, focused prompt, each talking to a small OpenAI model with one simple tool — nothing more. The agents run in sequence:

  1. A grammar agent
  2. A metadata agent
  3. A title agent
  4. A Summary agent
  5. A Summary agent
  6. A Room agent
  7. A Topic agent

The output of each one feeds the next. Because every step is small and constrained, the end result is stable: no matter how many times I run it, the results tend to be quite similar, which wasn’t the case with the fat agent attempt.

They are good because you keep the context and the control. The input is deterministic, and the output is deterministic too. The downside is that these things are not smart. They don’t really think — they’re good for classification, for deterministic transformations, for summarisation, and things like that, but nothing fancy like coming up with new ideas or novel solutions.

When to reach for which

The rough heuristic I’ve landed on:

  • Use a fat agent when the task is ambiguous, exploratory, or needs planning and judgement across multiple steps.
  • Use microagents when the workflow is well understood and you want the same result every time.