maintainable.software

Curated reading list

References

A curated reading list of articles, papers, and essays about context engineering, tools, memory, retrieval, and reliable AI agent workflows.

This page collects resources I find valuable, whether because they are especially useful, well-written, thought-provoking, or worth revisiting over time. The emphasis is on material that helps when building systems that need to plan, use tools, retrieve information, and stay reliable over long runs.

Anthropic

Building Effective AI Agents

A good foundation for deciding when to use a workflow, when to use an agent, and when the simplest single-call setup is enough.

Introducing Contextual Retrieval

A concrete retrieval technique: give each chunk just enough surrounding context before indexing it, so agents can pull the right evidence later.
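The core move can be sketched in a few lines. `describe` below stands in for the LLM call that writes a short situating blurb; all names here are illustrative, not Anthropic's API:

```python
def contextualize_chunks(doc_title, chunks, describe):
    """Prepend a short situating blurb to each chunk before indexing.

    `describe(doc_title, chunk)` stands in for the LLM call that writes
    one or two sentences placing the chunk within the whole document.
    """
    return [f"{describe(doc_title, chunk)}\n\n{chunk}" for chunk in chunks]

# Toy stand-in for the LLM: just name the source document.
chunks = contextualize_chunks(
    "Q2 report",
    ["Revenue grew 3%."],
    lambda title, chunk: f"From {title}:",
)
print(chunks[0])  # blurb, blank line, then the original chunk text
```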

ArchUnit

User Guide

Shows how to turn architecture rules into executable tests so boundaries stay enforced as code changes.

Bazel

Hermeticity

A strong explanation of why isolated, reproducible builds make automated changes easier to trust and debug.

Chris Richardson

Service per team

Good reminder that service boundaries work best when they line up with ownership, which also keeps agent tasks narrower.

Cognition

Don't Build Multi-Agents

Clear argument for sharing full traces and carrying decisions forward, instead of fanning work out before the problem really needs it.

Cucumber

BDD

Useful for turning examples into shared, testable expectations that humans and automation can agree on.

Gherkin Reference

Worth keeping nearby when you want scenarios to stay precise enough to drive tests, documentation, or agent checks.
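As a reminder of the shape, a minimal scenario in Gherkin's Given/When/Then form (the feature and wording are made up for illustration):

```gherkin
Feature: Password reset
  Scenario: Expired reset link
    Given a password reset link older than 24 hours
    When the user opens the link
    Then the reset form is not shown
    And a "link expired" message is displayed
```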

G. Kiczales et al.

Aspect-Oriented Programming

A good historical reference for cross-cutting concerns and the tradeoff between local clarity and shared behavior.

Google SRE

Service Level Objectives

Core reading for turning vague reliability goals into measurements that an automated system can actually optimize against.
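The basic arithmetic is worth internalizing: an availability target over a window implies a concrete error budget. A quick sketch with illustrative numbers (the 99.9% target and 30-day window are not from the chapter):

```python
# An SLO target over a window implies a concrete error budget.
# 99.9% over 30 days is an illustrative choice, not a recommendation.
slo_target = 0.999
window_minutes = 30 * 24 * 60  # 43,200 minutes in a 30-day window

error_budget_minutes = window_minutes * (1 - slo_target)
print(round(error_budget_minutes, 1))  # -> 43.2 minutes of tolerated downtime
```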

Herberto Graça

Packaging & namespacing

A helpful way to think about packages and namespaces as real architecture boundaries instead of just file organization.

Import Linter

Layers

A practical example of enforcing dependency direction in Python, which keeps generated or agent-edited code from breaking architecture.
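For reference, a layers contract in Import Linter looks roughly like this; the package names are invented for illustration:

```ini
# .importlinter -- the package names here are illustrative
[importlinter]
root_package = myapp

[importlinter:contract:layers]
name = Enforce layered dependencies
type = layers
layers =
    myapp.api
    myapp.domain
    myapp.storage
```

Each layer may import from the layers below it but not above, so `myapp.domain` importing `myapp.api` fails the check.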

Jimmy Bogard

Vertical Slice Architecture

Useful for organizing code around use cases so changes stay localized and agents can work on one slice without touching a whole layer stack.

John Ousterhout

A Philosophy of Software Design

One of the best general references for reducing complexity by designing modules that stay small, coherent, and easy to reason about.

Katie Hempenius

Performance Budgets 101

A concrete way to keep an agent honest about cost, because performance limits need explicit budgets instead of vague aspirations.
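A budget only helps if something fails when it is exceeded. A minimal sketch of such a gate, with made-up paths and limits:

```python
# Fail the build when an artifact exceeds its byte budget.
# Paths and limits below are illustrative; real sizes would come
# from os.path.getsize or the bundler's stats output.
BUDGETS = {"dist/app.js": 170_000, "dist/app.css": 50_000}

def check_budgets(sizes, budgets):
    """Return (path, size, limit) for every artifact over its budget."""
    return [(path, sizes[path], limit)
            for path, limit in budgets.items()
            if sizes.get(path, 0) > limit]

over = check_budgets({"dist/app.js": 181_000, "dist/app.css": 48_000}, BUDGETS)
print(over)  # -> [('dist/app.js', 181000, 170000)]
```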

Kent C. Dodds

AHA Programming

A short case for waiting on abstraction until the variation actually shows up, which helps avoid over-generalized code from both humans and agents.

LangChain Blog

Context Engineering

A useful taxonomy for writing, selecting, compressing, and isolating context as separate problems instead of one vague prompt challenge.
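The "select" and "compress" steps largely reduce to fitting the best candidates into a token budget. A toy sketch of that idea (the word-count tokenizer and the scores are stand-ins for a real tokenizer and retriever):

```python
def assemble_context(snippets, budget, tokens=lambda s: len(s.split())):
    """Greedily pack the highest-scoring snippets into a token budget.

    `snippets` is a list of (score, text) pairs; the word-count
    `tokens` function is a crude stand-in for a real tokenizer.
    """
    chosen, used = [], 0
    for score, text in sorted(snippets, key=lambda pair: -pair[0]):
        cost = tokens(text)
        if used + cost <= budget:
            chosen.append(text)
            used += cost
    return chosen

ctx = assemble_context(
    [(0.9, "retry with backoff"),
     (0.2, "old changelog entry"),
     (0.7, "api rate limits doc")],
    budget=7,
)
print(ctx)  # -> ['retry with backoff', 'api rate limits doc']
```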

Martin Fowler

Bounded Context

Strong way to keep domain boundaries clear so an agent or teammate does not have to solve the whole system at once.

Branch By Abstraction

A practical migration pattern when you need to replace behavior gradually without freezing the rest of the system.

Conway's Law

A reminder that team structure leaks into architecture, which matters when agent workflows mirror org boundaries.

Flag Argument

Useful warning about APIs that hide multiple behaviors behind one parameter and become hard for agents to use correctly.

Test Pyramid

Still one of the cleanest heuristics for placing verification where it is cheapest and most informative.

Martin Fowler and James Lewis

Microservices

The classic case for splitting services along independently deployable boundaries, which is still the right default when agent work needs a smaller surface area.

Microsoft Learn

Use Test Impact Analysis

Useful for running only the tests likely to be affected by a change, which is exactly the sort of bounded feedback loop agents need.

OpenAI Developers

Building an AI-Native Engineering Team

Useful organizational guidance for separating planning, implementation, and testing work so agents fit into the team instead of becoming a side experiment.

Run long-horizon tasks with Codex

A useful long-run case study on keeping a single session productive for hours through checkpoints, validation, and good status artifacts.

Sandi Metz

The Wrong Abstraction

Classic reminder that premature abstraction is often worse than direct code, especially when the variation you are abstracting for has not appeared yet.

Software Engineering at Google

Chapter 10: Documentation

Useful for thinking about docs as an engineering artifact that agents should be able to rely on and keep in sync.

Chapter 11: Testing Overview

A broad map of testing as an engineering system, helpful when you need reliable feedback loops for agent-run changes.

Chapter 12: Unit Testing

Good reference for the fastest, most localized form of feedback an agent can get while iterating.

Chapter 17: Code Search

Useful for making large codebases navigable, because searchability is often what lets an agent find the right context at all.

W. P. Stevens

Structured Design

Foundational module-design reading on keeping responsibilities separate and interfaces clean.