Engineering

Deterministic harnesses: when to use code instead of LLMs

When building agentic systems, a useful discipline is to decide up front which parts of the pipeline must be deterministic and which can safely rely on statistical language models. Deterministic code excels at repeatability, auditability, and clear failure modes: classification rules, schema validations, and exact transformations are better expressed as code. Use LLMs where flexibility and fuzzy understanding are required, but avoid relying on them for decisions that must be reproducible.
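As a sketch of what "better expressed as code" means here, a schema validation can be a plain function with exact rules and explicit failures. The `Invoice` type and field rules below are hypothetical, chosen only to illustrate the shape of a deterministic check:

```python
from dataclasses import dataclass

# Hypothetical record type for illustration.
@dataclass
class Invoice:
    customer_id: str
    amount_cents: int

def validate_invoice(raw: dict) -> Invoice:
    """Deterministic schema validation: same input, same outcome, every time."""
    if not isinstance(raw.get("customer_id"), str) or not raw["customer_id"]:
        raise ValueError("customer_id must be a non-empty string")
    if not isinstance(raw.get("amount_cents"), int) or raw["amount_cents"] < 0:
        raise ValueError("amount_cents must be a non-negative integer")
    return Invoice(raw["customer_id"], raw["amount_cents"])
```

The point is the failure mode: a bad input raises a specific, reproducible error rather than producing a plausible-looking guess.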

Start by isolating the responsibilities that require determinism: ownership checks, encryption and decryption steps, canonical classification, and any logic that drives side effects. Implement these as small, well-tested functions. Deterministic components give you clear inputs and outputs, which makes debugging and provenance straightforward and keeps the system auditable.
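Two of the responsibilities named above, an ownership check and a canonical classifier, might look like the following small, pure functions. The keyword table and category names are invented for this sketch, not a prescribed taxonomy:

```python
def user_owns_resource(user_id: str, resource_owner_id: str) -> bool:
    """Ownership check as pure code: auditable, reproducible, trivially testable."""
    return user_id == resource_owner_id

# Hypothetical canonical classifier: exact keyword rules, no model involved.
CATEGORY_KEYWORDS = {
    "billing": ("invoice", "payment", "refund"),
    "support": ("bug", "error", "crash"),
}

def classify_ticket(subject: str) -> str:
    """Canonical classification with a deterministic default."""
    subject_lower = subject.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in subject_lower for keyword in keywords):
            return category
    return "general"
```

Because each function has clear inputs and outputs, provenance is trivial: any persisted label can be traced to the exact rule that produced it.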

Use LLMs as a complementary layer that produces candidate information, summaries, or soft labels. For example, an LLM can extract entities, propose tags, or generate a natural-language summary. Treat these outputs as suggestions: run deterministic post-processing to validate, normalize, and, when needed, fall back to code-based decisions. This hybrid approach balances flexibility with safety.
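A minimal version of that deterministic post-processing step could look like this, assuming a hypothetical fixed tag vocabulary. Whatever the model proposes is normalized, filtered against the allowed set, and replaced by a code-chosen default when nothing survives:

```python
# Hypothetical closed tag vocabulary owned by the deterministic harness.
ALLOWED_TAGS = {"billing", "support", "general"}

def normalize_tags(llm_tags: list[str]) -> list[str]:
    """Treat LLM tags as suggestions: normalize, validate, fall back to code."""
    cleaned = {tag.strip().lower() for tag in llm_tags}
    valid = sorted(cleaned & ALLOWED_TAGS)
    return valid if valid else ["general"]  # deterministic fallback
```

Invalid or hallucinated tags simply disappear at this boundary; the model can only ever select from the vocabulary the harness already owns.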

Design the interface between the LLM and the deterministic harness carefully. Standardize the data shape the harness expects, and keep the LLM prompts focused on producing that shape rather than free-form text. When possible, prefer structured outputs (JSON, key-value pairs) from the model and validate them immediately with deterministic checks.
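One way to sketch that boundary, under the assumption that the prompt asks the model for a JSON object with a known shape: parse the raw text immediately, verify every required field and type, and drop anything unexpected. The `REQUIRED_KEYS` shape here is illustrative only:

```python
import json

# Hypothetical shape the harness expects from the model.
REQUIRED_KEYS = {"title": str, "priority": int}

def parse_model_output(raw_text: str) -> dict:
    """Parse and validate the model's JSON at the boundary; reject anything off-shape."""
    data = json.loads(raw_text)  # raises ValueError on non-JSON text
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, expected_type in REQUIRED_KEYS.items():
        if not isinstance(data.get(key), expected_type):
            raise ValueError(f"field {key!r} missing or not {expected_type.__name__}")
    # Keep only the fields the harness knows about.
    return {key: data[key] for key in REQUIRED_KEYS}
```

Everything downstream of this function can assume the data is well-formed, which is exactly what keeps the deterministic side simple.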

Operationally, this separation reduces drift. If the LLM’s behavior changes due to model updates or prompt tweaks, the deterministic layer prevents silent corruption of your critical state. It also confines the surface area for costly manual audits: you only need to inspect the deterministic outputs and the small boundary code rather than the entire prose produced by the model.

Finally, treat the harness as the source of truth for persisted decisions. Store the canonical, validated results coming out of the deterministic code, not the raw LLM output. Keep the raw model results as auxiliary data for debugging and retraining but do not make them the basis of permissions, billing, or irreversible side effects.
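That storage split can be made explicit in the persisted record itself. In this sketch (field names are hypothetical), the canonical result and the raw model output live side by side, but only the canonical part is ever read by logic that drives side effects:

```python
from dataclasses import dataclass

@dataclass
class Decision:
    """Persisted record: canonical result is the source of truth, raw text is auxiliary."""
    canonical: dict        # validated output of the deterministic harness
    raw_model_output: str  # kept only for debugging and retraining, never for logic

def persist(decision: Decision, store: dict) -> None:
    # Permissions, billing, and irreversible actions read only the canonical record.
    store["decision"] = decision.canonical
    store["debug_raw"] = decision.raw_model_output
```

Keeping the raw text under a separate key makes the contract visible in the schema: deleting `debug_raw` should never change system behavior.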

Combining deterministic code with LLMs yields systems that are both adaptable and reliable: code for the guarantees, models for the ideas. This pragmatic split lets you iterate quickly while maintaining control over the parts that matter most.