Collaborative coding with competing LLMs

Working with one model is useful, but working with two can expose blind spots faster. One model can draft a plan, and the other can attack it. One can propose a change, and the other can look for the failure mode. The goal is not novelty; it is better pressure testing.

This pattern works because LLMs are good at producing plausible answers, not guaranteed-correct ones. If you ask the same model to plan, implement, and approve its own work, you reduce friction but you also reduce scrutiny. A second model introduces a separate line of questioning and makes weak assumptions easier to spot.

The strongest use is in planning. Ask one model to break the problem into small steps, then ask another to challenge the sequence, the scope, and the missing edge cases. That review often reveals unclear requirements before any code is written, which saves time later.
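As a rough illustration of that plan-then-challenge step, here is a minimal Python sketch. It assumes each model can be wrapped as a plain function from prompt text to response text; the names `plan_and_challenge`, `stub_planner`, and `stub_critic` are hypothetical, and the stubs stand in for whatever real model clients you use.

```python
from typing import Callable, Dict

# Assumption: a "model" is any function from prompt text to response text.
# Wrap your real client calls to fit this shape.
Model = Callable[[str], str]

# The three angles named in the text: sequence, scope, missing edge cases.
CRITIQUE_ANGLES = ("sequence", "scope", "missing edge cases")


def plan_and_challenge(task: str, planner: Model, critic: Model) -> Dict:
    """Ask one model for a stepwise plan, then ask another to attack it."""
    plan = planner(f"Break this task into small, ordered steps:\n{task}")
    critiques = {}
    for angle in CRITIQUE_ANGLES:
        # One question per angle keeps each critique focused.
        prompt = (
            f"Challenge this plan. Focus only on {angle}.\n\n"
            f"Task: {task}\n\nPlan:\n{plan}"
        )
        critiques[angle] = critic(prompt)
    return {"plan": plan, "critiques": critiques}


# Hypothetical stub models so the sketch runs without any API access.
def stub_planner(prompt: str) -> str:
    return "1. Add the endpoint\n2. Write tests"


def stub_critic(prompt: str) -> str:
    return "No objection recorded (stub response)."


result = plan_and_challenge("Add a password reset flow", stub_planner, stub_critic)
for angle, critique in result["critiques"].items():
    print(f"[{angle}] {critique}")
```

Keeping the planner and critic as separate arguments makes it easy to point them at two different models, which is the point of the exercise. The human still reads the critiques and decides which ones change the plan.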

The same approach helps during implementation. Let one model draft the change, then use the other to review the diff with a narrow lens: security, tests, naming, or architectural fit. Each review pass should have a specific job so the feedback stays actionable instead of generic.
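One way to give each review pass a specific job is to run the reviewing model once per lens. The sketch below assumes the same "model as a function" wrapper; `LENSES`, `ReviewNote`, `review_diff`, and `stub_reviewer` are hypothetical names, and the lens instructions are only examples of how narrow the wording could be.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: the reviewing model is wrapped as prompt text -> response text.
Model = Callable[[str], str]

# One narrow instruction per lens, matching the lenses named in the text.
LENSES: Dict[str, str] = {
    "security": "Look only for security problems in this diff.",
    "tests": "Look only for missing or weak tests for this diff.",
    "naming": "Look only for unclear or inconsistent names in this diff.",
    "architecture": "Look only at whether this diff fits the existing architecture.",
}


@dataclass
class ReviewNote:
    lens: str
    feedback: str


def review_diff(diff: str, reviewer: Model, lenses: Dict[str, str] = LENSES) -> List[ReviewNote]:
    """Run one review pass per lens so each piece of feedback has a single job."""
    notes = []
    for lens, instruction in lenses.items():
        prompt = f"{instruction}\nIgnore everything else.\n\nDiff:\n{diff}"
        notes.append(ReviewNote(lens=lens, feedback=reviewer(prompt)))
    return notes


# Hypothetical stub reviewer that echoes the instruction it received.
def stub_reviewer(prompt: str) -> str:
    return "Stub review for: " + prompt.splitlines()[0]


notes = review_diff("- old_line\n+ new_line", stub_reviewer)
for note in notes:
    print(f"{note.lens}: {note.feedback}")
```

Separate passes cost more calls than one broad review, but each response can be read, accepted, or dismissed on its own.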

This also pairs well with fast feedback loops. A small plan, a small implementation, and a small review cycle keep the work moving and leave the model little room to drift away from the task. If the result fails a check, adjust immediately instead of stacking more generated code on top.
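A small-step loop of this kind might look like the following sketch. It assumes a check is any function that returns a pass/fail flag plus a message (for example, a wrapper around a test run); `run_small_steps`, `stub_implementer`, and `stub_check` are hypothetical names, and the retry limit is an arbitrary choice.

```python
from typing import Callable, List, Optional, Tuple

# Assumptions: the implementing model maps prompt text to a proposed change,
# and a check maps a proposed change to (passed, message).
Model = Callable[[str], str]
Check = Callable[[str], Tuple[bool, str]]


def run_small_steps(
    steps: List[str], implementer: Model, check: Check, max_attempts: int = 2
) -> Tuple[List[str], Optional[str]]:
    """Implement one step at a time; stop at the first step that keeps failing."""
    accepted: List[str] = []
    for step in steps:
        prompt = f"Implement only this step:\n{step}"
        for _ in range(max_attempts):
            change = implementer(prompt)
            passed, message = check(change)
            if passed:
                accepted.append(change)
                break
            # Feed the failure straight back instead of moving on.
            prompt = (
                f"Implement only this step:\n{step}\n\n"
                f"The previous attempt failed this check:\n{message}"
            )
        else:
            # No attempt passed: return the blocked step for a human to look at
            # rather than stacking later steps on a failing base.
            return accepted, step
    return accepted, None


# Hypothetical stubs: the first attempt leaves a TODO, the retry does not.
def stub_implementer(prompt: str) -> str:
    return "def ok(): pass" if "failed" in prompt else "TODO"


def stub_check(change: str) -> Tuple[bool, str]:
    if "TODO" in change:
        return False, "change still contains a TODO"
    return True, "ok"


accepted, blocked = run_small_steps(
    ["add parser", "add validation"], stub_implementer, stub_check
)
print(len(accepted), blocked)
```

Returning the blocked step instead of raising keeps the decision with the person running the loop, who can rescope the step or rewrite the plan.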

Competing models are especially useful where the main risk is ambiguity. Security concerns, dependency choices, and long-term maintainability are judgment calls that benefit from disagreement. If the two models point at different failure modes, the human can decide which one matters most and keep ownership of the outcome.

The important part is that the human stays in the loop. The models can argue, inspect, and refine, but they do not own the architecture. When used this way, competition between models becomes a quality tool rather than an automation stunt.