LLMs jockey for higher position as graders, reviewers, orchestrators

Published March 31, 2026

OpenAI launched a Codex plugin for Anthropic's Claude Code in what'll likely become a trend as the big LLM players all try to one-up each other to move up to the orchestration layer.

The race for the latest, greatest LLM is interesting, but the prize is the orchestration layer. Every vendor wants its model and platform to be the orchestrator of other models. The big question here is whether Anthropic or OpenAI is well positioned to be that LLM conductor. The short answer is no.

First, let's recap a few recent developments.

Now I could wait for a third data point, which will probably come in a few minutes from Anthropic, but why bother? You know what's coming. LLMs need to move up the stack and be orchestrators and graders. Why be the student when you can be the prof? Pretty soon, every LLM will become a reviewer of some other LLM.

Here's the catch.

If you're an enterprise looking to orchestrate and rate a bunch of LLMs, you're likely to rely on your existing cloud hyperscaler, such as Amazon Web Services with Bedrock, Google Cloud with Vertex AI and Microsoft Azure. If you're looking to orchestrate workflows and models, perhaps ServiceNow or Salesforce is the pick. Or you just use your existing SaaS providers, which are rapidly incorporating LLMs underneath. After all, the best models for the enterprise are going to be accurate and cost-effective, and possibly smaller and trained with domain expertise.

As LLM giants quickly move to higher orchestration levels, perhaps the biggest takeaway is that they're going commodity in a hurry. Sit back and enjoy the show as every LLM tries to grade and review the others, all while realizing the IT buyer is the ultimate reviewer.

A few reads: