LLMs jockey for higher position as graders, reviewers, orchestrators
OpenAI launched a Codex plugin for Anthropic's Claude Code in what will likely become a trend as the big LLM players all try to one-up each other to move up the orchestration layer.
The race for the latest, greatest LLM is interesting, but the prize is the orchestration layer. Every vendor wants its model and platform to be the orchestrator of other models. The big question here is whether Anthropic or OpenAI is well positioned to be that LLM conductor. The short answer is no.
First, let's recap a few recent developments.
- OpenAI dropped its Codex plugin for Claude Code on GitHub. The gist is that Codex can review what Claude Code has cooked up. Depending on which X post you read, OpenAI's Codex plugin is either cheeky or desperate. The irony is that Anthropic and OpenAI may work better together, much to the chagrin of their respective camps.
- Microsoft 365 Copilot's Researcher agent is now multi-model and uses Anthropic's models to grade OpenAI's output with a feature called Critique. There's also a Counsel feature where you can have OpenAI and Anthropic models both do the research and Microsoft Researcher will show you the differences. The stated goal is accuracy. The strategic implication is that Microsoft wants the context layer and sees LLMs as a commodity.
Now I could wait for a third data point, which will probably arrive in a few minutes courtesy of Anthropic, but why bother? You know what's coming. LLMs need to move up the stack and become orchestrators and graders. Why be the student when you can be the professor? Pretty soon, every LLM will be a reviewer of some other LLM.
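The pattern itself is simple enough to sketch. Here's a minimal, hypothetical cross-review loop in Python: one model drafts, a rival model grades the draft. The `claude_generate` and `codex_review` functions are stand-in stubs of my own invention, not real vendor SDK calls; in practice you'd swap in actual API clients.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Review:
    generator: str   # which model produced the draft
    reviewer: str    # which rival model graded it
    verdict: str     # the reviewer's grade

# Hypothetical stand-ins for real vendor SDK calls.
def claude_generate(prompt: str) -> str:
    return f"draft for: {prompt}"

def codex_review(draft: str) -> str:
    return "approve" if draft.startswith("draft") else "revise"

def cross_review(prompt: str,
                 generate: Callable[[str], str],
                 review: Callable[[str], str],
                 generator_name: str,
                 reviewer_name: str) -> Review:
    """One LLM drafts, a competing LLM reviews the result."""
    draft = generate(prompt)
    verdict = review(draft)
    return Review(generator=generator_name,
                  reviewer=reviewer_name,
                  verdict=verdict)

result = cross_review("refactor the billing module",
                      claude_generate, codex_review,
                      "claude-code", "codex")
print(result.verdict)  # → approve
```

The interesting question isn't the loop, which is trivial, but who owns the function that calls it.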
Here's the catch.
If you're an enterprise looking to orchestrate and rate a bunch of LLMs, you're likely to rely on your existing cloud hyperscaler, such as Amazon Web Services with Bedrock, Google Cloud with Vertex AI, or Microsoft Azure. If you're looking to orchestrate workflows and models, perhaps ServiceNow or Salesforce is the pick. Or you just use your existing SaaS providers, which are rapidly incorporating LLMs underneath. After all, the best models for the enterprise are going to be accurate, cost-effective, and possibly smaller and trained with domain expertise.
As the LLM giants rush to higher orchestration levels, perhaps the biggest takeaway is that their models are becoming commodities in a hurry. Sit back and enjoy the show as every LLM tries to grade and review the others, while realizing the IT buyer is the ultimate reviewer.
A few reads:
- AI Forum 2026: There isn’t an easy button for AI
- LittleHorse’s McNealy on Business-as-Code, AI agents, orchestration
- Constellation Research’s Futures Forum: What CEOs are thinking
- Nvidia GTC 2026: Nvidia launches NemoClaw, eyes to pair with DGX Spark, DGX Station
- Where’s tokenomics for the rest of us?
- Nvidia Nemotron: Much needed open-source model champion in US
- Why enterprise AI leaders need to bank on open-source LLMs
- Anthropic vs. SaaS: A nuanced view