Microsoft Makes Claude Review GPT's Work in New Copilot Feature

This is one of those announcements that makes you do a double take: Microsoft just unveiled a new feature for its Copilot Researcher that uses Anthropic’s Claude as a quality checker for OpenAI’s GPT. Yes, the two biggest AI rivals are now working together inside a single Microsoft product.

How Critique Works

The concept is elegantly simple. When you submit a research query to Copilot, GPT handles the first pass — searching web sources and enterprise data from Salesforce, ServiceNow, or Confluence to produce a draft response. But before that draft reaches you, Claude steps in. Anthropic’s model reviews the text for factual accuracy, completeness of analysis, and proper source attribution.

Only after Claude gives it the green light does the response land in front of you.

13.8 Percent Better

Microsoft claims this multi-model approach delivers a 13.8 percent improvement on the DRACO benchmark — an industry measure for deep research quality. According to the company, that puts it ahead of standalone tools from OpenAI, Google, Perplexity, and even Anthropic itself.

It’s a smart move. Instead of betting everything on a single model, Microsoft leverages the respective strengths of both systems. GPT is strong at synthesizing and processing large volumes of data, while Claude has built a solid reputation for catching errors and evaluating source quality.

Copilot Cowork Enters Early Access

Alongside the Critique feature, Microsoft also opened up Copilot Cowork to its Frontier early access program. The name’s resemblance to Claude’s Cowork mode isn’t coincidental — the feature is actually built on Anthropic’s technology. Copilot Cowork lets you delegate long-running, multi-step tasks within Microsoft 365 to AI.

What This Means

The era of AI companies strictly walling themselves off from each other seems to be ending. Microsoft is showing us that the future of enterprise AI is likely multi-model. Not one model to rule them all, but specialized models that check each other’s work.

For Claude users, this is a noteworthy development: Anthropic’s model is increasingly seen as a quality benchmark — even by companies whose primary bet is on OpenAI. Microsoft plans to eventually make the workflow bidirectional too, with GPT reviewing Claude’s drafts.

Sources: