Last updated: June 5, 2026 · Build-week coverage with verified specs, GitHub Copilot pricing, and a developer’s read on Microsoft’s OpenAI-independence play.
MAI-Code-1-Flash is Microsoft’s first in-house coding model, unveiled at Build 2026 (June 2–3) and rolling out inside GitHub Copilot on Free, Pro, Pro+ and Max plans through the Visual Studio Code model picker. It is a sparse Mixture-of-Experts model with 137B total parameters, a 256K context window, and GitHub-listed pricing of $0.75 input / $4.50 output per million tokens. Microsoft pairs it with MAI-Thinking-1, a reasoning model in private preview on Foundry. The strategic point is bigger than the benchmarks: Microsoft now has a homegrown coding stack that reduces its reliance on OpenAI and undercuts frontier models on cost.
What is MAI-Code-1-Flash?
MAI-Code-1-Flash is the first coding model built by Microsoft AI (MAI), the division led by Mustafa Suleyman, and it shipped at the Build 2026 developer conference in San Francisco on June 2, 2026. It takes natural-language descriptions and produces source code, and it is wired directly into GitHub Copilot and Visual Studio Code rather than offered only as a raw API. The “Flash” suffix is the giveaway on intent: this is a latency- and cost-optimised model, not a frontier flagship.
The architecture is a sparse Mixture-of-Experts model with 137 billion total parameters, a 256,000-token context window, and training data that runs through roughly May 2026 (Microsoft lists a March–May 2026 training run). Because it is sparse MoE, only a fraction of those 137B parameters activate per token (5B, according to the model card), which is how Microsoft hits its efficiency target while keeping a large total capacity. For a refresher on why MoE routing matters for inference cost, see our explainer on inference economics across providers.
| Spec | MAI-Code-1-Flash | MAI-Thinking-1 |
|---|---|---|
| Type | Coding model | Reasoning model |
| Architecture | Sparse Mixture-of-Experts | Mixture-of-Experts (medium-sized) |
| Parameters | 137B total, 5B active | ~35B active, ~1T total |
| Context window | 256K tokens | 128K tokens |
| Availability | GitHub Copilot (VS Code), rolling out from June 2 | Private preview on Microsoft Foundry |
| Listed price | $0.75 in / $4.50 out per 1M tokens | Not public (preview) |
| Also via | Fireworks AI, Baseten, OpenRouter (announced) | |
Specs as listed by Microsoft AI and the GitHub Copilot changelog on June 2, 2026. Parameter and benchmark figures are vendor- or analysis-reported and are not independently audited.
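To make the sparsity concrete, here is a back-of-the-envelope sketch. It assumes the 137B total figure above and the 5B active figure that the model card reports (cited in the FAQ further down); the helper name `active_fraction` is ours, and the arithmetic says nothing about how Microsoft's router actually picks experts.

```python
def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Share of a sparse MoE model's parameters that run for each token."""
    if total_params_b <= 0:
        raise ValueError("total_params_b must be positive")
    return active_params_b / total_params_b


# Figures in billions of parameters, as reported by Microsoft (not audited).
flash_share = active_fraction(5, 137)
print(f"MAI-Code-1-Flash activates about {flash_share:.1%} of its parameters per token")
# prints: MAI-Code-1-Flash activates about 3.6% of its parameters per token
```

Per-token compute tracks the active count far more closely than the total, which is the rough intuition behind pricing a 137B model like a much smaller one.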
How do I use MAI-Code-1-Flash in GitHub Copilot?
If you are on Copilot Free, Pro, Pro+ or Max, you select MAI-Code-1-Flash from the model picker inside Visual Studio Code, the same dropdown you already use to switch between Claude, GPT and Gemini models. The rollout started on June 2 to a limited share of individual Copilot users and is expanding “over the coming weeks,” so do not panic if it has not appeared in your picker yet. Microsoft also announced that MAI models will be served through Fireworks AI, Baseten and OpenRouter, which means you should be able to call the model outside the Copilot UI once those endpoints go live.
Here is the practical pattern for hitting it through an OpenAI-compatible gateway such as OpenRouter, which is the path most teams will use for CI scripts and agents rather than the IDE:
```python
from openai import OpenAI

# MAI models are exposed through OpenAI-compatible gateways
# (OpenRouter / Fireworks / Baseten). Endpoint + model id vary by provider.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_KEY",
)

# Placeholder input: replace with the code or diff you want reviewed.
diff = "def add(a, b):\n    return a - b\n"

resp = client.chat.completions.create(
    model="microsoft/mai-code-1-flash",  # check provider for exact slug
    messages=[
        {"role": "system", "content": "You are a senior Python reviewer."},
        {"role": "user", "content": "Refactor this function and flag any bug:\n\n" + diff},
    ],
    max_tokens=2048,
)
print(resp.choices[0].message.content)
```
During a staged rollout, provider-specific model IDs can change. Treat microsoft/mai-code-1-flash as illustrative and confirm the exact slug in each provider’s model list before wiring it into production. The Copilot model picker is the only fully supported surface on day one.
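One way to avoid hard-coding a slug is to resolve it from the provider's catalog at startup. The sketch below is a minimal, offline version of that idea: `resolve_model_id` is a hypothetical helper, and the catalog entries are made up for illustration. In a real script the id list might come from `client.models.list()` in the OpenAI Python SDK, assuming the gateway implements that endpoint.

```python
from typing import Iterable


def resolve_model_id(available_ids: Iterable[str], needle: str = "mai-code-1-flash") -> str:
    """Return the single catalog id containing `needle`, or raise LookupError."""
    wanted = needle.lower()
    matches = [model_id for model_id in available_ids if wanted in model_id.lower()]
    if len(matches) != 1:
        # Fail loudly on zero or ambiguous matches rather than guessing a slug.
        raise LookupError(f"expected exactly one match for {needle!r}, got {matches}")
    return matches[0]


# Hypothetical catalog entries, for illustration only.
catalog = ["microsoft/mai-code-1-flash", "example-vendor/some-other-model"]
print(resolve_model_id(catalog))
```

Failing on an ambiguous match is deliberate: a CI job that silently picks the wrong model is harder to debug than one that stops.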
How good is MAI-Code-1-Flash?
The honest answer for a launch this fresh: good enough to be useful and cheap, not good enough to lead the frontier, and Microsoft is not pretending otherwise. Third-party benchmark roundups place MAI-Code-1-Flash at roughly 51% on SWE-Bench Pro (Microsoft’s model card reports 51.2% in its own VS Code-based harness), which puts it in the same tier as GPT-5.3 and clearly below the current open-weight leaders and Anthropic’s flagship. What Microsoft emphasises instead is efficiency: the model is reported to use up to 60% fewer tokens than comparable models on SWE-Bench Verified, which is where the “Flash” positioning and the low price come together.
| Model | SWE-Bench Pro (reported) | Position |
|---|---|---|
| Kimi K2.6 | ~58.6% | Open-weight leader tier |
| GLM-5.1 | ~58.4% | Open-weight leader tier |
| Claude Opus 4.6 (baseline) | Above Flash tier | Frontier flagship |
| MAI-Code-1-Flash | ~51% | Cost/efficiency tier (≈ GPT-5.3) |
SWE-Bench Pro scores in this table are vendor- and analysis-reported, gathered from launch-week coverage rather than a single audited leaderboard run. Use them to place MAI-Code-1-Flash in a tier, not to rank models to the decimal. The defensible conclusion is that Flash trades a few points of raw accuracy for a large cut in tokens and dollars — which is exactly the trade a Copilot-default model should make.
The price gap is the real story. At $0.75 input / $4.50 output per million tokens, MAI-Code-1-Flash is dramatically cheaper than frontier models like Claude Opus 4.8 ($5 / $25) or premium GPT tiers. For the high-volume, low-stakes coding that fills most Copilot sessions (autocomplete-grade refactors, boilerplate, test scaffolds) you do not need a frontier model, and Microsoft now has a first-party one tuned for that lane.
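To see what those list prices mean for a bill, here is a small cost sketch. The per-million-token rates are the ones quoted above; the `Pricing` class and the monthly workload (200M input tokens, 40M output tokens) are assumptions chosen for illustration, and the calculation ignores cached-input discounts and any difference in how many tokens each model uses per task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """List prices in USD per one million tokens."""
    input_per_m: float
    output_per_m: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        total = input_tokens * self.input_per_m + output_tokens * self.output_per_m
        return total / 1_000_000


FLASH = Pricing(input_per_m=0.75, output_per_m=4.50)
FRONTIER = Pricing(input_per_m=5.00, output_per_m=25.00)  # the $5 / $25 tier quoted above

# Hypothetical monthly workload, not a measured figure.
input_tokens, output_tokens = 200_000_000, 40_000_000

flash_cost = FLASH.cost(input_tokens, output_tokens)
frontier_cost = FRONTIER.cost(input_tokens, output_tokens)
print(f"Flash:    ${flash_cost:,.2f}")      # Flash:    $330.00
print(f"Frontier: ${frontier_cost:,.2f}")   # Frontier: $2,000.00
print(f"Ratio:    {frontier_cost / flash_cost:.1f}x")
```

On this mix the frontier tier costs roughly six times as much. Output-heavy workloads shift the ratio slightly, since the output price gap (about 5.6x) is narrower than the input gap (about 6.7x).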
What is MAI-Thinking-1?
MAI-Thinking-1 is Microsoft’s first reasoning model, announced alongside Flash and currently in private preview on Microsoft Foundry. It is a mid-sized model with about 35 billion active parameters and a 128K context window, built for multi-step instructions, long-context reasoning and code generation at low token cost. Microsoft’s headline claim is that MAI-Thinking-1 matches Claude Opus 4.6 on coding in SWE-Bench Pro — but that figure comes from Microsoft’s own testing, so treat it as a vendor-reported benchmark until independent evaluations land.
The split is deliberate. Flash is the cheap, fast default that ships in the Copilot picker for everyone; Thinking-1 is the heavier reasoning option gated behind Foundry preview for teams that need multi-step problem solving. It mirrors the fast-vs-deep split every major lab now runs — the same pattern you see across ChatGPT, Claude and Gemini tiers.
Why is Microsoft building its own models?
Because owning the model changes the economics and the leverage. Microsoft has spent years routing Copilot through OpenAI, paying frontier-tier inference costs and depending on a partner whose interests no longer perfectly align with its own. A first-party MAI stack does three things at once: it lowers the cost of every Copilot completion, it removes a single point of dependence, and it lets Microsoft tune models specifically for its developer surfaces — VS Code, GitHub, Azure — instead of consuming a general-purpose API.
This is the part the benchmark debate misses. MAI-Code-1-Flash does not need to beat Claude or GPT to be strategically valuable. It needs to be good enough at a fraction of the cost for the median Copilot task, and to exist under Microsoft’s own roof. The reasoning model in preview, the seven-model MAI family unveiled at Build, and the distribution deals with Fireworks, Baseten and OpenRouter all point the same way: Microsoft wants to be a model maker, not just a model reseller.
What does this mean for developers?
For day-to-day work, the practical takeaway is simple: you are about to get a cheaper default in the Copilot picker, and for a lot of tasks you will not notice the accuracy gap. My read after working through the launch materials (I run DTF’s entire publishing workflow inside coding-assistant tooling) is that MAI-Code-1-Flash is a routing decision, not a frontier decision. Send the cheap, high-volume work to Flash; keep a frontier model in the model picker for the gnarly multi-file refactors and the bugs that actually cost money if missed.
- Use Flash as the default, escalate on failure. For autocomplete, boilerplate, tests and small refactors, the token savings compound fast. When a task needs deep reasoning, switch to a frontier model in the same picker.
- Watch the independent benchmarks, not the launch slides. The “matches Opus 4.6” claim for MAI-Thinking-1 is Microsoft’s own number. Wait for third-party SWE-Bench and Terminal-Bench runs before trusting it for anything load-bearing.
- The cost curve is the headline. A first-party model lets Microsoft keep cutting Copilot’s price. Expect more aggressive Copilot bundling once Flash carries a meaningful share of traffic.
- OpenAI’s moat just got narrower inside Microsoft’s own product. That is the structural story to track over the rest of 2026 — and it changes how you should think about which assistant to standardise on. Our best AI coding assistants guide covers that decision in depth.
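For scripted or agent workflows, the "default to Flash, escalate on failure" advice can be sketched as a small router. Everything here is hypothetical: `complete_with_escalation`, the stub models and the `non_empty` acceptance check are stand-ins, and a real check would more likely run tests or a linter against the draft than inspect the text.

```python
from typing import Callable, Tuple

Model = Callable[[str], str]


def complete_with_escalation(
    prompt: str,
    cheap: Model,
    frontier: Model,
    accept: Callable[[str], bool],
) -> Tuple[str, str]:
    """Try the cheap model first; call the frontier model only if the draft is rejected."""
    draft = cheap(prompt)
    if accept(draft):
        return "cheap", draft
    return "frontier", frontier(prompt)


# Stand-in models for illustration; real ones would call an API.
def cheap_stub(prompt: str) -> str:
    # Pretend the cheap model gives up on multi-file work.
    return "" if "multi-file" in prompt else f"cheap answer to: {prompt}"


def frontier_stub(prompt: str) -> str:
    return f"frontier answer to: {prompt}"


def non_empty(text: str) -> bool:
    return bool(text.strip())


for task in ("write a unit test", "multi-file refactor of the billing module"):
    tier, _answer = complete_with_escalation(task, cheap_stub, frontier_stub, non_empty)
    print(f"{task!r} -> {tier}")
```

The frontier model is only paid for when the cheap draft fails the check, so the quality of `accept` decides how much of the saving survives.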
For the broader frontier context this lands in — GPT-5.5, Gemini 3.5 and the open-weight surge from Kimi and GLM — see our GPT-5.5 breakdown and the running model comparison. The one-line summary: Microsoft just stopped being only a customer of frontier labs and started being a competitor to them, starting in the place it controls most — the IDE.
FAQ
What is MAI-Code-1-Flash?
MAI-Code-1-Flash is Microsoft’s in-house coding model for GitHub Copilot and Visual Studio Code, announced on June 2, 2026. The official model card lists a sparse Mixture-of-Experts transformer with 137B total parameters, 5B active parameters and a 256K-token context window. Its positioning is low-latency, low-cost developer assistance, not frontier-model replacement.
How much does MAI-Code-1-Flash cost?
GitHub’s Copilot models-and-pricing page lists MAI-Code-1-Flash at $0.75 per million input tokens, $0.075 per million cached input tokens and $4.50 per million output tokens. That pricing puts it in GitHub’s lightweight category and explains why Microsoft can use it as a high-volume Copilot model.
How do I get MAI-Code-1-Flash in GitHub Copilot?
Select it from the model picker in Visual Studio Code, the same dropdown used to choose Claude, GPT or Gemini models. GitHub says rollout starts with Copilot Free, Pro, Pro+ and Max users and expands gradually, so the model may not appear for every account immediately.
How good is MAI-Code-1-Flash on benchmarks?
Microsoft’s model card reports 51.2% on SWE-Bench Pro versus 35.2% for Claude Haiku 4.5 in Microsoft’s VS Code-based production harness, plus up to 60% fewer tokens on SWE-Bench Verified. Treat that as a vendor-run harness result, not an independent leaderboard ranking against every frontier and open-weight model.
What is MAI-Thinking-1?
MAI-Thinking-1 is Microsoft’s reasoning model announced alongside MAI-Code-1-Flash. Microsoft describes it as a medium-sized MoE model with 35B active parameters and roughly 1T total parameters, available through Microsoft Foundry private preview. Claims that it is competitive with Claude Opus 4.6 on coding should be read as Microsoft-reported until independent evaluations catch up.
Why is Microsoft building its own AI coding models?
To lower the cost of every GitHub Copilot completion, reduce dependence on OpenAI, and tune models specifically for Microsoft-controlled developer surfaces such as VS Code, GitHub Copilot and Azure. The model does not need to beat every frontier system; it needs to be good enough for the median Copilot task at much lower serving cost.
Can I use MAI models outside GitHub Copilot?
Microsoft says the broader MAI model family is coming to OpenRouter, Fireworks AI and Baseten, but the MAI-Code-1-Flash model card lists GitHub Copilot in Visual Studio Code as the launch channel and says future release formats would come with updated documentation. Confirm the exact provider model ID before wiring it into scripts, CI or agents.
Bibliography (12 sources)
Sources prioritise Microsoft and GitHub primary documentation. Model-card architecture and benchmark figures are official Microsoft disclosures, but benchmark comparisons still remain vendor-run until independent harnesses reproduce them. Links accessed June 5, 2026.
- Microsoft AI — Introducing MAI-Code-1-Flash (primary announcement: architecture, positioning, Build 2026 launch)
- Microsoft AI — MAI-Code-1-Flash model card (137B total / 5B active MoE, 256K context, training window, launch channel, benchmark table)
- Microsoft AI — Building a hill-climbing machine: launching seven new MAI models (MAI family context, 5B active parameter note, OpenRouter / Fireworks / Baseten distribution)
- Microsoft AI — MAI-Thinking-1 model page (35B active, ~1T total MoE, Foundry private preview, vendor benchmark positioning)
- GitHub Changelog — MAI-Code-1-Flash is now available for GitHub Copilot (June 2, 2026 rollout, plan availability)
- GitHub Docs — Models and pricing for GitHub Copilot (source for $0.75 input / $0.075 cached input / $4.50 output per 1M token figures)
- Microsoft News — Build 2026 announcements hub (MAI model family, developer tooling)
- CNBC — Microsoft unveils new AI models to lessen reliance on OpenAI and lower costs for developers (June 2, 2026)
- CNBC — Microsoft and Google take on Anthropic and OpenAI in AI coding models (strategic context)
- Implicator.ai — Microsoft starts MAI-Code-1-Flash Copilot rollout (model-card interpretation and launch-channel analysis)
- SWE-Bench Pro — public leaderboard and methodology (benchmark referenced for coding-model comparisons)
- OpenRouter — model catalog (verify provider-side model IDs before using MAI endpoints outside Copilot)
