Grok 4.5 Leak: 1.5T Cursor Model Deep Dive

Maya Chen

Lead AI Researcher

Published: June 28, 2026
[Figure: diagram of the xAI Grok 4.5 private beta leak trail and the 1.5T Cursor-trained foundation model]

TLDR: Private beta at SpaceX and Tesla, a 1.5T V9 foundation, Cursor data. Here is what the Grok 4.5 leak trail actually confirms and what it doesn't.

Grok 4.5 and the 1.5T Cursor Model: What the xAI Leak Trail Actually Shows

On the morning of June 28, 2026, a version number quietly disappeared from xAI's product menus. A few hours later, a quoted line attributed to Elon Musk surfaced on X: Grok 4.5 is in private beta at SpaceX and Tesla, "performing close to or beyond Opus." There is no system card. There is no benchmark sheet. There is no API note. There is a screenshot, a quote, and a 1.5T parameter number that no one at xAI has confirmed in writing.

TLDR: The Grok 4.5 leak trail consists of one evidence-tagged screenshot of a missing version number, a quoted Musk statement circulated through two community accounts, and a single architectural claim about a 1.5T V9 foundation model with Cursor data added in supplemental training. The "close to or beyond Opus" performance line is community-sourced quoting, not a measured benchmark. The earlier Grok 5 thread (a 6T and 10T parameter pair in the Fable 5 weight class) is a separate, unverified roadmap claim. Builders should treat all of it as directional signal, not spec.

What Was Actually Seen

The traceable signal trail is narrow. Three tweets carry most of the architectural and product weight.

The first is the screenshot. On June 28 at 06:33 UTC, Mark Kretschmann posted that "the release of the Grok 1.5T / Cursor Composer 3 model is imminent, as the version number has been removed from the menus. This always happens shortly before a release from @xai." The post is the only entry in the signal set tagged as evidence and the only one with an attached media asset.

Source: @mark_k

About four and a half hours later, two accounts surfaced a quoted line from Musk within minutes of each other. AshutoshShrivastava posted that "Grok 4.5 is now in private beta at SpaceX and Tesla, performing close to or beyond Opus." TestingCatalog amplified the same framing and added the architectural detail: "Grok 4.5 is based on 1.5T V9 foundation model, with Cursor data added in supplemental training." Both posts are flagged as quote-tweets in the signal, which means each is attaching commentary to a primary post that is not itself in the bundle.

That is the entirety of the verified surface. Everything beyond it — performance claims, release timing, model variant structure, training data composition — sits on top of those three signals.

A parallel thread, posted by Mark Kretschmann a week earlier, frames Grok 5 separately. He describes it as arriving in two variants at 6T and 10T parameters, in the same weight class as Fable 5, with Cursor data in the training mix and a stated emphasis on agentic coding. None of those numbers are sourced to an xAI document.

Why The Cursor Signal Matters

The most consistent thread across June is not a benchmark or a release date. It is the Cursor data story.

Mark Kretschmann first surfaced the "Grok / Cursor 1.5T model" framing on June 16. He returned to it on June 21 with a more architectural framing, then again on June 28 with the screenshot. TestingCatalog independently used the same architectural language on June 28, citing 1.5T V9 plus Cursor supplemental data. Four posts across nearly two weeks, two accounts, one consistent claim: Cursor's data (presumably IDE traces, agentic coding traces, or both) is being folded into xAI's training pipeline.

The reason this matters more than the parameter count is incentive alignment. Cursor is one of the busiest agentic coding surfaces in production today, and the data it generates is exactly the kind that is otherwise expensive to collect for frontier coding models. A vendor that trains directly on that distribution has, in principle, a structural advantage on long-horizon coding tasks that no synthetic eval set captures well. That is the thesis the community is rallying around. It is not yet a measurement.

On the limited evidence so far, the Cursor Training Loop — if it exists in the form described — would be a meaningful structural shift, but the data license, the volume, and the exact mixing strategy are all unverified.

What We Can Reasonably Expect

A few things are plausible given the signal trail, with appropriate hedging.

First, the menu removal pattern. Kretschmann claims this "always happens shortly before a release from @xai." It is plausible the Grok 1.5T / Cursor Composer 3 SKU enters wider preview within days, not weeks. The pattern is community-asserted and not corroborated by an xAI changelog, so the strength of the prediction depends on whether prior xAI releases followed the same menu-clearing tell.

Second, the Opus framing. The "close to or beyond Opus" quote is the kind of comparative line that vendors use ahead of public benchmarks. It is directional, not measured. Builders should expect xAI to publish at least a partial eval table when the model goes public — SWE-bench Verified and a coding-agent suite are the most likely candidates given the Cursor training emphasis. The exact baseline Opus version is not specified, which matters: Claude Opus 4.7 and Claude Opus 4.8 are both circulating, and the gap between them is non-trivial.

Third, the variant structure. Kretschmann's earlier Grok 5 post describes two variants at 6T and 10T. The Grok 4.5 line at 1.5T appears to be a tier below that, possibly a faster / cheaper SKU positioned closer to current Sonnet-class models. If the 1.5T V9 / Cursor Composer 3 framing holds, expect a coding-focused mid-tier SKU first, with the larger Grok 5 variants following on a longer timeline.

Fourth, distribution. The private beta is described as running inside SpaceX and Tesla. That is a narrow blast radius and consistent with xAI's prior pattern of internal dogfooding ahead of external preview. Whether external partners (Cursor itself being the obvious candidate) get access before a public API remains an open question.

Grok 4.5 vs Claude Opus: What the Signal Says

The Opus comparison is the most-cited claim in the signal trail, so it deserves direct scrutiny. The bundle does not contain measured comparison data, only community-relayed quoting. The honest read is dimension-by-dimension.

  • Stated performance band. Grok 4.5 is described as "close to or beyond Opus" via a quote attributed to Musk and circulated by AshutoshShrivastava and TestingCatalog. Claude Opus has shipped public model cards and benchmark suites for each major version. The asymmetry of evidence is the headline.

  • Parameter scale. Grok 4.5 reportedly uses a 1.5T V9 foundation model per TestingCatalog. Anthropic does not publish Opus parameter counts, so a direct size comparison is not possible: one of the two labs has no public number in this signal set.

  • Training data signal. Grok 4.5 reportedly includes Cursor data in supplemental training. Anthropic's Opus models do not have a publicly disclosed analogous IDE-trace ingest, though both labs presumably ingest large code corpora. The Cursor pipeline is the differentiator the community is anchoring on.

  • Distribution. Grok 4.5 is private-beta inside SpaceX and Tesla per the community quoting. Opus models ship to API, claude.ai, and partner platforms simultaneously. Grok 4.5 is at an earlier surface than any current Opus SKU.

  • Coding specialization. The Grok 4.5 thesis is explicitly agentic coding, driven by the Cursor data story. Opus competes broadly across reasoning, coding, and long-context tasks without a single dominant specialization framing.

The Cursor Training Loop framing gives Grok 4.5 a clean story to tell. It does not, on its own, demonstrate that the model clears Opus on any specific eval. Anyone planning a migration on the strength of the "close to or beyond Opus" line is migrating on a quoted sentence, not a measurement.

What We Know vs. What We Can't Yet Verify

Confirmed by the signal bundle:

  • A quoted line attributed to Musk says Grok 4.5 is in private beta at SpaceX and Tesla, surfaced by AshutoshShrivastava.
  • TestingCatalog reports Grok 4.5 is based on a 1.5T V9 foundation model with Cursor data in supplemental training.
  • Multiple community accounts, including Mark Kretschmann, describe Cursor data being used in the training pipeline for Grok models.
  • A Musk-attributed claim states Grok 4.5 performs close to or beyond Opus, with no formal benchmarks accompanying the quote.
  • Mark Kretschmann reports that Grok 5 is expected in two variants at 6T and 10T parameters, in the Fable 5 weight class.
  • Per Mark Kretschmann's June 28 post, the version number for the Grok 1.5T / Cursor Composer 3 model was removed from xAI menus on the morning of June 28, which he describes as a pattern that precedes xAI releases.
  • The only evidence-tagged tweet in this set is Kretschmann's screenshot showing the menu state.
  • Grok's own product page lists Chat, Multi-agent, Search, and Imagine surfaces. This is the existing product surface area the Grok 4.5 model would slot into.

Open questions the bundle does not resolve:

  • No benchmark numbers have been published for Grok 4.5 — no SWE-bench, no MMLU, no coding-agent eval, nothing standardized.
  • The specific Opus version used as the comparison baseline is not stated.
  • The Cursor data license terms, volume, and exact mixing recipe are not disclosed.
  • No public timing has been given for either Grok 4.5 GA or an external API surface.
  • Pricing, context window, and per-token cost are unstated.
  • The relationship between Grok 4.5, Grok 5, and the reported 1.5T / 6T / 10T variant structure has not been laid out in any official xAI document.
  • It is unclear whether Cursor itself or external partners get access ahead of public release.
  • The Grok 1.5T / Cursor Composer 3 model and the Grok 4.5 referenced in the Musk quote may or may not be the same SKU under different labels — the bundle is ambiguous.

How To Evaluate It Yourself When Access Opens

When the model surfaces publicly, the signal-to-noise problem will get harder, not easier. The first 72 hours of any xAI release tend to be flooded with vibe-check posts and selective demos. A few practical evaluation moves keep you honest.

Run a coding-agent eval before trusting any agentic-coding claim. SWE-bench Verified is the obvious one, but pair it with a private internal task set the model has not seen — ideally something with multi-file edits and a real test harness. A model trained on Cursor traces will look exceptional on tasks that resemble Cursor traces and merely good on tasks that don't.
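One way to make the public-versus-private comparison concrete is to track the resolved rate per task set and look at the gap between them. The sketch below is a minimal illustration under stated assumptions: `TaskResult`, the `"public"` and `"private"` labels, and the sample results are all hypothetical, and "resolved" is assumed to mean that the task's own test suite passed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    task_id: str
    task_set: str   # hypothetical labels: "public" (e.g. a known benchmark) or "private"
    resolved: bool  # assumed to mean the task's own test suite passed


def pass_rate_by_set(results):
    """Return {task_set: fraction of tasks resolved} for the given results."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for r in results:
        totals[r.task_set] += 1
        if r.resolved:
            passed[r.task_set] += 1
    return {name: passed[name] / totals[name] for name in totals}


def generalization_gap(results, public="public", private="private"):
    """Public pass rate minus private pass rate; a large positive gap
    suggests the model is stronger on tasks that resemble its training data."""
    rates = pass_rate_by_set(results)
    return rates[public] - rates[private]


if __name__ == "__main__":
    # Made-up results for illustration only; not measurements of any model.
    sample = [
        TaskResult("pub-1", "public", True),
        TaskResult("pub-2", "public", False),
        TaskResult("priv-1", "private", False),
        TaskResult("priv-2", "private", False),
    ]
    print(pass_rate_by_set(sample))
    print(generalization_gap(sample))
```

A gap near zero on a private set with multi-file edits would be a stronger signal than a high score on the public set alone.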

Probe long-context behavior at the actual advertised length, not just nominal capacity. Many models that advertise large contexts degrade sharply past a certain depth. The bundle does not state a context window for Grok 4.5, so this is a known-unknown to test first.
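A common way to probe depth sensitivity is to plant a known fact at different positions in a long prompt and check whether the model can retrieve it. The sketch below only builds the prompts; it does not call any model. The function names are hypothetical, and length is counted in filler lines as a stand-in for tokens, which is an assumption: real token counts depend on the model's tokenizer.

```python
def build_probe(filler_lines, needle, depth):
    """Insert `needle` at a fractional depth (0.0 = start, 1.0 = end)
    among `filler_lines` and return the joined prompt text."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    index = int(depth * len(filler_lines))
    lines = filler_lines[:index] + [needle] + filler_lines[index:]
    return "\n".join(lines)


def probe_grid(filler_lines, needle, lengths, depths):
    """Yield (length, depth, prompt) for every length/depth combination.
    `lengths` are counts of filler lines, a rough proxy for context size."""
    for length in lengths:
        if length > len(filler_lines):
            raise ValueError("not enough filler lines for requested length")
        for depth in depths:
            yield length, depth, build_probe(filler_lines[:length], needle, depth)


if __name__ == "__main__":
    filler = [f"Filler sentence number {i}." for i in range(1000)]
    needle = "The deployment password is 'heliotrope'."  # made-up fact to retrieve
    for length, depth, prompt in probe_grid(filler, needle, [100, 1000], [0.0, 0.5, 1.0]):
        # Send `prompt` plus a retrieval question to the model under test here.
        print(length, depth, len(prompt))
```

Plotting retrieval accuracy over the length and depth grid shows where recall starts to degrade, which is the number that matters more than the advertised window.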

Compare cost per resolved task, not cost per token. The Cursor Training Loop story implies a possible efficiency edge on coding workflows. Cost per token is a misleading metric when one model resolves a multi-step task in two calls and another takes seven.
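The arithmetic behind that comparison is simple enough to sketch. The example below divides total spend, including spend on failed attempts, by the number of tasks actually resolved. Everything in it is an assumption for illustration: the `TaskRun` structure is hypothetical, and the token counts and per-million-token prices are made up, since no Grok 4.5 pricing has been published.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRun:
    input_tokens: int   # summed over every call made for this task
    output_tokens: int  # summed over every call made for this task
    resolved: bool


def cost_per_resolved_task(runs, usd_per_m_input, usd_per_m_output):
    """Total spend across all runs (failures included) divided by the
    number of resolved tasks. Returns infinity if nothing was resolved."""
    total_usd = sum(
        r.input_tokens * usd_per_m_input + r.output_tokens * usd_per_m_output
        for r in runs
    ) / 1_000_000
    resolved = sum(1 for r in runs if r.resolved)
    if resolved == 0:
        return float("inf")
    return total_usd / resolved


if __name__ == "__main__":
    # Hypothetical: a cheaper-per-token model that needs more calls per task,
    # versus a pricier model that needs fewer. All numbers are invented.
    cheap_runs = [TaskRun(700_000, 70_000, True), TaskRun(700_000, 70_000, False)]
    pricey_runs = [TaskRun(200_000, 20_000, True), TaskRun(200_000, 20_000, True)]
    print(cost_per_resolved_task(cheap_runs, 1.0, 5.0))
    print(cost_per_resolved_task(pricey_runs, 3.0, 15.0))
```

Counting failed attempts in the numerator is a deliberate choice: a model that is cheap per token but often fails still costs money on every failed run.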

Watch for the official model card. If xAI publishes one with the launch, the gap between the card's claims and community vibe checks is itself a signal about positioning.

Why This Matters For Builders

The Grok 4.5 leak trail is interesting less because of the model and more because of the data pipeline it implies. Several frontier labs are converging on the same problem: producing models that are strong at agentic, multi-step coding tasks where the bottleneck is no longer raw reasoning but environmental fluency — knowing how an IDE behaves, how a codebase is structured, how a developer iterates.

Cursor's training data, if folded into xAI's pipeline at meaningful volume, is one of the cleanest known approaches to that problem. It is also the kind of partnership that, if it generalizes, changes the procurement question for builders. Choosing a coding model becomes partly a question of which dev-tool data exhaust it was trained on. That is a different framing than "which model is smartest."

For teams currently committed to Claude or GPT for agentic coding, the practical action this week is not to switch. It is to design an evaluation harness that can quickly answer the migration question when Grok 4.5 actually opens. Building that harness against a known baseline now is cheaper than building it under release-week pressure.
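A harness of that kind can start very small. The sketch below is one possible shape, not a reference design: it treats each model as a plain callable from prompt to output, so a new model can be dropped in later without touching the tasks. `Task`, `run_harness`, and the stand-in models are hypothetical names; a real version would wrap actual API clients and run a real test suite inside `check`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Task:
    task_id: str
    prompt: str
    check: Callable[[str], bool]  # returns True if the model output is acceptable


def run_harness(
    tasks: List[Task],
    models: Dict[str, Callable[[str], str]],
) -> Dict[str, float]:
    """Run every task against every model; return resolved fraction per model."""
    scores = {}
    for name, generate in models.items():
        resolved = 0
        for task in tasks:
            try:
                ok = task.check(generate(task.prompt))
            except Exception:
                ok = False  # a crash or API error counts as an unresolved task
            if ok:
                resolved += 1
        scores[name] = resolved / len(tasks) if tasks else 0.0
    return scores


if __name__ == "__main__":
    # Stand-in models for illustration; replace with real client wrappers.
    tasks = [Task("add", "What is 2 + 2? Answer with the number only.", lambda out: out.strip() == "4")]
    models = {
        "baseline": lambda prompt: "4",
        "candidate": lambda prompt: "5",
    }
    print(run_harness(tasks, models))
```

Recording a baseline score now means the migration question later becomes a single extra entry in the `models` dictionary.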

What To Watch Next

Three things are worth tracking over the next two weeks.

Watch for the official model card. xAI tends to publish at least a partial spec sheet at public release, and the gap between the Musk-attributed Opus framing and any actual benchmark table will be the first real data point.

Run your own coding eval before relying on the Cursor-trained advantage. The structural argument is plausible. The measurement is not yet in the public record.

Pin the variant relationship. Whether Grok 4.5 at 1.5T, the Grok 1.5T / Cursor Composer 3 SKU, and the eventual Grok 5 6T / 10T variants are the same family or distinct lines materially changes the roadmap reading. The first xAI release note should clarify it.

About Maya Chen

Maya tracks AI model releases, benchmarks, and developer adoption signals across the open and closed model landscape.
