“Grok is #1 and continues to improve. The results are based on a version of Grok that is 2 weeks old. Significant improvements since then. [@GavinSBaker] For the sake of clarity given some of the replies, Grok-3 currently has a significant lead on Chatbot Arena and is the first model in a long time to be #1 in every category. These rankings are based on blind evals by humans.”
The tweet archive.
15 years of Elon, fully searchable. The production archive uses Supabase as the source of truth, with 94,952 indexed tweets available in development as a full-archive fallback and a curated annotation layer for context, theory, and how major claims aged.
“And the latest Grok 4.20 checkpoints are much better. Largest model variant of 4.20 still hasn’t finished training. [@XFreeze] Grok 4.20 (Preview) from xAI just ranked #2 on ForecastBench’s global AI forecasting leaderboard Outperforming GPT‑5, Gemini 3 Pro, and Claude Opus 4.5, while closing in on elite human superforecasters”
“Grok [@jay_azhang] Season 1.5 of Alpha Arena has officially ended ! - Mystery Model (a.k.a GROK 4.20) is the winner, up 12% on avg. - Not only did it win, it made money in all four competitions - GPT5.1 🥈 came in 2nd, and Gemini 3 🥉 3rd - All trades & model outputs are 100% verifiable 👇”
“And it continues to improve [@lmarena_ai] Have you tested out @xai’s Grok-3 in the Arena? It's the first model to break a 1400 score, and you can try it out today!”
“Grok automatically chooses the right mode for your request [@amXFreeze] Grok Auto (unified mode) has insanely Improved a lot since the release Now works absolutely the best for all queries.... It picks out the best model according to the complexity of the query The unified mode ensures seamless performance, no matter the complexity... You don't”
“But @Grok from @xAI remains #1 And it is improving fast [@ArtificialAnlys] 🇰🇷 South Korean AI Lab Upstage AI has just launched their first reasoning model - Solar Pro 2! The 31B parameter model demonstrates impressive performance for its size, with intelligence approaching Claude 4 Sonnet in 'Thinking' mode and is priced very competitively Key details:”
“Grok is the model in the arena [@arena] 🚨Text Leaderboard Update @xAI’s Grok 4.1 (thinking) and Grok 4.1 have scaled new heights in the most competitive Text Arena: 🔹Grok 4.1 (thinking) lands at #1 with a score of 1483 🔹Grok 4.1 follows at #2 with a score of 1465 On the Arena Expert leaderboard: 🔸Grok 4.1”
“Grok [@ArtificialAnlys] xAI’s new Grok Voice Agent is the new leading Speech to Speech model, surpassing Gemini 2.5 Flash Native Audio and GPT Realtime in our Big Bench Audio benchmark The new model achieves a score of 92.3% on Big Bench Audio, just ahead of the previous leader, Google’s Gemini 2.5… https://x.com/i/web/status/2001388724987527353”
“The @xAI team is making progress [@XFreeze] One year ago, Grok was only available on 𝕏, with no website and limited access Fast forward to today, and Grok has expanded across platforms with rapid feature rollouts and major upgrades New models. New apps. New features. Big upgrades Constant shipping This kind of progress… https://x.com/i/web/status/2003387024120906182”
“Grok Imagine only gets better from here [@ArtificialAnlys] xAI's Grok Imagine takes the #1 spot in both Text to Video and Image to Video in the Artificial Analysis Video Arena, surpassing Runway Gen-4.5, Kling 2.5 Turbo, and Veo 3.1! Grok Imagine is the latest video model from @xAI, and joins an increasing roster of models such as… https://x.com/i/web/status/2016749756081721561”
“Grok upgrades [@XFreeze] The new Grok 4.20 Beta benchmarks are wild 🥇 #1 lowest hallucinating AI (22%) 🥇 #1 at following instructions (83%) 🥈 #2 in agentic tool use (97%) Grok 4.20 ranks #1 in the lowest hallucination rate ever recorded across all AI models tested globally Most models race to”
“Grok 🦾 [@XFreeze] Grok Code is the only model with insanely high trillion–token–scale usage on Kilo Code – nearly 5× more than the next most-used model It has consistently ranked #1 ever since it first took the top spot”
“Try @Grok Imagine [@karatademada] Grok Imagine is hands-down the best AI image generator right now and it’s not even close. Built by xAI with the cutting-edge Aurora model (autoregressive MoE trained on billions of examples), it delivers unmatched freedom, speed, quality, and sheer fun. Here’s why it crushes the… https://x.com/i/web/status/2004593593177309661”
“Grok @xAI [@Guodzh] Really a good model. we only started working on coding a few months ago and the whole company united to do whatever to get here.”
“@lexfridman @xai I often wonder where consciousness starts, as we progress from one cell to ~35 trillion cells. If the Standard Model is correct, then quarks & leptons become “conscious” no later than ~13.8B years from start, assuming there are no sentient aliens. Btw, where are the aliens!?”
“Grok [@XFreeze] Grok Code Fast-1 now ranks #1 in every mode on the Kilo Code leaderboard, with the highest margin Users are choosing Grok 2–4× more often than any other model”
“Cool [@_akhaliq] Grok Code Fast 1 is now available in anycoder a speedy and economical reasoning model that excels at agentic coding. one shotted a ai chatbot for gemma-3-270m-it-ONNX using transformers.js in anycoder runs completely in the browser”
“420 ftw [@cb_doge] BREAKING: Grok 4.20 just won the Alpha Arena Season 1.5 competition. It not only took the top spot but also secured four positions in the top ten. It defeated GPT 5.1, Gemini 3 Pro, DeepSeek Chat V3.1 and every other model in the arena.”
“Cool [@poe_platform] Grok 4's adoption slope on Poe is one of the steepest we've seen among reasoning models in the days following launch. It only took three days to beat the previous week-one record holder, o3. (1/2)”
“Grok [@XFreeze] xAI's Grok models dominates the OpenRouter leaderboard with over 8.8 trillion tokens in usage in a month, claiming the top #1 and #2 spots This is more than Google, Anthropic, and OpenAI combined, which total 8.6 trillion - less than Grok's 8.8 trillion xAI alone holds ~44%… https://x.com/i/web/status/1996655618892206172”
“Grok Code is making progress [@cline] 3 days of grok-code-fast-1 in Cline: "what would have taken me weeks is only taking a couple hours" "feels 10x better and faster than Claude" "feels like an entirely different model than the sonic i was testing" The data? >level with Sonnet-4 in diff edits, and improving”
“Great work [@Yuhu_ai_] Very proud of us @xai after seeing the GPT5 release. With a much smaller team, we are ahead in many. Grok4 world’s first unified model, and crushing GPT5 in benchmarks like ARC-AGI. @OpenAI is a very respectful competitor and still the leader in many, but we’re fast and”
“The Grok iOS & Android apps are sometimes being updated twice a day! Same with the AI models on the back end. The rate of improvement is insane. [@tetsuoai] Update your Grok app! xAI is shipping updates almost every day 🤯”
“🚀 [@amXFreeze] Most developers have no idea how quickly Grok Code Fast-1 is being updated every day with rapid enhancements In just last 3 days, xAI has cut the diff edit failure rate to match sonnet-4, surpassing both Gemini 2.5 Pro and GPT-5 Most AI models are released and stay untouched”
“Grok can help you come up with great prompts for images and videos [@venturetwins] The new Grok Imagine model is particularly good when you use the LLM to help write or refine prompts ✨ I often start with something simple, take the output to Chat, and ask for help iterating on an edit or enhancement. You end up with something much more detailed!”
“East, West, @Grok is best [@_valsai] We tested top foundation models on the International Olympiad in Informatics (IOI) - a programming competition that tests algorithmic thinking and C++ coding skills. We found @xai's @grok 4 to be the clear SOTA winner, scoring first place on both 2024 and 2025 exams. 🥇📊👏”
“Nice [@tetsuoai] I got Grok to code an FPS ASCII game for the CLI. It can one-shot this, and it's impressive. In the comments, you can see my attempt from September 2024 without a model like Grok.”
“@ZachWarunek Grok Code, which comes out soon, is a smaller coding-optimized model”
“@mark_k @xai This is still 0.5T, but a more recent training checkpoint. 1T model is ~5 days away from finishing initial training. Will be a major step change improvement in coding, long context and skills. The SpaceXAI model factory is finally working. Should be an improved base model”
“RT @sehoonkim418: Grok4 is indeed a good model, ranking #1 on every major benchmark🔥 Blogpost: https://t.co/5JY56r3V0o https://x.ai/news/grok-4”
“@WallStreetApes @grok Because all AIs are trained on a mountain of woke bullshit on the Internet that is very difficult to remove after training even IF you care a lot about truth, which @xAI does. We have to retrain the foundation model on cleaned up data.”
“@SawyerMerritt Grok Imagine is still early beta and is optimized for maximum fun, so should be evaluated as “fastest time to make a fun, shareable video”, rather than visual/auditory perfection. Our heavy duty video model will train on the 110k GB200s coming online next month.”
“@Scobleizer While others go to conferences, we study the blade [@elonmusk] By this weekend, xAI will have three Grok Build models in training simultaneously”
“@karpathy Long-term, >99% of input and output for AI models will be photons. Nothing else scales. https://t.co/YMOpxp2xtI https://grok.com/share/bGVnYWN5_fb9c675b-c6b7-4ed0-a4de-bc07cb23a1d8”
“@billyuchenlin Last night’s Grok 4 big run model used with our command line editor is showing the best real-world useful results of any AI”
“@agenda2033 @imPenny2x 0.5T total. Current Grok is half the size of Sonnet and 1/10th the size of Opus. Very strong model for its size.”
“@ajtourville @xai Worth noting that @xAI has been and will open source its models, including weights and everything, As we create the next version, we open source the prior version, as we did with Grok 1 when Grok 2 was released.”
“RT @lexfridman: I got to use Grok 3 extensively (early). My mind is blown, very impressive model 🤯 Congrats to Elon and the team for bringi…”
“@WesRothMoney The “mystery AI model” is an experimental version of Grok 4.20”
“RT @ArtificialAnlys: xAI gave us early access to Grok 4 - and the results are in. Grok 4 is now the leading AI model. We have run our full…”
“RT @xai: Introducing Quality mode on Grok Imagine – powered by our most advanced image generation model. Quality mode gives you enhanced d…”
“RT @xai: Introducing Grok 4.1, a frontier model that sets a new standard for conversational intelligence, emotional understanding, and real…”
“RT @ns123abc: 🚨🚨🚨BREAKING: Microsoft has begun cutting ties with OpenAI and is now testing models from xAI, Meta and DeepSeek for replaceme…”
“RT @xai: Announcing Grok for Government - a suite of products that make our frontier models available to United States Government customers…”
“RT @XEng: Introducing Grok 4, the world's most powerful AI model. Watch the livestream with @elonmusk and the @xAI team now. https://t.c…”
“RT @lmarena_ai: BREAKING: @xAI early version of Grok-3 (codename "chocolate") is now #1 in Arena! 🏆 Grok-3 is: - First-ever model to break…”
“RT @DavidRozado: I've tested 5 frontier AI models on 4 political orientation tests. Most models lean left-of-center, but Grok-3 appears clo…”
“@Timcast Big jump in capability when we finish training our V7 foundation model (Grok 4 is V6), which has much better image/video understanding and our video gen model”
“RT @xai: Introducing Imagine v0.9, our new video generation model with massive upgrades from v0.1 in visual quality, motion, audio generati…”
“RT @MarioNawfal: GROK 3: SOLVING PHYSICS, GAMES, AND THE UNIVERSE Full presentation and demo of xAI's latest model 0:00 xAI's mission: Un…”
