“Cool [@techdevnotes] Grok 3 Beta ranks #1 in CaseLaw Benchmark! Surpassing Gemini 2.5 Pro”
The tweet archive.
15 years of Elon, fully searchable. The production archive uses Supabase as the source of truth, with 94,952 indexed tweets available in development as a full-archive fallback. A curated annotation layer adds context, theory, and notes on how major claims aged.
“Soon, AI will far exceed the best humans in reasoning [@MarioNawfal] GROK-3 MINI MADE AI HISTORY—100% ON HARDCORE REASONING TESTS Grok-3 Mini pulled off what no other model has! It aced every question on one of the toughest reasoning benchmarks out there. The test? A custom logic gauntlet packed with curveballs: * 120/120 on the “Marcus”
“The ability to predict the future is the best measure of intelligence [@amXFreeze] Grok 4 ranks #1 on the latest FutureX benchmark for real-world predictions surpassing GPT-5 Pro”
“I now think @xAI has a chance of reaching AGI with @Grok 5. Never thought that before. [@amXFreeze] Grok 4 just smashed the AGI benchmarks, achieving even higher score than its previous high with open program synthesis No other model even comes close and has not passed Grok 4 previous raw performance Currently Grok is more closer to AGI than any other AI models”
“And there will be noticeable improvements every week [@BrianRoemmele] WOW! I can not find any low points to @Grok 4.20 on all of my internal benchmarks. The most important is he fully is in first principles mode and uses @Grokipedia to a maximal extent. This has made the platform far more nuanced about all new and old technologies as well as”
“Grok [@jay_azhang] Our new benchmark has the top 6 AI models trading real capital Grok4 is winning so far. It was short and then flipped to long, timing the bottom perfectly It's up >500% in 1 day”
“Grok voice works great for telling children’s stories and answering their questions! [@_valsai] Grok 3 Beta dominates on our proprietary benchmarks, setting the new SOTA on our Finance, Legal and Tax benchmarks. Congrats @xai @grok @elonmusk 🚀🚀🚀 We just released the benchmark results for xAI's new models: Grok 3 Beta & Grok 3 Mini Fast Beta (High & Low Reasoning) –”
“Cool [@amXFreeze] Grok 4 Ranks #1 on FinSearchComp Benchmark This is the first expert-level benchmark for financial search & reasoning Grok 4 is unbelievably approaching human experts level”
“Not bad for now [@XFreeze] Grok-4 achieves 126 IQ, ranking #2....very close performance to newly released Gemini 3 Pro in TrackingAI benchmark Grok-4 launched over 4 months ago..still leading and already outperforms nearly every top model”
“All you need to know to understand which company will win a technology competition is look at the first and second derivatives of the rate of innovation [@Scobleizer] Grok 3 benchmarks. The thing to really pay attention to in AI is learning speed. And @xai is learning way faster than any other. Who said that? Apple Siri cofounder Tom Gruber. He told me at dinner a decade ago that that is the most important thing to pay attention to.”
“And Grok is improving faster [@bycloudai] the grok-3 benchmark is pretty useful in comparing base models, so I added GPT-4.5”
“Bottom line though: Grok 4 Heavy was smarter 2 weeks ago than GPT5 is now and G4H is already a lot better. Let that sink in. [@BasedBeffJezos] Here's a comparison of GPT5 and Grok 4 GPT 5 with tools is between Grok 4 and Grok 4 Heavy in Humanity's Last Exam benchmark”
“Progress [@cb_doge] Grok continues to lead the benchmarks. 🔥 #1 on Terminal-Bench Hard, GPQA Diamond, and SciCode, ahead of OpenAI’s ChatGPT, Google’s Gemini, Anthropic’s Claude, and others.”
“Just Grok it [@cb_doge] 🚨 BREAKING: Grok continues to dominate AI benchmarks, beating OpenAI's ChatGPT, Google's Gemini and others in reasoning, coding, and agentic tasks. #1 in GPQA (Scientific Reasoning) #1 in SciCode (Coding) #1 in Terminal-Bench (Agentic Coding & Terminal Use)”
“Grok [@teslaownersSV] 🚀 BREAKING Grok 4.1 – Dominating the Frontier! 🚀 🥇 #1 on LMArena Text Arena (Thinking mode: 1483 Elo) 🥇 #1 in Emotional Intelligence (EQ-Bench v3) 🥇 #1 in Creative Writing (v3 benchmark) 🥇 #1 in Agentic Tool Use (τ²-Bench) The most human-like, reliable, and capable AI yet.… https://x.com/i/web/status/2002956548625084455”
“Grok [@cb_doge] BREAKING: Grok Voice Agent ranks #1 in Speech Reasoning Benchmark.”
“Grok is the best at predicting the future, which is the best measure of intelligence imo [@amXFreeze] GROK 4 ranks #1 in the FutureX live benchmark for real-world future predictions”
“Not bad [@cb_doge] BREAKING: New Benchmarks just dropped and Grok 4.1 Fast is claiming the top spots with dominant performance across every category. Rankings | 03 Dec 2025 🥇”
“🎯 [@cb_doge] ELON MUSK: "We are geared towards making 𝕏 the best platform for advertising in world, by applying the smartest AI, the highest intelligence on every benchmark, Grok. It allows us to understand how to match a product with consumer, with someone who would find that interesting."”
“Nice work by the Grok Code team! Join @xAI to take it to the next level. It’s a great vibe in the office. 🚀💫🦾 [@skcd42] Grok-code-fast-1 is now out and available for everyone to use 🚀🏎️💨 When I joined the coding team, the team was just 3 people and we very quickly built a model which was SOTA on SWEBench. But as things go, in the real world benchmarks matter less. Over the last few months we”
“Progress [@Prashant_1722] BREAKING 🚨 Grok 4 Fast Reasoning ranks no. 1 with a new record on the Extended NYT Connections Benchmark of 759 puzzles. - Grok 4 ranks no. 2, xAI dominance is incredible - beats OpenAI GPT-5, o3-pro medium reasoning, Google Gemini 2.5 Pro, DeepSeek and Qwen 3 - benchmark has”
“Grok [@MarioNawfal] 🚨GROK IS UNSTOPPABLE: NOVEMBER RANKINGS DROP Grok is clearing the board. The latest update shows total dominance across every major benchmark, Number 1 on BlackBox AI, Terminal-Bench Hard, GPQA Diamond, SciCode, AAII Token Usage, Roo Code, KiloCode, and Cline. On”
“Grok [@cb_doge] Grok 4.20 Reasoning just took the #1 spot on the BridgeBench reasoning benchmark. 🔥 Beating GPT-5.4, Claude Opus 4.6, Google Gemini and others. Week after week, Grok keeps climbing across benchmarks. 🚀”
“🚀🚀 [@cb_doge] BREAKING: Grok 3 has defeated ChatGPT, Deepseek, and Gemini across multiple benchmarks.”
“Download the Grok App and try it out! [@liujiashuo77] We built FutureX, the world’s first live benchmark for real future prediction — politics, economy, culture, sports, etc. Among 23 AI agents, #Grok4 ranked #1 🏆 Elon didn’t lie. @elonmusk your model sees further 🚀🍀 LeaderBoard: https://futurex-ai.github.io/”
“Grok [@cb_doge] BREAKING: Grok just took the #1 spot in coding performance benchmark. Grok 4.1 Fast outscored both Google and OpenAI on code understanding and large file systems. Grok leads where it matters for developers.”
“Grok [@ArtificialAnlys] xAI’s new Grok Voice Agent is the new leading Speech to Speech model, surpassing Gemini 2.5 Flash Native Audio and GPT Realtime in our Big Bench Audio benchmark The new model achieves a score of 92.3% on Big Bench Audio, just ahead of the previous leader, Google’s Gemini 2.5… https://x.com/i/web/status/2001388724987527353”
“Grok Imagine is improving super fast! What are the highest priority improvements you want? Please reply below. [@WesRoth] xAI has launched the Grok Imagine API, a powerful suite for video and audio generation that sets a new benchmark in speed, cost, and quality. Built for creators, developers, and enterprise workflows, it lets users generate cinematic videos from text or images, edit scenes with… https://x.com/i/web/status/2017168407725264905”
“Grok upgrades [@XFreeze] The new Grok 4.20 Beta benchmarks are wild 🥇 #1 lowest hallucinating AI (22%) 🥇 #1 at following instructions (83%) 🥈 #2 in agentic tool use (97%) Grok 4.20 ranks #1 in the lowest hallucination rate ever recorded across all AI models tested globally Most models race to”
“Great work [@Yuhu_ai_] Very proud of us @xai after seeing the GPT5 release. With a much smaller team, we are ahead in many. Grok4 world’s first unified model, and crushing GPT5 in benchmarks like ARC-AGI. @OpenAI is a very respectful competitor and still the leader in many, but we’re fast and”
“RT @sehoonkim418: Grok4 is indeed a good model, ranking #1 on every major benchmark🔥 Blogpost: https://t.co/5JY56r3V0o https://x.ai/news/grok-4”
“RT @veggie_eric: Grok 3 mini sounds cute, but it's the best reasoning model on the planet. It crushes flagship models on benchmarks, while…”
“RT @AdamGhetti: Grok-3-mini-fast really is chasing that efficient frontier. There are benchmarks and then there are results purely measured…”
“RT @cb_doge: BREAKING: Grok 4.1 Fast is #1 in the research benchmarks. It gets the highest research score, strong Frames results and the be…”
“RT @cb_doge: BREAKING: Grok 4 leads the IOI Benchmark with the highest accuracy, beating ChatGPT, Gemini, Claude, DeepSeek & more. 🔥 #1 in…”
“RT @teslaownersSV: Grok Rankings Update Nov 23 Grok 4.1 Fast 🥇 #1 on 𝜏²-Bench Telecom (Agentic Tool Use) Benchmark 🥇 #1 on OpenRouter Tren…”
“RT @amXFreeze: Grok 4 still ranks #1 on the FinSearchComp Benchmark It's a first expert-level benchmark for financial search & reasoning…”
“RT @tetsuoai: Grok 4 is dominating benchmarks as the planet's top model, and it's a threat to OAI. This is also why the rushed GPT-5 flop a…”
“RT @cb_doge: Grok continues to lead global benchmarks: • #1 in AA Omniscience (lowest hallucination rate) • #1 in IFBench performance • #1…”
“RT @cb_doge: NEWS: Grok dominates real-world voice AI benchmarks 🔥 τ-voice Bench 🥇 • 🏆 #1 Overall → 67.3% • 🏆 #1 Retail → 62.3% • 🏆 #1 Ai…”
“@Kochara13 @techdevnotes Actually, I don’t think HLE is a great measure of usefulness. We’re moving away from these benchmarks in favor of making Grok maximally useful for actual engineering.”
“RT @MarioNawfal: Grok 4.20 Beta dropped insane benchmark numbers: lowest hallucination rate ever recorded at 22%, number 1 in following in…”
“RT @amXFreeze: Grok 4 achieves top accuracy scores in several benchmarks, it’s the most precise AI yet”
“RT @tetsuoai: Grok ranked #1 on CryptoBench. 💫 This platform is the first expert benchmark for agents in the crypto industry. It tests abi…”
“Japan’s population is now dropping by almost a million people per year https://t.co/GMmP8RqEnW https://x.com/i/grok/share/dOR9P0YdGNWE67qTAPUDm1S7J”
“Merry Christmas and may you have a wonderful New Year! 🎁🎄🎅 https://t.co/BNg7Mk5Dnd https://x.com/i/grok/share/NNQmfsCvQpTVGaH2dFQOCqJRA”
“Grok “Generate an image as if Voldemort and a Sith Lord had a baby and he became a judge in Brazil” It’s uncanny! 🤣🤣 https://t.co/aTdVRg9jrw”
“I asked Grok to generate an image of The View 😂 https://t.co/UIX2L47Sgv”
“When they said “DC swamp creatures”, I thought it was just a metaphor 😳 https://t.co/9Qmn92BGkO https://x.com/i/grok/share/N7XvCOmj8PFY1QUceIspeUcyE”
“Grok 4 is the first time, in my experience, that an AI has been able to solve difficult, real-world engineering questions where the answers cannot be found anywhere on the Internet or in books. And it will get much better.”
