“RT @cb_doge: NEWS: Grok dominates real-world voice AI benchmarks 🔥 τ-voice Bench 🥇 • 🏆 #1 Overall → 67.3% • 🏆 #1 Retail → 62.3% • 🏆 #1 Ai…”
The tweet archive.
15 years of Elon, fully searchable. The production archive uses Supabase as the source of truth, with 94,952 indexed tweets available in development as a full-archive fallback and a curated annotation layer for context, theory, and how major claims aged.
“RT @cb_doge: Grok continues to lead global benchmarks: • #1 in AA Omniscience (lowest hallucination rate) • #1 in IFBench performance • #1…”
“Grok [@cb_doge] Grok 4.20 Reasoning just took the #1 spot on the BridgeBench reasoning benchmark. 🔥 Beating GPT-5.4, Claude Opus 4.6, Google Gemini and others. Week after week, Grok keeps climbing across benchmarks. 🚀”
“RT @MarioNawfal: Grok 4.20 Beta dropped insane benchmark numbers: lowest hallucination rate ever recorded at 22%, number 1 in following in…”
“Grok upgrades [@XFreeze] The new Grok 4.20 Beta benchmarks are wild 🥇 #1 lowest hallucinating AI (22%) 🥇 #1 at following instructions (83%) 🥈 #2 in agentic tool use (97%) Grok 4.20 ranks #1 in the lowest hallucination rate ever recorded across all AI models tested globally Most models race to”
“And there will be noticeable improvements every week [@BrianRoemmele] WOW! I can not find any low points to @Grok 4.20 on all of my internal benchmarks. The most important is he fully is in first principles mode and uses @Grokipedia to a maximal extent. This has made the platform far more nuanced about all new and old technologies as well as”
“@Kochara13 @techdevnotes Actually, I don’t think HLE is a great measure of usefulness. We’re moving away from these benchmarks in favor of making Grok maximally useful for actual engineering.”
“Grok Imagine is improving super fast! What are the highest priority improvements you want? Please reply below. [@WesRoth] xAI has launched the Grok Imagine API, a powerful suite for video and audio generation that sets a new benchmark in speed, cost, and quality. Built for creators, developers, and enterprise workflows, it lets users generate cinematic videos from text or images, edit scenes with… https://x.com/i/web/status/2017168407725264905”
“True [@CMS_Flash] Future prediction: the ONLY benchmark that is theoretically completely unhackable. And only 1 model is making a positive return.”
“RT @EthanHe_42: LLMs can overfit benchmarks but not the stock market. The future isn’t hackable.”
“@beffjezos It is the most objective benchmark”
“Grok [@cb_doge] BREAKING: Grok just took the #1 spot in coding performance benchmark. Grok 4.1 Fast outscored both Google and OpenAI on code understanding and large file systems. Grok leads where it matters for developers.”
“Grok [@teslaownersSV] 🚀 BREAKING Grok 4.1 – Dominating the Frontier! 🚀 🥇 #1 on LMArena Text Arena (Thinking mode: 1483 Elo) 🥇 #1 in Emotional Intelligence (EQ-Bench v3) 🥇 #1 in Creative Writing (v3 benchmark) 🥇 #1 in Agentic Tool Use (τ²-Bench) The most human-like, reliable, and capable AI yet.… https://x.com/i/web/status/2002956548625084455”
“Grok [@ArtificialAnlys] xAI’s new Grok Voice Agent is the new leading Speech to Speech model, surpassing Gemini 2.5 Flash Native Audio and GPT Realtime in our Big Bench Audio benchmark The new model achieves a score of 92.3% on Big Bench Audio, just ahead of the previous leader, Google’s Gemini 2.5… https://x.com/i/web/status/2001388724987527353”
“Grok [@cb_doge] BREAKING: Grok Voice Agent ranks #1 in Speech Reasoning Benchmark.”
“RT @tetsuoai: Grok ranked #1 on CryptoBench. 💫 This platform is the first expert benchmark for agents in the crypto industry. It tests abi…”
“Not bad [@cb_doge] BREAKING: New Benchmarks just dropped and Grok 4.1 Fast is claiming the top spots with dominant performance across every category. Rankings | 03 Dec 2025 🥇”
“Not bad for now [@XFreeze] Grok-4 achieves 126 IQ, ranking #2....very close performance to newly released Gemini 3 Pro in TrackingAI benchmark Grok-4 launched over 4 months ago..still leading and already outperforms nearly every top model”
“RT @teslaownersSV: Grok Rankings Update Nov 23 Grok 4.1 Fast 🥇 #1 on 𝜏²-Bench Telecom (Agentic Tool Use) Benchmark 🥇 #1 on OpenRouter Tren…”
“RT @cb_doge: BREAKING: Grok 4.1 Fast is #1 in the research benchmarks. It gets the highest research score, strong Frames results and the be…”
“Grok [@MarioNawfal] 🚨GROK IS UNSTOPPABLE: NOVEMBER RANKINGS DROP Grok is clearing the board. The latest update shows total dominance across every major benchmark, Number 1 on BlackBox AI, Terminal-Bench Hard, GPQA Diamond, SciCode, AAII Token Usage, Roo Code, KiloCode, and Cline. On”
“RT @amXFreeze: Grok 4 still ranks #1 on the FinSearchComp Benchmark It's a first expert-level benchmark for financial search & reasoning…”
“Grok [@jay_azhang] Our new benchmark has the top 6 AI models trading real capital Grok4 is winning so far. It was short and then flipped to long, timing the bottom perfectly It's up >500% in 1 day”
“Progress [@cb_doge] Grok continues to lead the benchmarks. 🔥 #1 on Terminal-Bench Hard, GPQA Diamond, and SciCode, ahead of OpenAI’s ChatGPT, Google’s Gemini, Anthropic’s Claude, and others.”
“Just Grok it [@cb_doge] 🚨 BREAKING: Grok continues to dominate AI benchmarks, beating OpenAI's ChatGPT, Google's Gemini and others in reasoning, coding, and agentic tasks. #1 in GPQA (Scientific Reasoning) #1 in SciCode (Coding) #1 in Terminal-Bench (Agentic Coding & Terminal Use)”
“Progress [@Prashant_1722] BREAKING 🚨 Grok 4 Fast Reasoning ranks no. 1 with a new record on the Extended NYT Connections Benchmark of 759 puzzles. - Grok 4 ranks no. 2, xAI dominance is incredible - beats OpenAI GPT-5, o3-pro medium reasoning, Google Gemini 2.5 Pro, DeepSeek and Qwen 3 - benchmark has”
“Cool [@amXFreeze] Grok 4 Ranks #1 on FinSearchComp Benchmark This is the first expert-level benchmark for financial search & reasoning Grok 4 is unbelievably approaching human experts level”
“I now think @xAI has a chance of reaching AGI with @Grok 5. Never thought that before. [@amXFreeze] Grok 4 just smashed the AGI benchmarks, achieving even higher score than its previous high with open program synthesis No other model even comes close and has not passed Grok 4 previous raw performance Currently Grok is more closer to AGI than any other AI models”
“Download the Grok App and try it out! [@liujiashuo77] We built FutureX, the world’s first live benchmark for real future prediction — politics, economy, culture, sports, etc. Among 23 AI agents, #Grok4 ranked #1 🏆 Elon didn’t lie. @elonmusk your model sees further 🚀🍀 LeaderBoard: https://futurex-ai.github.io/”
“The ability to predict the future is the best measure of intelligence [@amXFreeze] Grok 4 ranks #1 on the latest FutureX benchmark for real-world predictions surpassing GPT-5 Pro”
“Nice work by the Grok Code team! Join @xAI to take it to the next level. It’s a great vibe in the office. 🚀💫🦾 [@skcd42] Grok-code-fast-1 is now out and available for everyone to use 🚀🏎️💨 When I joined the coding team, the team was just 3 people and we very quickly built a model which was SOTA on SWEBench. But as things go, in the real world benchmarks matter less. Over the last few months we”
“RT @amXFreeze: Grok 4 achieves top accuracy scores in several benchmarks, it’s the most precise AI yet”
“Grok is the best at predicting the future, which is the best measure of intelligence imo [@amXFreeze] GROK 4 ranks #1 in the FutureX live benchmark for real-world future predictions”
“RT @tetsuoai: Grok 4 is dominating benchmarks as the planet's top model, and it's a threat to OAI. This is also why the rushed GPT-5 flop a…”
“RT @cb_doge: BREAKING: Grok 4 leads the IOI Benchmark with the highest accuracy, beating ChatGPT, Gemini, Claude, DeepSeek & more. 🔥 #1 in…”
“Great work [@Yuhu_ai_] Very proud of us @xai after seeing the GPT5 release. With a much smaller team, we are ahead in many. Grok4 world’s first unified model, and crushing GPT5 in benchmarks like ARC-AGI. @OpenAI is a very respectful competitor and still the leader in many, but we’re fast and”
“Bottom line though: Grok 4 Heavy was smarter 2 weeks ago than GPT5 is now and G4H is already a lot better. Let that sink in. [@BasedBeffJezos] Here's a comparison of GPT5 and Grok 4 GPT 5 with tools is between Grok 4 and Grok 4 Heavy in Humanity's Last Exam benchmark”
“🎯 [@cb_doge] ELON MUSK: "We are geared towards making 𝕏 the best platform for advertising in world, by applying the smartest AI, the highest intelligence on every benchmark, Grok. It allows us to understand how to match a product with consumer, with someone who would find that interesting."”
“RT @fchollet: Today we're releasing a developer preview of our next-gen benchmark, ARC-AGI-3. The goal of this preview, leading up to the…”
“RT @sehoonkim418: Grok4 is indeed a good model, ranking #1 on every major benchmark🔥 Blogpost: https://t.co/5JY56r3V0o https://x.ai/news/grok-4”
“True [@satyanadella] The real benchmark for AI progress is whether it makes a real difference in people’s lives — in healthcare, education, and productivity. Thanks to @ycombinator for having me at AI Startup School.”
“RT @veggie_eric: Grok 3 mini sounds cute, but it's the best reasoning model on the planet. It crushes flagship models on benchmarks, while…”
“Soon, AI will far exceed the best humans in reasoning [@MarioNawfal] GROK-3 MINI MADE AI HISTORY—100% ON HARDCORE REASONING TESTS Grok-3 Mini pulled off what no other model has! It aced every question on one of the toughest reasoning benchmarks out there. The test? A custom logic gauntlet packed with curveballs: * 120/120 on the “Marcus”
“Cool [@techdevnotes] Grok 3 Beta ranks #1 in CaseLaw Benchmark! Surpassing Gemini 2.5 Pro”
“Grok voice works great for telling children’s stories and answering their questions! [@_valsai] Grok 3 Beta dominates on our proprietary benchmarks, setting the new SOTA on our Finance, Legal and Tax benchmarks. Congrats @xai @grok @elonmusk 🚀🚀🚀 We just released the benchmark results for xAI's new models: Grok 3 Beta & Grok 3 Mini Fast Beta (High & Low Reasoning) –”
“RT @AdamGhetti: Grok-3-mini-fast really is chasing that efficient frontier. There are benchmarks and then there are results purely measured…”
“And Grok is improving faster [@bycloudai] the grok-3 benchmark is pretty useful in comparing base models, so I added GPT-4.5”
“🚀🚀 [@cb_doge] BREAKING: Grok 3 has defeated ChatGPT, Deepseek, and Gemini across multiple benchmarks.”
“All you need to know to understand which company will win a technology competition is look at the first and second derivatives of the rate of innovation [@Scobleizer] Grok 3 benchmarks. The thing to really pay attention to in AI is learning speed. And @xai is learning way faster than any other. Who said that? Apple Siri cofounder Tom Gruber. He told me at dinner a decade ago that that is the most important thing to pay attention to.”
“@flcnhvy @PPathole @cleantechnica We rewrote all labeling software for 4D. Very different from labeling single photos. Dojo won’t contribute for about a year. It’s mostly a generalized NN training computer, but benchmark we’re tracking is frames/second. Must beat next gen GPU/TPU clusters or it’s pointless.”
