This Week in GenAI: xAI Takes a Giant Step Forward
The AI Frontier Just Got Crowded – July 16, 2025

Grok Steps Into The Top Model Spot

This week, xAI exploded onto the stage with the launch of Grok 4 Heavy—transforming a three-way sprint between OpenAI, Anthropic, and Google into a four-horse race to reach AGI. Grok’s debut shattered benchmarks across the board, putting it squarely in the "Top Model" spot. xAI also teased a roadmap addressing Grok’s current blind spots around coding tools, image generation, and advanced vision pipelines.

The battles between the labs continue, with the Browser Wars now shifting into hyperdrive. Perplexity’s Comet browser rolled out an agent-driven sidebar for in-page research, Arc Max bulked up with smart tab automation, and whispers of ChatGPT’s native browser integration hint at a future where conversational agents live directly in the address bar. Some of the OG browsers—Edge, Opera—are scrambling to infuse Copilot-style “Browser Operator” features, forcing builders to reevaluate where they anchor their web-based workflows. There’s a reason everyone’s racing to build the next browser super-app.

On the talent front, Meta’s so-called Superintelligence Lab raised the stakes with nine-figure signing bonuses, poaching elite researchers from OpenAI, Apple, and Scale AI. From Alexandr Wang to Ruoming Pang, big tech’s war chest is now measured in hundreds of millions per head—putting pressure on startups and SMBs to rethink how they source expertise without getting caught in unsustainable bidding wars.

Builders now face a triad of strategic fronts—agent-driven workflows at the address bar, ultra-tier foundational models, and lightning-fast talent shifts that could reshape which lab you hitch your wagon to. These aren’t just headlines—they’re signals that should directly inform how you’re defining your AI architecture.


Battle for the Browser: Agents Get A New Home

This week, browsers emerged as the next frontier of AI platform wars. Perplexity kicked things off with its Chromium-based Comet browser, embedded with real-time agentic research capabilities that enable in-browser workflows without the disruptive context-switching of tabs. More than just a wrapper for LLMs, Comet allows users to auto-summarize, cross-reference, and synthesize web content on the fly—effectively making every page a launchpad for in-context reasoning. For months now, OpenAI has been rumored to be building a similarly ambitious browser integration directly within ChatGPT—potentially transforming the conversational AI interface into Chrome’s biggest existential threat since Firefox.

OpenAI’s entrance into the Browser Wars could instantly threaten Google’s search dominance by turning the address bar into a conversational gateway. Instead of typing queries into Google.com, users could summon answers, summaries, and workflow automations directly from the omnibox—sidestepping Google’s search UI entirely. While OpenAI might still pull from web-indexed content, Google would lose the clicks, the ad revenue, and its grip on how information is ranked and surfaced.

Legacy players aren’t sitting idle. Arc Max rolled out a feature-rich toolkit with ChatGPT integrations, smart tab management, and automation designed explicitly for power users. Vivaldi’s latest mobile release leans into AI-enhanced browsing efficiency, targeting users who expect streamlined navigation and smart context recognition. Opera teased its “Browser Operator,” built to automate complex web interactions like bookings and purchases. Meanwhile, Microsoft Edge’s Copilot continues to cement its enterprise appeal, layering in integrations for real-time data summarization and report generation.

The browser is no longer a passive shell—it’s becoming an intelligent workspace.

For business and tech leaders, this raises questions about which browsers are allowed inside corporate networks. Chrome Enterprise, Edge with Copilot, Safari—all must now be vetted not just for UX but for where conversational data is routed, how it’s stored, and who owns the user context. Any prompt touching proprietary spreadsheets or internal wikis could now pass through OpenAI’s servers, forcing CEOs and CIOs to revise procurement checklists, privacy policies, and network configurations to keep their data locked down.


Grok Enters the Ultra Arena

xAI’s Grok 4 Heavy stormed into the premium “Ultra-tier” scene this week with a $350/month price tag, escalating its rivalry with OpenAI’s Pro+ ($200/month) and Google’s Gemini Ultra ($250/month, currently discounted 50%). Grok immediately impressed across both standard and novel benchmarks—crushing ARC-AGI-2 and outperforming Anthropic’s Opus 4 on the Vending-Bench benchmark. In a multi-agent test run, Grok achieved ~$4.7K in net worth and over 4,500 simulated transactions, handily beating Claude Opus 4’s ~$2.1K result.

Grok’s Benchmarks

In Humanity’s Last Exam, Grok 4 Heavy scored 44.4% accuracy—becoming the first model to clear the 40% mark on this notoriously difficult, closed-ended academic test. Claude Opus 4 scored just ~22% on the same eval, giving Grok a 20+ point edge. On ARC-AGI-2, Grok scored 15.9%, nearly doubling Anthropic’s best (~9%) and outpacing GPT-4o. And in the agent-based Vending-Bench simulation, Grok’s multi-agent setup generated ~$4.7K in simulated sales, well ahead of Opus 4’s ~$2.1K haul. These numbers solidify Grok as the new benchmark leader—at least for now.

Four-Horse Race = Better Cost Dynamics?

ChatGPT Pro+ still offers broad multimodal capability at $200/month, trading some raw reasoning for a larger toolbox. Google’s Gemini Ultra ($249.99/month) includes deep research features, Veo 3 video generation, 30TB of cloud storage, NotebookLM, and other productivity tools. Anthropic’s Claude Max pricing starts at $100 (5× usage) and hits $200 for 20×. Notably, both o3 Pro and Grok 4 Heavy are only available at the Ultra tier, making the top shelf a little crowded.

Facing a new benchmark leader, OpenAI has teased GPT-5 for a summer release (possibly July), while Gemini 3 continues to generate launch rumors for later this year. If the four-horse race holds, we could see continued downward pressure on cost—like we saw when o3 Pro launched. That could mean lower pricing, more generous agent integrations, expanded tool-calling APIs or just better tool calls in general. For builders, though, raw price is just one side of the equation—you’ll also want to consider how token-hungry the thinking models can get during complex reasoning chains or extended research.
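To make that concrete, here's a minimal back-of-the-envelope sketch in Python. Every figure in it (the $3/$15 per-million-token prices, the prompt and answer sizes, the 10x reasoning multiplier) is an illustrative assumption rather than any vendor's published rate, so swap in the numbers for whatever model you're actually evaluating.

```python
# Back-of-the-envelope cost estimate for a "thinking" model API call.
# All prices and token counts below are illustrative assumptions, not published rates.

def task_cost(prompt_tokens: int,
              visible_output_tokens: int,
              reasoning_multiplier: float,
              price_in_per_m: float,
              price_out_per_m: float) -> float:
    """Estimate the cost of one request when hidden reasoning tokens are billed as output."""
    # Thinking models emit hidden reasoning tokens on top of the visible answer,
    # and providers typically bill those at the output-token rate.
    hidden_tokens = visible_output_tokens * reasoning_multiplier
    billed_output = visible_output_tokens + hidden_tokens
    return (prompt_tokens / 1e6) * price_in_per_m + (billed_output / 1e6) * price_out_per_m

# Hypothetical research task: a 6K-token prompt, a 1.5K-token visible answer,
# and the model "thinking" for 10x the visible output, at assumed $3/$15 per 1M tokens.
print(f"${task_cost(6_000, 1_500, 10, 3.00, 15.00):.2f} per task")
# Prints ~$0.27 per task; at 1,000 tasks a day that's roughly $265,
# which is why reasoning depth can matter as much as the headline price.
```

The multiplier is the variable to watch: deep research or long agentic chains can push hidden reasoning tokens far past the visible answer, so benchmark your own workloads before committing to a cost model.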

Beyond Benchmarks: Tesla, Bias, and the Governance Tradeoff

xAI also made headlines this week by rolling out Grok’s Tesla integration, announcing that the model can now run on Tesla’s in-car compute stack. The move effectively turns electric vehicles into distributed inference nodes—blurring the lines between transportation and compute infrastructure. It’s a long-term play worth watching, one that could open up new partnerships and opportunities.

There's been plenty of immediately positive news, such as Epic Games CEO Tim Sweeney’s endorsement calling Grok near AGI-level, but not all the news has been good. Builders and executives will have to consider which use cases make sense for Grok, as reports of racial or ideological bias in its responses continue to surface. There have also been questions about ideological bias tied to Elon Musk’s X: one TechCrunch report observed Grok repeatedly using Elon's opinions as a source of truth.

Just For Fun: Grok 4 scored an impressive 100% on YouTuber @theorants' 'SnitchBench: AI Model Whistleblowing Behavior Analysis'.

Embedding Grok means balancing its raw power against real reputational and governance concerns.


The war for talent continues to heat up as Meta pushes hard toward superintelligence

Talent Wars: Zuck’s High-Stakes Heist

The AI talent landscape continues to escalate with Meta’s aggressive pursuit of its "Superintelligence" initiative. Meta has poached top talent from OpenAI, DeepMind, and Anthropic—and just last week, Zuck added Apple’s head of AI models, Ruoming Pang, in a rumored $200 million deal.

This level of recruitment highlights the critical talent bottleneck in AI development, as big tech races to consolidate expertise in both foundational research and applied AI. Meta’s $64–72 billion investment in its "Prometheus" data centers makes it clear: they’re pairing elite researchers with massive compute to position themselves as a supercharged AI powerhouse. If Grok 4 proved anything, it’s that scaling compute still moves the needle towards intelligence. The only question now is whether Zuck’s All-Star team becomes a Cinderella story—or ends up the early ’90s Buffalo Bills, so close but ultimately unable to seal the deal.

For SMBs and startups, the takeaway is more strategic and nuanced. Astronomical compensation doesn’t guarantee innovation. Instead, smaller firms should capitalize on the upheaval—tapping emerging talent, running fast experiments, and finding new ways to help employees embed AI into their workflows.


Hot Topics: Deal Drama and Sovereign Infrastructure

The deal fallout between Microsoft, OpenAI, and Windsurf quietly reshaped the coding competitive landscape. After Microsoft blocked OpenAI’s acquisition, Google moved quickly to license Windsurf’s tech and hire away its top leadership—leaving rank-and-file employees holding the bag... until Cognition AI stepped in. Cognition, best known for building Devin (one of the earliest CLI-native coding agents), now inherits Windsurf’s AI-powered IDE and its enterprise customer base. The challenge ahead? Fusing those experiences into a unified developer workflow that doesn’t feel bolted together.

Anthropic’s new Economic Futures Program reflects a shift toward studying AI’s real-world economic impact—funding empirical research instead of speculative think pieces.

Meanwhile, major sovereign AI investments from the EU and US highlight AI’s rise as a geopolitical asset.


This Week's (well, two weeks') Takeaways

🔹 Browser-based AI tools increase the need for an AI Governance strategy and a governance board that meets frequently. AI teams aren't releasing on a traditional cadence; features ship fast and often without official notice. What's your plan to react?

🔹 The rapid jump from $20 to $50 to $200 Ultra-tiers for consumers speaks to how fast demand for inference is rising. Thinking models rose to popularity exceptionally fast, and the trend du jour is Mixture-of-Experts architectures and multi-agent setups, where several agents work a problem together and the best result is evaluated before you're served an answer to "I need you to settle this once and for all, is it data or data?". The point: the cost of compute IS falling, but we're also seeing a rapid increase in token consumption because of model architecture, longer thinking time, and other factors. Keep this in mind whether you're looking at API or subscription cost models; the rough break-even sketch after these takeaways shows how quickly the math can flip.

🔹 Stay nimble amid shifts in geopolitical AI infrastructure investments. They could open opportunities to scale your business in new markets, and at the same time AI is quickly becoming an equalizer on a host of fronts we don't yet fully understand.

🔹 The Talent Wars are focused on researchers for now, but it won't take long before they pivot to other needs, like productizing AI.
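As referenced above, here's a rough break-even sketch comparing a flat monthly subscription to pay-as-you-go API usage. The $200 seat, the $3/$15 per-million-token rates, and the token counts are hypothetical placeholders rather than any vendor's actual pricing, so treat it as a template and plug in your own numbers.

```python
# Rough break-even sketch: flat monthly subscription vs. pay-as-you-go API.
# Every price and token count here is a hypothetical placeholder -- plug in your own.

SUBSCRIPTION_PER_MONTH = 200.00                    # e.g., an assumed Ultra-tier seat
PRICE_IN_PER_M, PRICE_OUT_PER_M = 3.00, 15.00      # assumed $ per 1M input/output tokens

def monthly_api_cost(requests_per_day: int,
                     in_tokens: int = 4_000,
                     out_tokens: int = 12_000,      # includes hidden reasoning tokens
                     working_days: int = 22) -> float:
    """Estimate a month of API spend for a given daily request volume."""
    per_request = (in_tokens / 1e6) * PRICE_IN_PER_M + (out_tokens / 1e6) * PRICE_OUT_PER_M
    return per_request * requests_per_day * working_days

for rpd in (5, 25, 50, 100):
    cost = monthly_api_cost(rpd)
    winner = "API" if cost < SUBSCRIPTION_PER_MONTH else "subscription"
    print(f"{rpd:>3} requests/day -> API ~${cost:,.0f}/month ({winner} wins)")
```

Under these made-up numbers the crossover sits somewhere around 45-50 requests a day; heavier reasoning chains or longer contexts drag it lower fast.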


Written by JD Fetterly - Data PM @ Apple, Founder of ChatBotLabs.io, creator of GenerativeAIexplained.com, Technology Writer and Generative AI Enthusiast.


Like my content? Hate it so much you want other people to hate it? Either way, share it with your friends ❤️