Meta’s Llama 4 Models Struggle to Impress as Rivals Race Ahead in AI Arms Race

A Lukewarm Reception for Llama 4

Meta’s highly anticipated Llama 4 models have entered the AI landscape with far less impact than their predecessors, leaving developers and industry observers questioning their relevance. Despite launching two open-weight models—Llama 4 Scout and Llama 4 Maverick—alongside a preview of the much larger Llama 4 Behemoth, Meta’s latest offerings have failed to capture the imagination of the AI community.

At LlamaCon, Meta’s first dedicated conference for its open-source large language models held last month, developers voiced disappointment. Many had expected a major leap forward in reasoning capabilities or at least a competitive alternative to emerging leaders like DeepSeek’s V3 and Alibaba’s Qwen. “It would be exciting if they were beating Qwen and DeepSeek,” said Vineeth Sai Varikuntla, a developer focused on medical AI. “Qwen is ahead—way ahead—on general use and reasoning.”

Though Meta maintains that its models achieve state-of-the-art performance, reports indicate otherwise. The Wall Street Journal recently revealed delays in the release of Llama 4 Behemoth and highlighted struggles across the Llama 4 suite. Developers criticized the lack of a model that outperforms rivals, raising fresh doubts about Meta’s ability to keep up in a rapidly evolving AI race.

Meta’s Slipping Status in the AI Hierarchy

Once hailed as a major player in open-source AI, Meta now appears to be losing momentum. In 2023, Llama 2 was lauded by Nvidia CEO Jensen Huang as one of the most important AI developments of the year. Llama 3’s release in mid-2024 marked a significant leap, triggering a spike in demand for computing resources and pushing Meta briefly to the forefront of open model innovation.

However, Meta’s Llama 4 models have failed to continue that trajectory. Although the models incorporate new architectures like “mixture of experts”—an approach previously popularized by DeepSeek—developers quickly identified discrepancies between benchmarked and publicly released versions of the models. This led to accusations that Meta was manipulating leaderboard rankings, a charge the company denies. According to Meta, the benchmarked version was experimental, and evaluating multiple model variants is common in the industry.

Despite those defenses, platforms like Artificial Analysis and OpenRouter show that Llama 4 Scout and Maverick are not among the most widely used or top-ranked models. Qwen and other competitors like Grok and Claude have pulled ahead in both usage and performance, cementing their leadership while Meta appears to lose direction.

A Divide Between Enterprise Strategy and Developer Expectations

Meta’s recent moves suggest a pivot toward enterprise adoption, but the technical community, once its strongest ally, seems unconvinced. Critics argue that while Meta continues to emphasize openness and innovation, it has failed to deliver significant performance gains in its newest models.

“It did seem like a bit of a marketing push for Llama,” said Mahesh Sathiamoorthy, cofounder of Bespoke Labs. The sentiment echoes broader concerns that Meta may be prioritizing optics over innovation. While the company claims to be listening to developer feedback, the absence of standout features in Meta’s Llama 4 models has led many to question whether Meta can still compete at the cutting edge of AI development.

As rivals like OpenAI, DeepSeek, and Alibaba accelerate their advancements, Meta’s once-groundbreaking Llama series risks being left behind.
