Meta’s latest open-source AI models are a shot across the bow to the more expensive closed models from OpenAI, Google, Anthropic and others.
But it’s good news for businesses: the models could lower the cost of deploying artificial intelligence (AI), according to experts.
The social media giant has released two models from its Llama family of models: Llama 4 Scout and Llama 4 Maverick. They are Meta’s first natively multimodal models — meaning they were built from the ground up to handle text and images; these capabilities were not bolted on.
Llama 4 Scout’s unique proposition: It has a context window of up to 10 million tokens, which translates to around 7.5 million words. The record holder to date is Google’s Gemini 2.5, at 1 million tokens with 2 million planned.
The bigger the context window (the amount of text a model can take in at once), the more data and documents a user can feed the AI chatbot.
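The token counts above map to word counts by a common rule of thumb: one token is roughly three-quarters of an English word, which is how 10 million tokens becomes about 7.5 million words. A minimal sketch of that arithmetic, noting that the 0.75 ratio is an approximation that varies by tokenizer and text:

```python
# Rough token-to-word conversion: ~0.75 words per token.
# This ratio is a rule of thumb, not an exact property of any model.
WORDS_PER_TOKEN = 0.75

def tokens_to_words(tokens: int) -> int:
    """Estimate the English word count for a given token count."""
    return int(tokens * WORDS_PER_TOKEN)

print(tokens_to_words(10_000_000))  # Llama 4 Scout's window -> 7,500,000 words
print(tokens_to_words(1_000_000))   # Gemini 2.5's window    -> 750,000 words
```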
Ilia Badeev, head of data science at Trevolution Group, told PYMNTS that his team was still marveling at Gemini 2.5’s 1 million-token context window when Llama 4 Scout came along with 10 million.
“This is an enormous number. With 17 billion active parameters, we get a ‘mini’ level model (super-fast and super-cheap) but with an astonishingly large context. And as we know, context is king,” Badeev said. “With enough context, Llama 4 Scout’s performance on specific applied tasks could be significantly better than many state-of-the-art models.”
Read more: Meta Adds ‘Multimodal’ Models to Its Llama AI Stable
Only 1 Nvidia H100 Host Needed

Both Llama 4 Scout and Maverick have 17 billion active parameters, meaning that many parameters are engaged for any given request. In total, however, Scout has 109 billion parameters and Maverick has 400 billion.
Meta also said Llama 4 Maverick is cheaper to run: between 19 and 49 cents per million tokens for input (query) and output (response); it runs on one Nvidia H100 DGX server.
The pricing compares with $4.38 for OpenAI’s GPT-4o. Gemini 2.0 Flash costs 17 cents per million tokens while DeepSeek v3.1 costs 48 cents. (While Meta is not in the business of selling AI services, it still seeks to minimize AI costs for itself.)
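The per-million-token prices cited above compound quickly at enterprise volumes. A back-of-envelope comparison, using the article's figures and a hypothetical workload of 500 million tokens per month:

```python
# Blended price per million tokens (USD), as cited in the article.
PRICE_PER_MILLION = {
    "Llama 4 Maverick (low)": 0.19,
    "Llama 4 Maverick (high)": 0.49,
    "GPT-4o": 4.38,
    "Gemini 2.0 Flash": 0.17,
    "DeepSeek v3.1": 0.48,
}

def monthly_cost(tokens_per_month: int, price_per_million: float) -> float:
    """Cost in USD for a given monthly token volume at a given rate."""
    return tokens_per_month / 1_000_000 * price_per_million

# Hypothetical workload: 500M tokens/month (an assumption for illustration).
for model, price in PRICE_PER_MILLION.items():
    print(f"{model}: ${monthly_cost(500_000_000, price):,.2f}/month")
```

At that volume, the gap between GPT-4o and the cheaper models is a factor of roughly ten to twenty-five, which is the cost pressure the experts quoted below are describing.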
“One of the biggest blockers to deploying AI has been cost,” Chintan Mota, director of enterprise technology at Wipro, told PYMNTS. “The infrastructure, the inference, the lock-in — it all adds up.”
However, open-source models like Llama 4, DeepSeek and others are enabling companies to build a model fine-tuned to their businesses, trained on their own data and running in their environment, Mota said. “You’re not stuck waiting for a Gemini or (OpenAI’s) GPT feature release. You have more control over your own data and its security.”
Meta’s open-source Llama family will “put pressure on closed models like Gemini. Not because Llama is better, but because it’s good enough,” Mota added. “For 80% of business use cases — automating reports, building internal copilots, summarizing knowledge bases — ‘good enough’ and affordable beats ‘perfect’ and pricey.”
Read more: Musk’s Grok 3 Takes Aim at Perplexity, OpenAI
Fewer Filters, Just Like Grok

Llama 4 Scout and Maverick have a mixture-of-experts (MoE) architecture, meaning they don’t activate all the “expert” subnetworks for every task. Instead, they route each request to the most relevant experts, for speed and to save money.
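The routing idea behind MoE can be sketched in a few lines. This is an illustration of the general technique, not Meta's implementation: the expert count, the top-k value, and the per-token scores are all hypothetical, and real models route per token inside transformer layers using learned gating networks.

```python
import math
import random

TOP_K = 2  # hypothetical: only this many experts run per token

def softmax(scores):
    """Convert raw expert-affinity scores into probabilities."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(token_scores, top_k=TOP_K):
    """Pick the top-k experts for one token and weight their outputs."""
    probs = softmax(token_scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    weight_sum = sum(probs[i] for i in chosen)
    # Renormalize so the selected experts' weights sum to 1; the other
    # experts are never evaluated, which is where the compute savings come from.
    return [(i, probs[i] / weight_sum) for i in chosen]

# One token's affinity score for each of 8 hypothetical experts.
random.seed(0)
scores = [random.uniform(-1, 1) for _ in range(8)]
for expert, weight in route(scores):
    print(f"expert {expert}: weight {weight:.2f}")
```

Because only the selected experts execute, a model like Maverick can hold 400 billion parameters in total while spending compute on just 17 billion per request.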
They were pre-trained on 200 languages, half with over 1 billion tokens each. Meta said this is 10 times more multilingual tokens than Llama 3.
Meta said Scout and Maverick were taught by Llama 4 Behemoth, a 2-trillion-parameter model that’s still in training. It is in preview.
“The three Llama 4 models are geared toward reasoning, coding, and step-by-step problem-solving. However, they do not appear to exhibit the deeper chain-of-thought behavior seen in specialized reasoning models like OpenAI’s ‘o’ series or DeepSeek R1,” Rogers Jeffrey Leo John, co-founder and CTO of DataChat, told PYMNTS.
“Still, despite not being the absolute best model available, Llama 4 outperforms several leading closed-source alternatives on various benchmarks,” John added.
Finally, Meta said it made Llama 4 less prone to punting questions it deems too sensitive — to be more “comparable to Grok,” the AI model from Elon Musk’s AI startup, xAI. The latest version, Grok 3, is designed to “relentlessly seek the truth,” according to xAI.
According to Meta, “our goal is to remove bias from our AI models and to make sure that Llama can understand and articulate both sides of a contentious issue.” It is less censorious than Llama 3.
For example, Llama 4 refuses to answer queries related to debated political and social topics less than 2% of the time, compared with 7% for Llama 3.3. Meta claims that Llama 4 is more “balanced” in choosing which prompts not to answer and is getting better at staying politically neutral.
The post Meta’s Llama 4 Models Are Bad for Rivals but Good for Enterprises, Experts Say appeared first on PYMNTS.com.