Meta’s Llama 4 models now available on Amazon Web Services

April 5, 2025

Access Meta’s most powerful AI models to date in Amazon SageMaker JumpStart. Availability in Amazon Bedrock coming soon.

Key Takeaways

  • Meta’s newest AI models, Llama 4 Scout and Llama 4 Maverick, are now available on SageMaker JumpStart. Availability as a fully managed, serverless option in Amazon Bedrock is coming soon.
  • Llama 4 models can process both images and text together for more powerful applications.
  • Instead of using all of its computing power for every question, a Llama 4 model can intelligently choose which ‘expert’ parts of its brain to activate based on the specific task.
  • Llama 4 can deliver more powerful results while using fewer computing resources—making advanced AI more accessible and cost-effective.

Amazon Web Services (AWS) has announced the availability of Meta’s new Llama 4 models via Amazon SageMaker JumpStart, with availability as fully managed, serverless models in Amazon Bedrock coming soon. The first two models in the Llama 4 herd—Llama 4 Scout 17B and Llama 4 Maverick 17B—both feature advanced multimodal capabilities (the ability to understand both image and text prompts) and industry-leading context windows (how much information they can process at once) for improved performance and efficiency over previous model versions.

The availability of Llama 4 Scout and Llama 4 Maverick on AWS expands the already broad selection of models customers can use to build, deploy, and scale their applications. AWS consistently offers new models from leading AI companies such as Meta as soon as the models are released, with enterprise-grade tools and security that make it easy to build, customize, and scale generative AI applications.

Why you should care

Today’s news further reinforces AWS’s commitment to model choice with two new advanced multimodal models from Meta. Llama 4 Scout 17B significantly expands what AI can process at once, from 128,000 tokens in previous Llama models to up to 10 million tokens (nearly 80 times the previous context length). This underpins applications that can summarize multiple documents together, analyze comprehensive user activity patterns, or reason through entire code bases at once. Llama 4 Maverick 17B is a general-purpose model that excels at image and text understanding across 12 languages, making it well suited for sophisticated assistants and chat applications.

Both Llama 4 models are built with native multimodality, meaning they’re designed from the ground up to understand text and images together rather than handling them as separate inputs. And thanks to their mixture of experts (MoE) architecture, a first for Meta, which activates only the most relevant parts of the model for each task, customers benefit from powerful capabilities that are more compute efficient for both training and inference, translating into lower costs and greater performance.

Meet the AI: What is Llama 4 Scout 17B and Llama 4 Maverick 17B?

If Llama 4 models were people, Scout would be that detail-oriented research assistant with a photographic memory who can instantly recall information from thousands of documents while working from a tiny desk in a vast library. Scout anticipates informational needs before they’re even articulated, providing not just answers, but the context that makes those answers meaningful. Maverick would be the multilingual creative director with an eye for visual storytelling—equally comfortable drafting compelling narratives, analyzing complex images with precision, or maintaining a consistent brand voice across a wide array of languages in client meetings.

Crunching the numbers

  • Llama 4 Scout 17B packs 17 billion active parameters and 109 billion total parameters into a model that delivers state-of-the-art performance for its class, according to Meta.
  • Llama 4 Scout 17B also features an industry-leading context window of up to 10 million tokens, nearly 80 times larger than Llama 3’s 128K tokens. Think of it as the difference between absorbing a few pages of a book at once and absorbing an entire encyclopedia.
  • Llama 4 Maverick 17B contains 17 billion active parameters and 400 billion total parameters across 128 experts. Think of this as having 128 specialized machines that work together, but only activating the most relevant ones for each task—making it both powerful and efficient.

The bigger story

The models’ MoE architecture is almost like having a team of specialists rather than a single generalist. Instead of using all of its computing power for every question, the model intelligently chooses which ‘expert’ parts of its brain to activate based on the specific task. It’s similar to how a hospital routes patients to different specialists rather than having every doctor attempt to treat every condition. This more intentional approach means Llama 4 can deliver more powerful results while using fewer computing resources—making advanced AI more accessible and cost-effective for businesses of all sizes. For developers, this translates to being able to build sophisticated applications that can process massive amounts of information while supporting multiple languages, and handling both text and images seamlessly.
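The routing idea described above can be sketched in a few lines of Python. This is a toy illustration of top-k expert selection, not Meta's actual implementation; the expert count and scores here are made up for the demo (Llama 4 Maverick uses 128 experts).

```python
import math

NUM_EXPERTS = 8  # illustrative; Llama 4 Maverick has 128 experts
TOP_K = 2        # only the top-k experts are activated per token

def softmax(scores):
    """Convert raw router scores into probabilities."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(token_scores, top_k=TOP_K):
    """Pick the top-k experts for one token and renormalize their weights.

    Only the chosen experts run, so most of the model's parameters stay
    idle for this token -- that's the compute saving described above.
    """
    probs = softmax(token_scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    weight_sum = sum(probs[i] for i in chosen)
    return {i: probs[i] / weight_sum for i in chosen}

# Example: a token whose router scores favor experts 1 and 3.
weights = route([0.1, 2.0, -1.0, 0.5, -0.3, 0.2, -0.8, 0.0])
print(weights)  # only TOP_K of NUM_EXPERTS experts get nonzero weight
```

The per-token weights are renormalized so the selected experts' contributions still sum to one, which is why quality can stay high even though most experts stay idle.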

What’s around the corner?

With AWS’s commitment to bring the latest models from leading AI companies to customers as they’re available, customers can expect continued expansion of models across sizes and modalities, empowering them to realize the full potential of generative AI.

More immediately, AWS customers can look forward to accessing fully managed, serverless Llama 4 models in Amazon Bedrock, which are coming soon.

How to use the Llama 4 models

To get started, visit the Amazon SageMaker AI console.
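For programmatic access, JumpStart models can also be deployed with the SageMaker Python SDK. The sketch below follows the standard JumpStart deployment pattern; the `model_id` string is an assumption for illustration, so check the JumpStart model catalog for the exact identifier, and note that running this requires AWS credentials and will incur endpoint charges.

```python
# Hypothetical sketch: deploying a Llama 4 model via SageMaker JumpStart.
# The model_id below is an assumption -- look up the real ID in the
# JumpStart catalog before running. Requires configured AWS credentials.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="meta-vlm-llama-4-scout-17b-16e-instruct")

# Llama models are gated behind Meta's license, hence accept_eula.
predictor = model.deploy(accept_eula=True)

response = predictor.predict({
    "inputs": "Summarize the key risks in this quarterly report: ...",
    "parameters": {"max_new_tokens": 256, "temperature": 0.2},
})
print(response)

# Clean up the endpoint when finished to stop incurring charges.
predictor.delete_endpoint()
```

Deleting the endpoint after use matters because SageMaker real-time endpoints bill for as long as they are running, not per request.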
