Apps on Azure Blog

Announcing GA for Azure Container Apps Serverless GPUs

Cary_Chai
Microsoft
Mar 18, 2025

Azure Container Apps Serverless GPUs are now GA. Serverless GPUs enable you to seamlessly run your AI workloads on-demand with automatic scaling, optimized cold start, per-second billing, and reduced operational overhead.

Azure Container Apps Serverless GPUs, accelerated by NVIDIA, are now generally available. Serverless GPUs let you seamlessly run AI workloads with per-second billing and scale to zero when not in use, reducing operational overhead and making it easy to support real-time custom model inferencing and other GPU-accelerated workloads.

Serverless GPUs accelerate AI development by letting teams focus on core AI code rather than on managing GPU infrastructure. They provide an excellent middle-layer option between the Azure AI Model Catalog's serverless APIs and hosting custom models on managed compute. Customers can build their own serverless API endpoints for inferencing AI models, including custom models, provision on-demand GPU-powered Jupyter Notebooks, or run other compute-intensive AI workloads that are ephemeral in nature. Serverless GPUs also provide full data governance, as customers' data never leaves the boundaries of the container, while still offering a managed, serverless platform from which to build your applications.

This GA release of serverless GPUs also adds support for NVIDIA NIM microservices. NVIDIA NIM™, part of NVIDIA AI Enterprise, is a set of easy-to-use microservices designed for secure, reliable deployment of high-performance AI model inferencing at scale. Supporting a wide range of AI models, including open-source community and NVIDIA AI Foundation models, NVIDIA NIM ensures seamless, scalable AI inferencing that leverages industry-standard APIs.

Create an Azure Container App with Serverless GPUs and run stable diffusion.
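As a rough sketch of what that setup looks like from the CLI, the commands below create an environment, add a consumption GPU workload profile, and deploy a container app onto it. The resource names, the image tag, and the `Consumption-GPU-NC8as-T4` profile type are illustrative assumptions, not values from this post; check `az containerapp env workload-profile list-supported` for the types actually available in your region.

```shell
# Hypothetical resource names; substitute your own.
RG=my-rg
ENV=my-aca-env
APP=stable-diffusion-app

# 1. Create a Container Apps environment in a supported region.
az containerapp env create \
  --name "$ENV" --resource-group "$RG" \
  --location westus3

# 2. Add a consumption GPU workload profile (type name is an assumption;
#    verify with: az containerapp env workload-profile list-supported -l westus3).
az containerapp env workload-profile add \
  --resource-group "$RG" --name "$ENV" \
  --workload-profile-name gpu-t4 \
  --workload-profile-type Consumption-GPU-NC8as-T4

# 3. Deploy the app onto the GPU profile; min-replicas 0 lets it scale to zero.
az containerapp create \
  --name "$APP" --resource-group "$RG" --environment "$ENV" \
  --image <your-registry>.azurecr.io/stable-diffusion:latest \
  --workload-profile-name gpu-t4 \
  --min-replicas 0 --max-replicas 1 \
  --target-port 8000 --ingress external
```

With `--min-replicas 0`, the app consumes no GPU (and incurs no GPU charges) while idle, and scales up on the first request.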

Key benefits of serverless GPUs

  • Scale-to-zero GPUs: Support for serverless scaling of NVIDIA A100 and T4 GPUs.
  • Per-second billing: Pay only for the GPU compute you use.
  • Built-in data governance: Your data never leaves the container boundary.
  • Flexible compute options: Choose between NVIDIA A100 and T4 GPUs.
  • Middle-layer for AI development: Bring your own model on a managed, serverless compute platform and easily run your AI applications alongside your existing apps.

Scenarios

Our customers have been running a wide range of workloads on serverless GPUs. Below are some common use cases.

NVIDIA T4

  • Real-time and batch inferencing: Using custom open-source models with fast startup times, automatic scaling, and a per-second billing model, serverless GPUs are ideal for dynamic applications that don't already have a serverless API in the model catalog.

NVIDIA A100

  • Compute intensive machine learning scenarios: Significantly speed up applications that implement fine-tuned custom generative AI models, deep learning, or neural networks.
  • High performance computing (HPC) and data analytics: Applications that require complex calculations or simulations, such as scientific computing and financial modeling as well as accelerated data processing and analysis among massive datasets.

Serverless GPUs with NVIDIA NIM    

Serverless GPUs now support NVIDIA NIM microservices, which simplify and accelerate the development of AI applications and agentic AI workflows with pre-packaged, scalable, and performance-tuned models that can be deployed as secure inference endpoints on Azure Container Apps.      

To use NVIDIA NIM microservices, go to NVIDIA's API catalog (Try NVIDIA NIM APIs) and select the NIM you wish to run with the 'Run Anywhere' NIM type. You will need to set your NGC_API_KEY as an environment variable when deploying to Azure Container Apps. For a full set of instructions on how to add a NIM to your container app, follow the instructions here.

(Note: Each NIM model has specific hardware requirements; Azure Container Apps serverless GPUs support A100 and T4 GPUs. Ensure the NIM you select is supported by the available hardware.)
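As a hedged sketch, deploying a NIM to a container app might look like the following, with the NGC key stored as a Container Apps secret and surfaced as the NGC_API_KEY environment variable. The NIM image path, profile name, and resource names are illustrative placeholders, not exact values from NVIDIA's catalog.

```shell
# Store the NGC API key as a secret and reference it via secretref so the
# key is not written in plain text on the container definition.
# All names and the image path below are illustrative placeholders.
az containerapp create \
  --name my-nim-app --resource-group my-rg --environment my-aca-env \
  --image nvcr.io/nim/<publisher>/<model>:latest \
  --workload-profile-name gpu-a100 \
  --secrets ngc-api-key=<your-NGC-API-key> \
  --env-vars NGC_API_KEY=secretref:ngc-api-key \
  --min-replicas 0 --max-replicas 1 \
  --target-port 8000 --ingress external
```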

Quota changes for GA

With GA, we are introducing default GPU quotas for enterprise and pay-as-you-go customers. All enterprise agreement customers will have quota for A100 and T4 GPUs.

The feature is supported in West US 3, Australia East, and Sweden Central.

Get started with serverless GPUs

From the portal, you can enable GPUs for your Consumption app on the Container tab when creating your container app or container app job.

Note: In order to achieve the best performance with serverless GPUs, use an Azure Container Registry (ACR) with artifact streaming enabled for your image tag. Follow steps here to enable artifact streaming on your ACR.
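If you prefer the CLI, enabling artifact streaming might look like the sketch below. The `az acr artifact-streaming` command group is relatively new, so verify the exact syntax with `az acr artifact-streaming -h`; the registry and repository names are placeholders.

```shell
# Turn on artifact streaming for a repository so newly pushed image tags
# get streaming artifacts generated automatically (names are placeholders).
az acr artifact-streaming update \
  --name myregistry --repository stable-diffusion \
  --enable-streaming true

# Create a streaming artifact for an image tag that already exists.
az acr artifact-streaming create \
  --name myregistry --image stable-diffusion:latest
```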

To learn more about getting started with serverless GPUs, see our quickstart.

You can also add a new consumption GPU workload profile to your existing Container App environment through the workload profiles UX in portal or through the CLI commands for managing workload profiles.
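From the CLI, adding a consumption GPU profile to an existing environment might look like the following sketch; the profile type name shown is an assumption, so list the supported types for your region first.

```shell
# List the GPU workload profile types available in a region.
az containerapp env workload-profile list-supported --location westus3 -o table

# Add a serverless GPU profile to an existing environment
# (profile type name is an assumption; use a value from the list above).
az containerapp env workload-profile add \
  --resource-group my-rg --name my-aca-env \
  --workload-profile-name gpu-a100 \
  --workload-profile-type Consumption-GPU-NC24-A100
```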

Learn more about serverless GPUs and NIMs

With serverless GPUs, Azure Container Apps simplifies the development of your AI applications by providing scale-to-zero compute, pay-as-you-go pricing, reduced infrastructure management, and more. To learn more, visit:

Updated Mar 27, 2025
Version 3.0

7 Comments

  • Parik10
    Copper Contributor

    Hi, does it support the East US 2 region, along with its paired region as well? It seems it's not GA there yet.

    • BC2112
      Copper Contributor

      Totally. Kinda ridiculous. GA in what world?

      • Cary_Chai
        Microsoft

        Hi Parik10 and BC2112, thank you for your feedback. It's invaluable in helping us determine which regions to prioritize, and I fully understand the sentiment if paired regions are a requirement for your apps. We will be working on adding additional regions.

  • BC2112
    Copper Contributor

    How about East US so we can get some region pairing going? East US/West US 3. Thanks!

  • BC2112
    Copper Contributor

    Honestly, when is this going to be supported in more regions? Frustrating.

    • Cary_Chai
      Microsoft

      Hi BC2112, which regions in particular are you looking for? I'm also happy to chat if you prefer; feel free to reach out on LinkedIn.
