Want to create an interactive transcript for this episode?
Podcast: AWS Podcast
Episode: #600: Amazon SageMaker Multi Model Endpoints
Description: Amazon SageMaker Multi-Model Endpoint (MME) is fully managed capability of SageMaker Inference that allows customers to deploy thousands of models on a single endpoint and save costs by sharing instances on which the endpoints run across all the models. Until recently, MME was only supported for machine learning (ML) models which run on CPU instances. Now, customers can use MME to deploy thousands of ML models on GPU based instances as well, and potentially save costs by 90%. MME dynamically loads and unloads models from GPU memory based on incoming traffic to the endpoint. Customers save cost with MME as the...