WG Serving: Accelerating AI/ML Inference Workloads on Kubernetes

Abstract:

The emergence of Generative AI (GenAI) has introduced new challenges and demands in AI/ML inference, necessitating advanced solutions for efficient serving infrastructure. The recently created Kubernetes Working Group Serving (WG Serving) is dedicated to enhancing serving workloads on K8s, especially hardware-accelerated AI/ML inference. The group prioritizes compute-intensive inference scenarios that use specialized accelerators, with improvements that also benefit other serving workloads such as web services and stateful databases. This session will dive into WG Serving's initiatives and workstreams, spotlighting the discussions and advancements in each. We are also actively looking for feedback and partnership with model server authors and other practitioners who want to leverage the capabilities of K8s for their serving workloads. Join us to gain insight into our work and learn how to contribute to advancing AI/ML inference on K8s.