Deploying machine learning models requires an efficient serving infrastructure. In this session, we’ll explore KServe, a powerful Kubernetes-native solution for model inferencing. You'll learn how KServe simplifies model deployment, lightning-fast inference, monitoring, and cost optimization while supporting multiple frameworks such as Scikit-Learn, TensorFlow, PyTorch, LightGBM, Paddle, PMML, Spark MLlib, XGBoost, and ONNX. It also enables users to deploy large language models (LLMs) from Hugging Face. This open source project provides a simple, pluggable solution for common infrastructure issues with inference models, such as GPU scaling and ModelMesh serving for high-volume, high-density use cases.
Whether you’re an ML engineer, DevOps expert, or AI leader, this session will equip you with the best practices and hands-on insights to take your model serving to the next level.
---
This presentation was delivered at KCD Budapest 2025: https://kcdbudapest.hu/2025
You can find the slide decks here: https://kcdbudapest.hu/2025/slides
You can find the pictures taken during the event here: https://kcdbudapest.hu/2025/pictures