
E-Book, English, 338 pages

Srirama / Gupta / Saha Kubernetes for Generative AI Solutions

A complete guide to designing, optimizing, and deploying Generative AI workloads on Kubernetes
1st edition, 2025
ISBN: 978-1-83620-992-8
Publisher: De Gruyter
Format: EPUB
Copy protection: 0 - No protection




Generative AI (GenAI) is revolutionizing industries, from chatbots to recommendation engines to content creation, but deploying these systems at scale poses significant challenges in infrastructure, scalability, security, and cost management.
This book is your practical guide to designing, optimizing, and deploying GenAI workloads with Kubernetes (K8s), the leading container orchestration platform trusted by AI pioneers. Whether you're working with large language models, transformer systems, or other GenAI applications, this book helps you confidently take projects from concept to production. You'll get to grips with foundational concepts in machine learning and GenAI, understanding how to align projects with business goals and KPIs. From there, you'll set up Kubernetes clusters in the cloud, deploy your first workload, and build a solid infrastructure. But your learning doesn't stop at deployment. The chapters highlight essential strategies for scaling GenAI workloads in production, covering model optimization, workflow automation, scaling, GPU efficiency, observability, security, and resilience.
By the end of this book, you'll be fully equipped to confidently design and deploy scalable, secure, resilient, and cost-effective GenAI solutions on Kubernetes.


Further information & material


1


Generative AI Fundamentals


Generative AI (GenAI) has revolutionized our world and has grabbed everyone’s attention since the introduction of ChatGPT in November of 2022 by OpenAI (https://openai.com/index/chatgpt/). However, the foundational concepts of this technology have been around for quite some time. In this chapter, we will introduce the key concepts of GenAI and how it has evolved over time. We will then discuss how to think about a GenAI project and align it with the business objectives, covering the entire process for developing and deploying GenAI workloads, along with potential use cases across different industries.

In this chapter, we’re going to cover the following main topics:

  • Artificial intelligence versus GenAI
  • The evolution of machine learning
  • Transformer architecture
  • The GenAI project life cycle
  • The GenAI deployment stack
  • GenAI project use cases

Artificial Intelligence versus GenAI


Before we dive deeper into GenAI concepts, let’s discuss the differences between Artificial Intelligence (AI), Machine Learning (ML), Deep Learning (DL), and GenAI, as these terms are often used interchangeably.

Figure 1.1 shows the relationships between these concepts.

Figure 1.1 – Relationships between AI, ML, DL, and GenAI

Let’s learn more about these relationships:

  • AI: AI refers to a system or algorithm that is capable of performing tasks that would otherwise typically require human intelligence. These tasks include reasoning, learning, problem-solving, perception, and language understanding. AI is a broad category and can include rule-based systems, expert systems, neural networks, and GenAI algorithms. The evolution of AI algorithms has provided machines with human-like senses and capabilities, such as vision to analyze the world around them, listening and speaking to understand natural language and respond verbally, and using sensor data to understand the external environment and respond accordingly.
  • ML: ML is a subset of AI that involves algorithms and models that enable machines to learn from data and make predictions without requiring explicit coding. In traditional programming, developers write explicit instructions for a computer to execute, whereas in ML, algorithms learn from the patterns and relationships in data and make predictions. ML can further be divided into the following sub-categories:
    • Supervised learning: This uses labeled datasets to train the models. It can further be subdivided into classification and regression problems:
      • Classification problems use labeled data, such as labeled pictures of dogs and cats, to train the model. Once the model is trained, it can classify a user-provided picture using the classes it has been trained on.
      • Regression problems, on the other hand, use numerical data to understand the relationship between dependent and independent variables, such as house pricing based on different attributes. Once a model establishes a relationship, it can then forecast the pricing for different sets of attributes, even if the model has not been trained on these specific attributes. Some popular regression algorithms are linear regression and polynomial regression. Logistic regression, despite its name, is typically used for classification because it predicts the probability of a class.
    • Unsupervised learning: This uses ML algorithms to analyze and cluster unlabeled datasets to discover hidden patterns in data. Unsupervised learning can further be divided into the following two sub-categories:
      • Clustering algorithms group data based on similarities or differences. A popular clustering algorithm is the k-means clustering algorithm, which uses Euclidean distances between data points to measure their similarity and assigns them to distinct, non-overlapping clusters. It iterates to refine the clusters and minimize the variance within each cluster. A typical use case is segmenting customers based on purchasing behavior, demographics, or preferences to target marketing strategies effectively.
      • Dimensionality reduction is another form of unsupervised learning, which is used to reduce the number of features/dimensions in a given dataset. It aims to simplify models, reduce computational costs, and improve overall model performance. Principal Component Analysis (PCA) (https://towardsdatascience.com/a-one-stop-shop-for-principal-component-analysis-5582fb7e0a9c) is a popular algorithm used for dimensionality reduction. It achieves this by finding a new set of features called components, which are composites of the original features that are uncorrelated with one another.
    • Semi-supervised learning: This is a type of ML that combines supervised and unsupervised learning by leveraging both labeled and unlabeled data for training. This is particularly useful when obtaining labeled data is time-consuming and expensive because you can use small amounts of labeled data for training and then iteratively apply it to the large amounts of unlabeled data. This can be applied in both classification and regression use cases, such as spam/image/object detection, speech recognition, and forecasting.
    • Reinforcement learning: In reinforcement learning, there is an agent and a reward system, and algorithms learn by trial and error to maximize the reward for the agent. An agent is an autonomous system, like a computer program or robot, that can make decisions and act in response to its environment without direct human instructions. Rewards are given by the environment when the agent's actions lead to a positive outcome. For example, if we want to train a robot to walk without falling over, positive rewards are given for actions that help the robot to remain upright, and negative rewards are given for actions that cause it to fall over. The robot begins by trying different actions randomly, such as leaning forward, moving its legs, or shifting its weight. As it performs these actions, it observes the resulting changes in its state. The robot uses feedback (rewards) to update its understanding of which actions are beneficial and thus learns to walk over time.
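The trial-and-error loop described for reinforcement learning can be illustrated with a very small sketch. It is not a walking robot: it assumes a single-state toy task with two hypothetical actions (`shift_weight_back` and `lean_too_far`) and fixed rewards of +1 and -1, and it uses an epsilon-greedy rule (explore at random sometimes, otherwise pick the action with the best reward estimate so far). The function name, action names, and parameter values are all illustrative assumptions.

```python
import random

def train_balance_agent(steps=200, epsilon=0.2, seed=0):
    """Epsilon-greedy trial and error on a toy, single-state balance task."""
    # Hypothetical rewards: +1 keeps the robot upright, -1 makes it fall over.
    rewards = {"shift_weight_back": 1.0, "lean_too_far": -1.0}
    actions = list(rewards)
    value = {a: 0.0 for a in actions}  # running estimate of each action's reward
    tries = {a: 0 for a in actions}    # how often each action has been tried
    rng = random.Random(seed)

    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(actions)          # explore: try something at random
        else:
            action = max(actions, key=value.get)  # exploit: use the best estimate
        reward = rewards[action]                  # feedback from the environment
        tries[action] += 1
        # Incremental average: move the estimate toward the observed reward.
        value[action] += (reward - value[action]) / tries[action]
    return value

values = train_balance_agent()
best_action = max(values, key=values.get)
print(best_action, values)
```

A real walking task would have many states and delayed rewards, which calls for methods such as Q-learning or policy gradients; the sketch only shows how reward feedback shifts the agent toward beneficial actions.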

    We have summarized the different categories of ML in Figure 1.2:

Figure 1.2 – Different categories of ML
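As a concrete illustration of the unsupervised category, the k-means procedure described earlier (assign each point to its nearest centroid by Euclidean distance, then move each centroid to the mean of its cluster, and repeat) can be sketched in a few lines of plain Python. The customer data, the feature meanings, and the choice of initial centroids below are made-up assumptions for illustration; library implementations usually pick initial centroids randomly or with a smarter seeding scheme.

```python
import math

def kmeans(points, centroids, max_iters=100):
    """Minimal k-means: returns (labels, centroids) for the given starting centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iters):
        # Assignment step: label each point with the index of its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clusters have converged
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Hypothetical customers: (monthly spend in hundreds, store visits per month).
customers = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(customers, centroids=[customers[0], customers[3]])
print(labels)
print(centroids)
```

Here the two resulting groups could be read as low-activity and high-activity customer segments; the number of clusters is an input that the algorithm does not choose for you.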

  • DL: DL is a subset of ML that involves deep neural networks with many layers. Conceptually, it is inspired by the human brain, which has billions of deeply connected neurons and provides humans with very advanced cognition. Some popular examples of deep neural nets are Convolutional Neural Networks (CNNs), used for image processing, and Recurrent Neural Networks (RNNs), which are used for analyzing time series data or natural language processing.
  • GenAI: GenAI is a further subset of DL and focuses on creating new data, such as text, images, music, and other forms of content. Many generative applications are based on Foundation Models (FMs), which are large-scale AI models trained on vast amounts of diverse data, serving as a base for a wide range of downstream tasks. They are pre-trained on broad datasets and can be fine-tuned for specific applications. Large Language Models (LLMs) are a subset of FMs specifically designed for understanding and generating human language. GenAI is the primary focus of this book; we will be diving into its details later in the book.

Now that we understand the key differences between AI, ML, DL, and GenAI, let’s explore the evolution of ML and how transformer architecture has revolutionized the ML landscape, particularly in the field of Natural Language Processing (NLP).

The evolution of machine learning


Since this book is about GenAI, what could be a better way to start it than asking ChatGPT to summarize the evolution of AI and ML over the last decade?

Prompt: "Why did the chicken cross the road?" Describe how that question's answer evolved using AI/ML over the last decade.

ChatGPT Response (ChatGPT-4o, June 16th, 2024):

The evolution of AI/ML responses to the question "Why did the chicken cross the road?" over the past decade reflects significant advancements in language processing...


