Babcock / Bali | Generative AI with Python and PyTorch | E-Book | www.sack.de

E-book, English, 454 pages

Babcock / Bali Generative AI with Python and PyTorch

Navigating the AI frontier with LLMs, Stable Diffusion, and next-gen AI applications
1st edition, 2025
ISBN: 978-1-83588-445-4
Publisher: Packt Publishing
Format: EPUB
Copy protection: none




Become an expert in Generative AI through immersive, hands-on projects that leverage today's most powerful models for Natural Language Processing (NLP) and computer vision. Generative AI with Python and PyTorch is your end-to-end guide to creating advanced AI applications, made easy by Raghav Bali, a seasoned data scientist with multiple patents in AI, and Joseph Babcock, a PhD and machine learning expert. Through business-tested approaches, this book simplifies complex GenAI concepts, making learning both accessible and immediately applicable.
From NLP to image generation, this second edition explores practical applications and the underlying theories that power these technologies. By integrating the latest advancements in LLMs, it prepares you to design and implement powerful AI systems that transform data into actionable intelligence.
You'll build your versatile LLM toolkit by gaining expertise in GPT-4, LangChain, RLHF, LoRA, RAG, and more. You'll also explore deep learning techniques for image generation and apply style transfer using GANs, before advancing to implement CLIP and diffusion models.
Whether you're generating dynamic content or developing complex AI-driven solutions, this book equips you with everything you need to harness the full transformative power of Python and AI.




1


Introduction to Generative AI: Drawing Data from Models


At the Colorado State Fair in 2022, the winning entry was a fantastical sci-fi landscape created by video game designer Jason Allen, titled Théâtre D’opéra Spatial (Figure 1.1). The first-prize artwork was remarkable both for its dramatic subject matter and for the unusual origin of the image. Unlike the majority of the other artworks entered into the competition, Théâtre D’opéra Spatial was not painted using oil or watercolors, nor was its “creator” even human; rather, it is an entirely digital image produced by a sophisticated machine learning algorithm called Midjourney. Rather than a brush and canvas, Jason used Midjourney, which has been trained on a diverse set of images, along with natural language instructions to create the image.

Figure 1.1: Théâtre D’opéra Spatial1

Visual art is far from the only area in which machine learning has demonstrated astonishing results. Indeed, if you have paid attention to the news in the last few years, you have likely seen many stories about the groundbreaking results of modern AI systems applied to diverse problems, from the hard sciences to online avatars and interactive chat. Deep neural network models, such as the one powering Midjourney, have shown amazing abilities to generate realistic human language2, author computer code3, and solve school exams with human-level ability2. Such models can also classify X-ray images of human anatomy on the level of trained physicians4, beat human masters at both classic board games such as Go (an ancient East Asian board game) and multiplayer computer games5, 6, and translate French into English with amazing sensitivity to grammatical nuances7.

Free Benefits with Your Book


Your purchase includes a free PDF copy of this book along with other exclusive benefits. Check the section in the Preface to unlock them instantly and maximize your learning experience.

Discriminative versus generative models


However, these latter examples of AI differ in an important way from the model that generated Théâtre D’opéra Spatial. In all of these other applications, the model is presented with a set of inputs—data such as English text or X-ray images—that is paired with a target output, such as the next word in a translated sentence or the diagnostic classification of an X-ray. Indeed, this is probably the kind of AI model you are most familiar with from prior experiences in predictive modeling; such models are broadly known as discriminative models, whose purpose is to create a mapping between a set of input variables and a target output. The target output could be a set of discrete classes (such as which word in the English language appears next in a translation), or a continuous outcome (such as the expected amount of money a customer will spend in an online store over the next 12 months).

However, this kind of model, in which data is “labeled” or “scored,” represents only half of the capabilities of modern machine learning. Another class of algorithms, such as the one that generated the winning entry at the Colorado State Fair, doesn’t compute a score or label from input variables but rather generates new data. Unlike in discriminative models, the input variables are often vectors of numbers that aren’t related to real-world values at all and are often even randomly generated. This kind of model, known as a generative model, which can produce complex outputs such as text, music, or images from random noise, is the topic of this book.
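To make the distinction concrete, here is a minimal, illustrative sketch (not code from this book) contrasting the two directions: a discriminative function maps an observed input to a score, while a toy "generator" decodes a random latent vector into an image-shaped output. The dimensions and the linear "decoder" are made-up assumptions chosen only to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discriminative direction: observed input features -> a single score/label.
def discriminative_score(x, weights):
    """Map an input vector to one scalar score (e.g., a class logit)."""
    return float(x @ weights)

# Generative direction: random noise -> structured output (here, an 8x8 "image").
def toy_generator(z, projection):
    """Decode a random latent vector into an image-shaped array."""
    return (z @ projection).reshape(8, 8)

z = rng.standard_normal(4)                 # random latent vector, unrelated to real data
projection = rng.standard_normal((4, 64))  # stand-in for a learned decoder
fake_image = toy_generator(z, projection)

x = rng.standard_normal(64)                # an "observed" input for the discriminative model
weights = rng.standard_normal(64)
score = discriminative_score(x, weights)

print(fake_image.shape)  # (8, 8)
```

In a real generative model, the random projection would be replaced by a trained neural network, so that decoded outputs resemble the training data rather than noise.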

Even if you did not know it at the time, you have probably seen other instances of generative models mentioned in the news alongside the discriminative examples given previously. A prominent example is deepfakes—videos in which one person’s face has been systematically replaced with another’s by using a neural network to remap the pixels8 (Figure 1.2).

Figure 1.2: A deepfake image9

Maybe you have also seen stories about AI models that generate “fake news,” such as GPT-2, which scientists at the firm OpenAI were initially hesitant to release to the public due to concerns that it could be used to create propaganda and misinformation online (Figure 1.3)11.

Figure 1.3: A chatbot dialogue created using GPT-210

In these and other applications—such as Google’s voice assistant, which can make a restaurant reservation by dynamically creating a conversation with a human in real time12, or even software that can generate original musical compositions13—we are surrounded by the outputs of generative AI algorithms. These models are able to handle complex information in a variety of domains: creating photorealistic images or stylistic “filters” on pictures, synthetic sound, conversational text, and even rules for optimally playing video games. You might ask: Where did these models come from? How can I implement them myself?

Implementing generative models


While generative models could theoretically be implemented using a wide variety of machine learning algorithms, in practice, they are usually built with deep neural networks, which are well suited to capturing the complex variation in data such as images or language. In this book, we will focus on implementing these deep-learning-based generative models for many different applications using PyTorch. PyTorch is a Python programming library used to develop and deploy deep learning models. It was open-sourced by Meta (formerly Facebook) in 2016 and has become one of the most popular libraries for the research and deployment of neural network models. We’ll execute PyTorch code on the cloud using Google’s Colab notebook environment, which allows you to access world-class computing infrastructure, including graphics processing units (GPUs) and tensor processing units (TPUs), on demand and without onerous environment setup. We’ll also leverage the Pipelines library from Hugging Face, which provides an easy interface for running experiments using a catalog of some of the most sophisticated models available.
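As a taste of how little code the Hugging Face pipeline interface requires, here is a hedged sketch: the model name "gpt2" and the prompt are illustrative choices (not prescribed by this book), and the first call downloads the model weights.

```python
from transformers import pipeline

# Build a text-generation pipeline; "gpt2" is just a small illustrative model,
# and the weights are downloaded on first use.
generator = pipeline("text-generation", model="gpt2")

result = generator(
    "Generative models can",
    max_new_tokens=20,       # cap the length of the generated continuation
    num_return_sequences=1,  # return a single candidate completion
)
print(result[0]["generated_text"])
```

On Colab, the same code automatically benefits from an attached GPU if one is available; no environment setup beyond installing the `transformers` package is needed.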

In the following chapters, you will learn not only the underlying theory behind these models but also the practical skills to implement them in popular programming frameworks. We’ll review how, since 2006, an explosion of research in “deep learning” using large neural network models has produced a wide variety of generative modeling applications. Innovations arising from this research include variational autoencoders (VAEs), which can efficiently generate complex data samples from random numbers that are “decoded” into realistic images, using techniques we will describe in a later chapter. We will also describe a related image generation algorithm, the generative adversarial network (GAN), in more detail later in this book through applications for image generation, style transfer, and deepfakes. Conceptually, the GAN model creates a competition between two neural networks.

One (termed the generator) produces realistic (or, in the case of the experiments by Obvious, artistic) images starting from a set of random numbers that are “decoded” into realistic images by applying a mathematical transformation. In a sense, the generator is like an art student, producing new paintings from brushstrokes and creative inspiration. The second network, known as the discriminator, attempts to classify whether a picture comes from a set of real-world images or whether it was created by the generator. Thus, the discriminator acts like a teacher, grading whether the student has produced work comparable to the paintings they are attempting to mimic. As the generator becomes better at fooling the discriminator, its output becomes closer and closer to the historical examples it is designed to copy. We’ll also describe the algorithm behind Stable Diffusion, the latent diffusion model, which builds on VAEs to provide scalable image synthesis based on natural language prompts from a human user.
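The student-and-teacher competition described above can be sketched as a minimal adversarial training loop. This is not the book's implementation: the tiny MLPs, the 16-dimensional "data," and the shifted Gaussian standing in for real samples are all made-up assumptions to keep the example self-contained.

```python
import torch
from torch import nn

torch.manual_seed(0)
latent_dim, data_dim = 8, 16

# Generator: decodes random noise into a fake sample.
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
# Discriminator: scores whether a sample looks real (1) or fake (0).
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1))

loss_fn = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

for step in range(100):
    real = torch.randn(64, data_dim) + 2.0  # stand-in for "real" training data
    z = torch.randn(64, latent_dim)
    fake = G(z)

    # Teacher's turn: grade real samples as 1 and the student's fakes as 0.
    d_loss = loss_fn(D(real), torch.ones(64, 1)) + \
             loss_fn(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Student's turn: adjust the generator so its fakes are graded as real.
    g_loss = loss_fn(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

Note the `detach()` in the discriminator step: it stops gradients from flowing back into the generator while the discriminator is being updated, so each network is trained only on its own turn.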

Another key innovation in generative models is in the domain of natural language data. By representing the complex interrelationships between words in a sentence in a computationally scalable way, the Transformer network and the Bidirectional Encoder Representations from Transformers (BERT) model built on top of it provide powerful building blocks for generating textual data in applications such as chatbots and large language models (LLMs), which we’ll cover in later chapters. We will also dive deeper into the most famous open-source models in the current LLM landscape, including Llama.

Before diving into further details on the various applications of generative models and how to implement them in PyTorch, we will take a step back and examine how exactly generative models are different from...


