Rothman | Transformers for Natural Language Processing and Computer Vision | E-Book | sack.de

E-Book, English, 730 pages

Rothman Transformers for Natural Language Processing and Computer Vision

Explore Generative AI and Large Language Models with Hugging Face, ChatGPT, GPT-4V, and DALL-E 3
3rd edition, 2024
ISBN: 978-1-80512-374-3
Publisher: De Gruyter
Format: EPUB
Copy protection: 0 - No protection




Transformers for Natural Language Processing and Computer Vision, Third Edition, explores Large Language Model (LLM) architectures, practical applications, and popular platforms (Hugging Face, OpenAI, and Google Vertex AI) used for Natural Language Processing (NLP) and Computer Vision (CV).
The book guides you through a range of transformer architectures, from foundation models to generative AI. You'll pretrain and fine-tune LLMs and work through different use cases, from summarization to question-answering systems that leverage embedding-based search. You'll also implement Retrieval Augmented Generation (RAG) to enhance accuracy and gain greater control over your LLM outputs. Additionally, you'll learn about common LLM risks, such as hallucinations, memorization, and privacy issues, and implement mitigation strategies using moderation models alongside rule-based systems and knowledge integration.
Dive into generative vision transformers and multimodal architectures, and build practical applications, such as image and video classification. Go further and combine different models and platforms to build AI solutions and explore AI agent capabilities.
This book provides you with an understanding of transformer architectures, including strategies for pretraining, fine-tuning, and LLM best practices.




Further information & material


Table of Contents
- What are Transformers?
- Getting Started with the Architecture of the Transformer Model
- Emergent vs Downstream Tasks: The Unseen Depths of Transformers
- Advancements in Translations with Google Trax, Google Translate, and Gemini
- Diving into Fine-Tuning through BERT
- Pretraining a Transformer from Scratch through RoBERTa
- The Generative AI Revolution with ChatGPT
- Fine-Tuning OpenAI GPT Models
- Shattering the Black Box with Interpretable Tools
- Investigating the Role of Tokenizers in Shaping Transformer Models
- Leveraging LLM Embeddings as an Alternative to Fine-Tuning
- Toward Syntax-Free Semantic Role Labeling with ChatGPT and GPT-4
- Summarization with T5 and ChatGPT
- Exploring Cutting-Edge LLMs with Vertex AI and PaLM 2
- Guarding the Giants: Mitigating Risks in Large Language Models
- Beyond Text: Vision Transformers in the Dawn of Revolutionary AI
- Transcending the Image-Text Boundary with Stable Diffusion
- Hugging Face AutoTrain: Training Vision Models without Coding
- On the Road to Functional AGI with HuggingGPT and its Peers
- Beyond Human-Designed Prompts with Generative Ideation


Preface


Transformer-driven Generative AI models are a game-changer for Natural Language Processing (NLP) and computer vision. Large generative AI language models have achieved superhuman performance through services such as ChatGPT with GPT-4V in text, images, data science, and hundreds of other domains. We have gone from primitive Generative AI to superhuman AI performance in just a few years!

Language understanding has become the pillar of language modeling, chatbots, personal assistants, question answering, text summarization, speech-to-text, sentiment analysis, machine translation, and more. The expansion from the early Large Language Models (LLMs) to multimodal (text, image, sound) algorithms has taken AI into a new era.

For the past few years, we have been witnessing the expansion of social networks versus physical encounters, e-commerce versus physical shopping, digital versus print newspapers, streaming versus physical theaters, remote doctor consultations versus physical visits, remote work versus on-site tasks, and similar trends in hundreds more domains. This digital activity is now increasingly driven by transformer copilots in hundreds of applications.

The transformer architecture emerged just a few years ago as a revolutionary, disruptive break with the past, leaving the dominance of RNNs and CNNs behind. BERT and GPT models abandoned recurrent network layers and replaced them with self-attention. Then, in 2023, OpenAI propelled AI into new realms with GPT-4V (GPT-4 with vision), which is paving the path to functional (everyday tasks) AGI. Google Vertex AI offered similar technology. 2024 is not a new year in AI; it's a new decade! Meta (formerly Facebook) has released Llama 2, which we can deploy seamlessly on Hugging Face.

Transformer encoders and decoders contain attention heads that compute independently of one another, so their work can be parallelized on cutting-edge hardware. Attention heads can even run on separate GPUs, opening the door to billion-parameter models and soon-to-come trillion-parameter models.
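The following minimal NumPy sketch shows why heads parallelize well. It assumes random toy weights and uses a thread pool as a stand-in for separate GPUs. Each head reads only its own column slice of the projection matrices, so the heads can be computed as independent tasks and concatenated afterwards. The output projection that a real layer applies after concatenation is left out.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 4, 8, 2
d_head = d_model // n_heads

x = rng.standard_normal((seq_len, d_model))
# Toy projection matrices (assumption: random weights, no training)
W_q = rng.standard_normal((d_model, d_model))
W_k = rng.standard_normal((d_model, d_model))
W_v = rng.standard_normal((d_model, d_model))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def one_head(h):
    """Attention for head h, using only that head's column slice of the weights."""
    cols = slice(h * d_head, (h + 1) * d_head)
    q, k, v = x @ W_q[:, cols], x @ W_k[:, cols], x @ W_v[:, cols]
    weights = softmax(q @ k.T / np.sqrt(d_head))
    return weights @ v  # shape (seq_len, d_head)

# Each head runs as an independent task, standing in for a separate device
with ThreadPoolExecutor(max_workers=n_heads) as pool:
    parallel_out = np.concatenate(list(pool.map(one_head, range(n_heads))), axis=-1)

# The same heads computed one after another give an identical result
sequential_out = np.concatenate([one_head(h) for h in range(n_heads)], axis=-1)
print(np.allclose(parallel_out, sequential_out))  # True
```

The heads share no state during the forward pass. That independence is what lets frameworks shard them across devices.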

The increasing amount of data requires training AI models at scale. As such, transformers pave the way to a new era of parameter-driven AI. Learning how hundreds of millions of words and images fit together requires a tremendous number of parameters. Transformer models such as Google Vertex AI PaLM 2 and OpenAI GPT-4V have taken emergence to another level: transformers can perform hundreds of NLP tasks they were not explicitly trained for.

Transformers can also learn image classification and reconstruction by embedding images as sequences of patches, treated much like sequences of words. This book will introduce you to cutting-edge computer vision transformers such as Vision Transformers (ViTs), CLIP, GPT-4V, DALL-E 3, and Stable Diffusion.
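Here is a minimal NumPy sketch of the ViT-style idea of turning an image into a "sentence" of patches. It assumes a toy 32×32 RGB image, a patch size of 8, and a random matrix standing in for the learned patch projection. A real ViT also adds a class token and position embeddings, which are omitted here.

```python
import numpy as np

def image_to_patches(image, patch):
    """Split an (H, W, C) image into flattened, non-overlapping patch x patch tiles."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    tiles = image.reshape(h // patch, patch, w // patch, patch, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4)        # (rows, cols, patch, patch, C)
    return tiles.reshape(-1, patch * patch * c)   # one row per patch, in raster order

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))            # toy RGB image
patches = image_to_patches(image, 8)       # 16 "words", each 8*8*3 = 192 values
W_embed = rng.standard_normal((192, 64))   # hypothetical learned projection to d_model=64
tokens = patches @ W_embed                 # sequence of 16 patch embeddings
print(patches.shape, tokens.shape)         # (16, 192) (16, 64)
```

From this point on, the transformer treats `tokens` exactly as it would a sequence of word embeddings.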

Think of how many humans it would take to check the billions of messages posted on social networks each day, deciding whether they are legal and ethical, before extracting the information they contain.

Think of how many humans would be required to translate the millions of pages published each day on the web. Or imagine how many people it would take to manually check the millions of messages and images posted per minute!

Imagine how many humans it would take to transcribe the vast number of hours of streaming content published each day on the web. Finally, think about the human resources that would be required to replace AI image captioning for the billions of images that continuously appear online.

This book will take you from developing code to prompt engineering, a new “programming” skill that controls the behavior of a transformer model. Each chapter will take you through the key aspects of language understanding and computer vision from scratch in Python, PyTorch, and TensorFlow.

You will learn the architecture of the Original Transformer, Google BERT, GPT-4, PaLM 2, T5, ViT, Stable Diffusion, and several other models. You will fine-tune transformers, train models from scratch, and learn to use powerful APIs.

You will keep close to the market and its demand for language understanding in many fields, such as media, social media, and research papers. You will learn how to improve Generative AI models with Retrieval Augmented Generation (RAG), embedding-based search, prompt engineering, and automated ideation with AI-generated prompts.
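The retrieval step behind embedding-based search and RAG can be sketched in plain Python. The sketch ranks documents by cosine similarity to a query embedding and then places the best match into the prompt. It assumes hand-written 3-dimensional toy vectors. A real system would obtain the vectors from an embedding model and send the resulting prompt to an LLM.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy "embeddings" (hypothetical values; a real system would call an embedding model)
documents = {
    "Transformers replace recurrence with self-attention.": [0.9, 0.1, 0.0],
    "Stable Diffusion generates images from text prompts.": [0.1, 0.9, 0.2],
    "RAG retrieves documents to ground a model's answer.": [0.2, 0.1, 0.9],
}

def retrieve(query_vec, k=1):
    """Return the k documents whose embeddings are most similar to the query."""
    ranked = sorted(documents, key=lambda d: cosine(query_vec, documents[d]), reverse=True)
    return ranked[:k]

query = "How does retrieval help an LLM?"
query_vec = [0.1, 0.0, 0.95]  # hypothetical embedding of the query
context = retrieve(query_vec, k=1)
prompt = "Answer using this context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"
print(prompt)
```

Grounding the prompt in retrieved text is what gives RAG its extra control over the output. The model answers from supplied context instead of relying only on its parameters.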

Throughout the book, you will work hands-on with Python, PyTorch, and TensorFlow. You will be introduced to the key AI language understanding neural network models. You will then learn how to explore and implement transformers.

You will learn the skills required not only to adapt to the present market but also to acquire the vision to face innovative projects and AI evolutions. This book aims to give readers both the knowledge and the vision to select the right models and environment for any given project.

Who this book is for


This book is not an introduction to Python programming or machine learning concepts. Instead, it focuses on deep learning for machine translation, speech-to-text, text-to-speech, language modeling, question answering, and many more NLP domains, as well as multimodal computer vision tasks.

Readers who can benefit the most from this book are:

  • Deep learning, vision, and NLP practitioners familiar with Python programming.
  • Data analysts, data scientists, and machine learning/AI engineers who want to understand how to process and interrogate the increasing amounts of language-driven and image data.

What this book covers


Part I: The Foundations of Transformers


Chapter 1, What are Transformers?, explains, at a high level, what transformers and Foundation Models are. We will first unveil the incredible power of the deceptively simple O(1) sequential operations of transformer attention layers, which changed everything. We will then discover how a little-known transformer algorithm from 2017 rose to dominate so many domains and brought us Foundation Models.
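The O(1) figure is easiest to read as the number of sequential operations per layer, which is the comparison drawn in the 2017 "Attention Is All You Need" paper. A recurrent layer needs O(n) steps that must run one after another. Self-attention relates all n positions at once, although its total work grows as O(n²·d). The NumPy sketch below contrasts the two with toy sizes and random weights (assumptions for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 4                                # sequence length, model width (toy sizes)
x = rng.standard_normal((n, d))
W_h = rng.standard_normal((d, d)) * 0.1    # toy recurrent weights (assumption)

# Recurrent layer: step t needs h from step t-1, so n steps run one after another
h = np.zeros(d)
rnn_steps = 0
for t in range(n):
    h = np.tanh(x[t] + h @ W_h)
    rnn_steps += 1

# Self-attention: all n x n position pairs are scored in one matrix product,
# so the number of sequential steps does not grow with n (the total work, O(n^2 * d), does)
scores = x @ x.T / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
attended = weights @ x

print(rnn_steps, weights.shape)  # 6 (6, 6)
```
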

Chapter 2, Getting Started with the Architecture of the Transformer Model, goes through the background of NLP to understand how RNN, LSTM, and CNN architectures were abandoned and how the transformer architecture opened a new era. We will go through the Original Transformer's architecture, following the unique approach invented by the Google Research and Google Brain authors. We will describe the theory of transformers. We will get our hands dirty in Python to see how multi-head attention sublayers work.
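Here is a compact NumPy sketch of a multi-head self-attention sublayer in the style of the Original Transformer. It projects the input to queries, keys, and values, splits them into heads, applies scaled dot-product attention, and then concatenates the heads and projects them with an output matrix. The model uses d_model = 512 and 8 heads, as in the original paper. The weights are random placeholders, not trained values.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V, applied over the last two axes."""
    d_k = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def multi_head_attention(x, W_q, W_k, W_v, W_o, n_heads):
    """Self-attention sublayer: project, split into heads, attend, concatenate, project."""
    seq_len, d_model = x.shape
    d_k = d_model // n_heads

    def split(m):  # (seq_len, d_model) -> (n_heads, seq_len, d_k)
        return m.reshape(seq_len, n_heads, d_k).transpose(1, 0, 2)

    heads = scaled_dot_product_attention(split(x @ W_q), split(x @ W_k), split(x @ W_v))
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)  # back to (seq_len, d_model)
    return concat @ W_o

rng = np.random.default_rng(42)
d_model, n_heads = 512, 8                  # sizes used by the Original Transformer
x = rng.standard_normal((3, d_model))      # 3 toy input embeddings
W_q, W_k, W_v, W_o = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model) for _ in range(4))
out = multi_head_attention(x, W_q, W_k, W_v, W_o, n_heads)
print(out.shape)  # (3, 512)
```

The sublayer returns one vector per input position with the same width as its input. That matching width is what allows it to be stacked with residual connections.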

Chapter 3, Emergent vs Downstream Tasks: The Unseen Depths of Transformers, bridges the gap between the functional and mathematical architecture of transformers by introducing emergence. We will then see how to measure the performance of transformers before exploring several downstream tasks, such as the Stanford Sentiment Treebank (SST-2), linguistic acceptability, and Winograd schemas.
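Performance on such tasks is usually reported with task-specific metrics. The GLUE benchmark, for example, scores SST-2 with accuracy and linguistic acceptability (CoLA) with the Matthews correlation coefficient (MCC). The plain-Python sketch below computes both for binary labels. The labels are invented purely for illustration.

```python
import math

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels (1 = positive/acceptable, 0 = negative/unacceptable)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical predictions on 8 sentences (labels invented for illustration)
y_true = [1, 1, 1, 1, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1, 0, 1]
print(accuracy(y_true, y_pred), round(matthews_corrcoef(y_true, y_pred), 3))  # 0.75 0.467
```

MCC uses all four cells of the confusion matrix, so it is less flattering than accuracy when the classes are imbalanced, as they are here.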

Chapter 4, Advancements in Translations with Google Trax, Google Translate, and Gemini, goes through machine translation in three steps. We will first define what machine translation is. We will then preprocess a Workshop on Machine Translation (WMT) dataset. Finally, we will see how to implement machine translation.
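As a rough idea of what preprocessing a parallel corpus involves, the following sketch cleans aligned source and target lines and drops unusable pairs. The cleaning rules are hypothetical choices for illustration: Unicode NFC normalization, lowercasing, whitespace collapsing, and a 50-word length cap. They are not necessarily the book's procedure. The toy English-French lines stand in for real WMT files.

```python
import re
import unicodedata

def clean_line(line):
    """Normalize one sentence: Unicode NFC, lowercase, collapse whitespace."""
    line = unicodedata.normalize("NFC", line).lower()
    return re.sub(r"\s+", " ", line).strip()

def preprocess_pairs(source_lines, target_lines, max_len=50):
    """Clean aligned source/target lines and drop empty or overlong pairs (hypothetical rules)."""
    pairs = []
    for src, tgt in zip(source_lines, target_lines):
        src, tgt = clean_line(src), clean_line(tgt)
        if src and tgt and len(src.split()) <= max_len and len(tgt.split()) <= max_len:
            pairs.append((src, tgt))
    return pairs

# Toy English-French lines standing in for a WMT parallel corpus
english = ["Resumption of the  session", "", "I declare resumed the session."]
french = ["Reprise de la session", "Texte orphelin", "Je déclare reprise la session."]
print(preprocess_pairs(english, french))
```

Dropping a pair as a whole, rather than a single side, keeps the source and target files aligned line by line.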

Chapter 5, Diving into Fine-Tuning through BERT, builds on the architecture of the Original Transformer. Bidirectional Encoder Representations from Transformers (BERT) gives transformers a vast new way of perceiving the world of NLP. Instead of analyzing a past sequence to predict a future sequence, BERT attends to the whole sequence! We will first go through the key innovations of BERT's architecture and then fine-tune a BERT model by going through each step in a Google Colaboratory notebook. Like humans, BERT can transfer what it has learned to new tasks without having to learn the topic from scratch.
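The difference between predicting from the past and attending to the whole sequence comes down to the attention mask. The NumPy sketch below compares a causal (GPT-style) mask with a bidirectional (BERT-style) mask, using equal raw scores so that only the mask shapes the weights. This attention mask is separate from the [MASK] tokens BERT uses during pretraining.

```python
import numpy as np

seq_len = 4
# GPT-style (causal) mask: token i may attend only to positions 0..i
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
# BERT-style (bidirectional) mask: every token attends to the whole sequence
bidirectional_mask = np.ones((seq_len, seq_len), dtype=bool)

def masked_softmax(scores, mask):
    """Softmax over allowed positions only; disallowed positions get weight 0."""
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

scores = np.zeros((seq_len, seq_len))  # equal raw scores, to make the masks visible
# Second token: causal attention sees only positions 0 and 1; bidirectional sees all four
print(masked_softmax(scores, causal_mask)[1])
print(masked_softmax(scores, bidirectional_mask)[1])
```
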

Chapter 6, Pretraining a Transformer from Scratch through RoBERTa, builds a RoBERTa transformer model from scratch using the Hugging Face PyTorch modules. The transformer will be both BERT-like and DistilBERT-like. First, we will train a tokenizer from scratch on a customized dataset. Finally, we will put the knowledge acquired in this chapter to work and pretrain a Generative AI customer support model on X (formerly Twitter) data.
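RoBERTa uses a byte-level Byte Pair Encoding (BPE) tokenizer. The toy sketch below shows the core of BPE training: repeatedly count adjacent symbol pairs and merge the most frequent one. It works on characters rather than bytes and uses a tiny made-up corpus. It is an illustration of the algorithm, not the Hugging Face tokenizers implementation.

```python
from collections import Counter

def learn_bpe(corpus, num_merges):
    """Learn BPE merges: repeatedly fuse the most frequent adjacent symbol pair."""
    # Represent each word as a tuple of characters plus an end-of-word marker
    vocab = Counter(tuple(word) + ("</w>",) for line in corpus for word in line.split())
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair counted first
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

corpus = ["low lower lowest", "low low newer"]
print(learn_bpe(corpus, 3))  # [('l', 'o'), ('lo', 'w'), ('low', '</w>')]
```

Frequent words such as "low" quickly become single tokens. Rarer words stay split into reusable subword pieces.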

Part II: The Rise of Suprahuman NLP


Chapter 7, The Generative AI Revolution with ChatGPT, goes through the tremendous improvements and diffusion of ChatGPT models into the everyday lives of developers and end-users. We will first examine the architecture of OpenAI's GPT models before working with the GPT-4 API and its hyperparameters to implement several NLP examples. Finally, we will learn how to obtain better results with Retrieval Augmented Generation (RAG). We will implement an example of automated RAG with GPT-4.
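Temperature is one of the most frequently tuned hyperparameters of the API. The sketch below illustrates the standard softmax-with-temperature idea on hypothetical next-token scores. It assumes this is a fair model of what the `temperature` setting does. The API applies it server-side, so this is an illustration, not OpenAI's code.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical next-token scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperatures, most of the probability falls on the top token and the output becomes more deterministic. Higher temperatures flatten the distribution and produce more varied text.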

Chapter 8, Fine-Tuning OpenAI GPT Models, explores fine-tuning to help us decide whether a project should go in this direction. We will introduce risk management perspectives. We will prepare a dataset and fine-tune a cost-effective babbage-002 model for a completion task.
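Preparing such a dataset typically means writing one JSON object per line (JSONL). The sketch below assumes the prompt/completion pair format that OpenAI documents for babbage-002. The separator " ->", the leading space, and the trailing "\n" in each completion are legacy conventions assumed here, so check the current OpenAI documentation. The examples and the `train.jsonl` filename are hypothetical.

```python
import json

# Hypothetical examples in the assumed prompt/completion format (check current OpenAI docs)
examples = [
    {"prompt": "Summarize: The meeting moved to Friday. ->", "completion": " Meeting now on Friday.\n"},
    {"prompt": "Summarize: The server was restarted at noon. ->", "completion": " Server restarted at noon.\n"},
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")  # one JSON object per line

# Sanity check: every line parses and has exactly the two expected keys
with open("train.jsonl", encoding="utf-8") as f:
    rows = [json.loads(line) for line in f]
assert all(set(r) == {"prompt", "completion"} for r in rows)
print(len(rows), "examples written")
```

Validating the file locally before uploading it catches malformed lines early, before a fine-tuning job fails on them.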

Chapter 9, Shattering the Black Box with Interpretable Tools, lifts the...


Rothman Denis:
Denis Rothman graduated from Sorbonne University and Paris-Diderot University, where he designed one of the very first patented word2matrix embeddings and patented AI conversational agents. He began his career authoring one of the first AI cognitive Natural Language Processing (NLP) chatbots, applied as an automated language teacher for Moet et Chandon and other companies. He authored an AI resource optimizer for IBM and apparel producers. He then authored an Advanced Planning and Scheduling (APS) solution used worldwide.


