Panda / Bates / Pittampally | Serverless Machine Learning with Amazon Redshift ML | E-Book | sack.de
E-Book

E-Book, Englisch, 290 Seiten

Panda / Bates / Pittampally Serverless Machine Learning with Amazon Redshift ML

Create, train, and deploy machine learning models using familiar SQL commands

E-Book, Englisch, 290 Seiten

ISBN: 978-1-80461-969-8
Verlag: Packt Publishing
Format: EPUB
Kopierschutz: Adobe DRM (»Systemvoraussetzungen)



Amazon Redshift Serverless enables organizations to run petabyte-scale cloud data warehouses quickly and in a cost-effective way, enabling data science professionals to efficiently deploy cloud data warehouses and leverage easy-to-use tools to train models and run predictions. This practical guide will help developers and data professionals working with Amazon Redshift data warehouses to put their SQL knowledge to work for training and deploying machine learning models.
The book begins by helping you to explore the inner workings of Redshift Serverless as well as the foundations of data analytics and types of data machine learning. With the help of step-by-step explanations of essential concepts and practical examples, you’ll then learn to build your own classification and regression models. As you advance, you’ll find out how to deploy various types of machine learning projects using familiar SQL code, before delving into Redshift ML. In the concluding chapters, you’ll discover best practices for implementing serverless architecture with Redshift.
By the end of this book, you’ll be able to configure and deploy Amazon Redshift Serverless, train and deploy machine learning models using Amazon Redshift ML, and run inference queries at scale.
Panda / Bates / Pittampally Serverless Machine Learning with Amazon Redshift ML jetzt bestellen!

Weitere Infos & Material


1
Introduction to Amazon Redshift Serverless
“Hey, what’s a data warehouse?” John Doe, CEO and co-founder of Red.wines, a fictional specialty wine e-commerce company, asked Tathya Vishleshak*, the company’s CTO. John, who owned a boutique winery, had teamed up with Tathya for the project. The company’s success surged during the pandemic, driven by social media and the stay-at-home trend. John wanted detailed data analysis to align inventory and customer outreach. However, there was a problem – producing this analysis was slowing down their online transaction processing (OLTP) database. “A data warehouse is like a big database where we store different data for a long time to find insights and make decisions,” Tathya explained. John had a concern, “Sounds expensive; we’re already paying for unused warehouse space. Can we afford it?” Tathya reassured him, “You’re right, but there are cloud data warehouses such as Amazon Redshift Serverless that let you pay as you use.” Expanding on this, this chapter introduces data warehousing and Amazon Redshift. We’ll cover Amazon Redshift Serverless basics, such as namespaces and workgroups, and guide you in creating a data warehouse. Amazon Redshift can gather data from various sources, mainly Amazon Simple Storage Service (S3). As we go through this chapter, you’ll learn about a crucial aspect of this, the AWS Identity and Access Management (IAM) role, needed for loading data from S3. This role connects to your Serverless namespace for smooth data transfer. You’ll also learn how to load sample data and run queries using Amazon Redshift query editor. Our goal is to make it simple and actionable, so you’re confident in navigating this journey. Tathya Vishleshak The phrase 'Tathya Vishleshak' can be loosely interpreted to reflect the concept of a data analyst in Sanskrit/Hindi. However, it's important to note that this is not a precise or established translation, but rather an attempt to convey a similar meaning based on the individual meanings of the words 'Tathya' and 'Vishleshak' in Sanskrit. Additionally, Amazon Redshift is used to analyze structured and unstructured data in data warehouses, operational databases, and data lakes. It’s employed for traditional data warehousing, business intelligence, real-time analytics, and machine learning/predictive analytics. Data analysts and developers use Redshift data with machine learning (ML) models for tasks such as predicting customer behavior. Amazon Redshift ML streamlines this process using familiar SQL commands. The book delves into ML, explaining supervised and unsupervised training. You’ll learn about problem-solving with binary classification, multi-class classification, and regression using real-world examples. You’ll also discover how to create deep learning models and custom models with XGBoost, as well as use time series forecasting. The book also covers in-database and remote inferences using existing models, applying ML for predictive analytics, and operationalizing machine learning models. The following topics will be covered in this chapter: What is Amazon Redshift? Getting started with Amazon Redshift Serverless Connecting to your data warehouse This chapter requires a web browser and access to an AWS account. What is Amazon Redshift?
Organizations churn out vast troves of customer data along with insights into these customers’ interactions with the business. This data gets funneled into various applications and stashed away in disconnected systems. A conundrum arises when attempting to decipher these data silos – a formidable challenge that hampers the derivation of meaningful insights essential for organizational clarity. Adding to this complexity, security and performance considerations typically muzzle business analysts from accessing data within OLTP systems. The hiccup is that intricate analytical queries weigh down OLTP databases, casting a shadow over their core operations. Here, the solution is the data warehouse, which is a central hub of curated data, used by business analysts and data scientists to make informed decisions by employing the business intelligence and machine learning tools at their disposal. These users make use of Structured Query Language (SQL) to derive insights from this data trove. From operational systems, application logs, and social media streams to the influx of IoT device-generated data, customers channel structured and semi-structured data into organizations’ data warehouses, as depicted in Figure 1.1, showcasing the classic architecture of a conventional data warehouse. Figure 1.1 – Data warehouse Here’s where Amazon Redshift Serverless comes in. It’s a key option within Amazon Redshift, a well-managed cloud data warehouse offered by Amazon Web Services (AWS). With cloud-based ease, Amazon Redshift Serverless lets you set up your data storage without infrastructure hassles or cost worries. You pay based on what you use for compute and storage. Amazon Redshift Serverless goes beyond convenience, propelling modern data applications that seamlessly connect to the data lake. Enter the data lake – a structure that gathers all data strands under one roof, providing limitless space to store data at any scale, cost-effectively. Alongside other data repositories such as data warehouses, data lakes redefine how organizations handle data. And this is where it all comes together – the following diagram shows how Amazon Redshift Serverless injects SQL-powered queries into the data lake, driving a dynamic data flow: Figure 1.2 – Data lake and data warehouse So, let’s get started on creating our first data warehouse in the cloud! Getting started with Amazon Redshift Serverless
You can create your data warehouse with Amazon Redshift Serverless using the AWS Command-Line Interface (CLI), the API, AWS CloudFormation templates, or the AWS console. We are going to use the AWS console to create a Redshift Serverless data warehouse. Log in to your AWS console and search for Redshift in the top bar, as shown in Figure 1.3: Figure 1.3 – AWS console page showing services filtered by our search for Redshift Click on Amazon Redshift, which will take you to the home page for the Amazon Redshift console, as shown in Figure 1.4. To help get you started, Amazon provides free credit for first-time Redshift Serverless customers. So, let’s start creating your trial data warehouse by clicking on Try Amazon Redshift Serverless. If you or your organization has tried Amazon Redshift Serverless before, you will have to pay for the service based on your usage: Figure 1.4 – Amazon Redshift service page in the AWS console If you have free credit available, it will be indicated at the top of your screen, as in Figure 1.5: Figure 1.5 – AWS console showing the Redshift Serverless Get started page You can either choose the defaults or use the customized settings to create your data warehouse. The customized settings give you more control, allowing you to specify many additional parameters for your compute configuration including the workgroup, data-related settings such as the namespace, and advanced security settings. We will use the customized settings, which will help us customize the namespace settings for our Serverless data warehouse. A namespace combined with a workgroup is what makes a data warehouse with Redshift Serverless, as we will now see in more detail. What is a namespace?
Amazon Redshift Serverless provides a separation of storage and compute for a data warehouse. A namespace is a collection of all your data stored in the database such as your tables, views, database users, and their privileges. You are separately charged for storage based on the size of the data stored in your data warehouse. For compute, you are charged for the capacity used over a given duration in Redshift processing hours (RPU) on a per second-basis. The storage capacity is billed as Redshift managed storage (RMS) and is billed by GB/month. You can view...


Ihre Fragen, Wünsche oder Anmerkungen
Vorname*
Nachname*
Ihre E-Mail-Adresse*
Kundennr.
Ihre Nachricht*
Lediglich mit * gekennzeichnete Felder sind Pflichtfelder.
Wenn Sie die im Kontaktformular eingegebenen Daten durch Klick auf den nachfolgenden Button übersenden, erklären Sie sich damit einverstanden, dass wir Ihr Angaben für die Beantwortung Ihrer Anfrage verwenden. Selbstverständlich werden Ihre Daten vertraulich behandelt und nicht an Dritte weitergegeben. Sie können der Verwendung Ihrer Daten jederzeit widersprechen. Das Datenhandling bei Sack Fachmedien erklären wir Ihnen in unserer Datenschutzerklärung.