Kandarpa / Saxena | Data Wrangling with SQL | E-Book | sack.de
E-Book

E-Book, Englisch, 350 Seiten

Kandarpa / Saxena Data Wrangling with SQL

A hands-on guide to manipulating, wrangling, and engineering data using SQL
1. Auflage 2023
ISBN: 978-1-83763-430-9
Verlag: De Gruyter
Format: EPUB
Kopierschutz: 0 - No protection

A hands-on guide to manipulating, wrangling, and engineering data using SQL

E-Book, Englisch, 350 Seiten

ISBN: 978-1-83763-430-9
Verlag: De Gruyter
Format: EPUB
Kopierschutz: 0 - No protection



No detailed description available for "Data Wrangling with SQL".

Kandarpa / Saxena Data Wrangling with SQL jetzt bestellen!

Weitere Infos & Material


Table of Contents - Database Introduction
- Data Profiling and Preparation before Data Wrangling
- Data Wrangling on String Data Types
- Data Wrangling on the DATE Data Type
- Handling NULL values
- Pivoting Data Using SQL
- Subqueries and CTEs
- Aggregate Functions
- SQL Window Functions
- Optimizing Query Performance
- Descriptive Statistics with SQL
- Time Series with SQL
- Outlier Detection


Table of Contents
Preface
Part 1: Data Wrangling Introduction
1
Database Introduction
Getting started
Establishing the foundation
Efficient data organization
Data integrity and consistency
Technical requirements
Decoding database structures – relational and non-relational
What is a database?
Types of databases
Tables and relationships
The SQL CREATE DATABASE statement
The SQL CREATE TABLE statement
SQL DROP TABLE versus TRUNCATE TABLE
SQL ALTER TABLE
SQL constraints
SQL keys
Database relationships
Comparing database normalization and denormalization
Normalization
Types of normalization
Denormalization
When to apply denormalization
Disadvantages of denormalization
Summary
Practical exercises
Practical exercise 1
Practical exercise 2
Practical exercise 3
Practical exercise 4
2
Data Profiling and Preparation before Data Wrangling
What is data wrangling?
Data wrangling steps
The importance of data wrangling
Benefits of data wrangling
Data wrangling use cases
Business use cases
Data capture
How does data get captured?
Data-capturing techniques
Web scraping
Structured versus unstructured data
Paid-for versus free data-wrangling tools
Data profiling
Data profiling types
Data profiling techniques
Practical exercise
Step 1 – Discovery
Step 2 – Structuring
Step 3 – Cleaning
Step 4 – Enriching
Step 5 – Validating
Step 6 – Publishing
Summary
Part 2: Data Wrangling Techniques Using SQL
3
Data Wrangling on String Data Types
SQL data types
Numeric data types
Date and time data types
String data type
SQL string functions
RIGHT()
LEFT()
LEN()
TRIM()
RTRIM()
LTRIM()
RPAD()
LPAD()
REPLACE()
REVERSE()
SUBSTRING()
CAST()
CONCATENATE()
CONCATENATE_WS()
UPPER function
LOWER function
INITCAP function
INSTR function
Summary
Practical exercises
Practical exercise 1
Practical exercise 2
Practical exercise 3
Practical exercise 4
4
Data Wrangling on the DATE Data Type
SQL DATE data type functions
EXTRACT
DATEDIFF()
TIMEDIFF()
DATE_ADD()
DATE_SUB()
DATE_FORMAT()
STR_TO_DATE()
Extracting the current date and time
Summary
5
Handling NULL Values
The impact of missing data and NULL values on data analysis
Understanding the importance of data validation and cleaning before analyzing data
Identifying NULL/missing values
NULL values versus zero values
Using the IS NULL and IS NOT NULL operators to filter and select data with NULL values
IS NULL() and IS NOT NULL() – scenario
Using the COALESCE and IFNULL functions to replace NULL values with a default value
IFNULL()
COALESCE()
IS NULL versus = NULL
Summary
6
Pivoting Data Using SQL
SQL Transpose – rows to columns
Use case scenario
SQL Cross Tab – columns to rows
Use case scenario
Unpivoting data in SQL
Analytical workflow – from SQL to business intelligence – transforming data into actionable insights
Summary
Part 3: SQL Subqueries, Aggregate And Window Functions
7
Subqueries and CTEs
Introduction to subqueries
Simple subqueries
Correlated subqueries
Using subqueries in SELECT statements
Using subqueries in FROM statements
Using subqueries in WHERE statements
Nested subqueries
Correlated subqueries
Using subqueries in INSERT, UPDATE, and DELETE statements
Managing and maintaining subqueries
Common table expressions
Performance considerations for subqueries and CTEs
Subquery versus CTEs
Summary
8
Aggregate Functions
Overview of aggregate functions in SQL
Using GROUP BY
COUNT()
SUM()
AVG()
MIN() and MAX()
COUNT(DISTINCT)
Case scenario – using all aggregate functions
Summary
9
SQL Window Functions
The importance of SQL window functions
SQL aggregate functions
SQL window functions versus aggregate functions
Window functions versus aggregate functions – an example to illustrate the differences
Window functions
SUM()
COUNT()
AVG()
ROW_NUMBER()
RANK() and DENSE_RANK()
Lead() and Lag()
NTILE()
Summary
Part 4: Optimizing Query Performance
10
Optimizing Query Performance
Introduction to query optimization
Query execution plan
Query optimization techniques
Example
Caching
Normalization
Query monitoring and troubleshooting
Query profiling
Query logging
Database monitoring
Tips and tricks for writing efficient queries
Summary
In the next chapter, we will learn about descriptive statistics using SQL, which will provide us with insights into the distribution, central tendency, and variability of data, which can, in turn, help us identify outliers and anomalies. Common SQL functions and statements used for descriptive statistics include COUNT, AVG, MIN, MAX, and GROUP BY. By using SQL to analyze data, researchers and analysts can efficiently extract and summarize information from large datasets.
Part 5:Data Science And Wrangling
11
Descriptive Statistics with SQL
Calculating descriptive statistics with SQL
Mean
Median
Mode
Standard deviation
Variance
Variability
Summary
In the next chapter, we will learn how SQL can be used for time series analysis.
12
Time Series with SQL
Running totals
Case scenario
Lead and lag for time series analysis
Case scenario
Key KPIs
Percentage change
Case scenario
Key KPIs
Moving averages
Case scenario
Key KPIs
Rank for time series analysis
Case scenario
Key KPIs
CTE for time series analysis
Importance of using CTEs while performing time series analysis
Forecasting with linear regression
Case scenario
Key KPIs
Summary
In the next chapter, we will learn different methods to find outliers in the data easily. Outlier detection is an important aspect of data analysis as it helps determine if the data is correct, looks at the skewness of the data, and removes any unexpected values.
13
Outlier Detection
Measures of central tendency and dispersion
Case scenario
Key KPIs
Methods for detecting...


Kandarpa Raghav:
Raghav Kandarpa is an experienced Data Scientist in Finance and logistics industry with expertise in SQL, Python, Building Machine Learning Models, Financial Data Modelling, and Statistical Analysis. He holds a Masters' degree in Business Analytics specializing in Data Science from the University of Texas at Dallas.Saxena Shivangi:
Shivangi Saxena is an experienced BI Engineer with proficiency in SQL, Data Visualization, and Statistical Analysis. She holds a master's degree in Information Technology and Management from the University of Texas at Dallas. She has several years of experience building several BI tools and products using SQL and BI reporting tools which has helped stakeholders to get visibility to the right data points



Ihre Fragen, Wünsche oder Anmerkungen
Vorname*
Nachname*
Ihre E-Mail-Adresse*
Kundennr.
Ihre Nachricht*
Lediglich mit * gekennzeichnete Felder sind Pflichtfelder.
Wenn Sie die im Kontaktformular eingegebenen Daten durch Klick auf den nachfolgenden Button übersenden, erklären Sie sich damit einverstanden, dass wir Ihr Angaben für die Beantwortung Ihrer Anfrage verwenden. Selbstverständlich werden Ihre Daten vertraulich behandelt und nicht an Dritte weitergegeben. Sie können der Verwendung Ihrer Daten jederzeit widersprechen. Das Datenhandling bei Sack Fachmedien erklären wir Ihnen in unserer Datenschutzerklärung.