Nwokwu | Data Engineering for Beginners | Book | 978-1-394-32541-2 | www.sack.de

Book, English, 384 pages, format (W × H): 183 mm × 229 mm, weight: 748 g

Nwokwu

Data Engineering for Beginners


1st edition 2025
ISBN: 978-1-394-32541-2
Publisher: Wiley



A hands-on technical and industry roadmap for aspiring data engineers

In Data Engineering for Beginners, big data expert Chisom Nwokwu delivers a beginner-friendly handbook for everyone interested in the fundamentals of data engineering. Whether you want to start a rewarding new career as a data analyst, data engineer, or data scientist, or expand your skillset in an existing engineering role, Nwokwu offers the technical and industry knowledge you need to succeed.

The book explains:
- Database fundamentals, including relational and NoSQL databases
- Data warehouses and data lakes
- Data pipelines, including batch and stream processing
- Data quality dimensions
- Data security principles, including data encryption
- Data governance principles and data framework
- Big data and distributed systems concepts
- Data engineering on the cloud
- Essential skills and tools for data engineering interviews and jobs

Data Engineering for Beginners offers an easy-to-read roadmap to a seemingly complicated and intimidating subject. It addresses the topics most likely to cause a beginning data engineer to stumble, clearly explaining key concepts in an accessible way. You'll also find:
- A comprehensive glossary of data engineering terms
- Common and practical career paths in the data engineering industry
- An introduction to key cloud technologies and services you may encounter early in your data engineering career

Perfect for practicing and aspiring data analysts, data scientists, and data engineers, Data Engineering for Beginners is an effective and reliable starting point for learning an in-demand skill. It's a powerful resource for everyone hoping to expand their data engineering skillset and upskill in the big data era.




Further Information & Material


Foreword

Introduction

Chapter 1 Understanding Data

A Brief History of Data

Data in 19,000 BCE: The Great Baboon and Abacus

Data in the 1600s: Public Health Statistics

Data in the 1800s: The U.S. Census

Data in the 1900s: The Concept of Storage

Data in the 1990s: Data and the Internet

Types of Data

Structured Data

Unstructured Data

Semi-structured Data

Why Is Data Important?

Healthcare

Supply Chain

Transportation and Logistics

Artificial Intelligence

Data and Information

Summary

Notes

Chapter 2 Introduction to Data Engineering

Data Engineering Explained Using an Oil Refinery Analogy

An Overview of the Data Engineering Life Cycle

Data Storage

Data Ingestion

Data Transformation

Data Serving

Navigating Project Requirements, Engaging Stakeholders, and Delivering Business Value

Requirements Gathering

Understanding Stakeholders

Understanding System Requirements

Delivering Business Value

The Current State of Data Engineering

The Importance of Data Engineering

Summary

Chapter 3 Database Fundamentals

Key Concepts of Databases

Rows

Columns

Schema

Keys

Types of Databases

Relational Databases

NoSQL Databases

Choosing Between Relational and NoSQL Databases

Start With Your Data’s Structure

Think About the Relationships in Your Data

How Fast Do You Need to Move?

How Do You Need to Query Your Data?

Scaling and Performance

Transaction and Strong Consistency Needs

Summary

Chapter 4 SQL Fundamentals

Introduction to SQL

Basic SQL Clauses

Comparison Operators

LIKE Statement

IN Statement

BETWEEN Statement

AND Statement

OR Statement

NOT Statement

IS NULL and IS NOT NULL Statements

Sorting and Limiting

Aggregate Functions

Sum()

Avg()

MAX() and MIN()

Group by

Having

Understanding Joins

Inner Join

Left Join

Right Join

Full Outer Join

Subqueries

Common Table Expressions (CTEs)

Set Operations

Window Functions

Lab: Setting Up SQL Server and Running SQL Queries

Best Practices for Writing Efficient SQL Queries

Summary

Chapter 5 Database Design

Data Modeling

Why Do We Need to Model Data?

Types of Data Modeling

Normalization

Rules of Normalization

Downsides of Normalization

Denormalization

Data Modeling Best Practices

Define the Grain

Normalize Now, Denormalize Later

Choose the Right Data Types

Proper Naming Conventions

Database Optimization

Indexing

Partitioning

Sharding

Views

Summary

Chapter 6 Data Warehouses, Data Lakes, and Data Lakehouses

Data Warehouses

Extract, Transform, and Load (ETL)

Schema Design

Snowflake Schema

Slowly Changing Dimensions

Data Marts

Benefits of a Data Mart

Challenges with Data Marts

Data Lakes

How Do Data Lakes Work?

Challenges of Data Lakes

Data Lakehouse

Features of a Data Lakehouse

Data Lakehouse Architecture

The Key Differences Between a Database, Data Warehouse, Data Lake, and Data Lakehouse

Summary

Chapter 7 Data Pipelines

Batch Pipelines

Components of a Batch Pipeline

ETL Pipelines vs. ELT Pipelines

Stream Pipelines

How Would This Work?

Components of a Streaming Data Pipeline

Lambda Architecture

Components of the Lambda Architecture

Advantages of the Lambda Architecture

Challenges and Trade-offs

Data Orchestration

Directed Acyclic Graphs (DAGs)

Scheduling and Automation

Monitoring

Alerts

Lab: Building an ETL Pipeline and Automating with Apache Airflow

Requirements

Set Up Your Development Environment

Extracting Data from CSV

Transforming the Data

Load the New CSV File into a Postgres Database Instance

Schedule ETL Pipeline with Apache Airflow

Summary

Chapter 8 Data Quality

Bad Data

Dimensions of Data Quality

Accuracy

Completeness

Consistency

Validity

Uniqueness

Timeliness

Accessibility

Relevance

Data Quality Hierarchy

Data Quality Best Practices

Summary

Chapter 9 Data Security

What Is Data Security?

Common Threats to Data Security

Core Principles of Data Security

Confidentiality

Integrity

Availability

Data Encryption

Symmetric Encryption

Asymmetric Encryption

Data Masking

Understanding Network Security

Access Control

Authentication

Authorization

The Principle of Least Privilege

Access Levels

Secrets Management

Data Security and Data Privacy

Summary

Chapter 10 Data Governance

How to Think About Data Governance

Data Governance Framework

Policies

Regulatory Compliance Policy

Data Classification Policy

Data Retention and Disposal Policy

Data Sharing Policy

Processes

Metadata Management

Data Lineage

Incident Management

Master Data Management

Roles in the Data Governance Framework

Data Owner

Data Steward

Data Custodian

Chief Data Officer (CDO)

Data Management and Data Governance

Summary

Chapter 11 Big Data and Distributed Systems

The Five V’s of Big Data

Volume

Velocity

Variety

Veracity

Value

Distributed Systems

Scalability

Fault Tolerance

Reliability

Concurrency

Resource Management

Consistency

Availability

Load Balancing

Latency

Distributed Data Processing

Apache Hadoop

Big Data File Types

Avro

Parquet

Optimized Row Columnar (ORC)

Choosing the File Type

Summary

Chapter 12 Data Engineering on the Cloud

Cloud Computing

On-Premises

Cloud

Making the Right Choice

Core Cloud Concepts

Storage

Compute

Networking

Cloud Service Models

Infrastructure as a Service

Platform as a Service

Software as a Service

Choosing Between IaaS, PaaS, and SaaS

A Hybrid Approach

Cloud Management Models

Serverless

Managed

Self-Managed

Putting It All Together

Cost Optimization

Understanding Cloud Pricing Models

Rightsizing Resources

Smart Job Scheduling

Storage Optimization

Shutting Down Idle Resources

Use Serverless Where Possible

Monitoring and Alerting

Summary

Chapter 13 Building a Career in Data Engineering

Types of Data Engineering Roles

Types of Data Engineers

Platform Data Engineer

Analytics Data Engineer

AI/ML Data Engineers

Landing Your First Data Engineering Role

A Typical Data Engineering Job Description

How to Build a Winning Résumé

Preparing for a Data Engineering Interview

Thinking Like a Data Engineer

Think in Systems

Learn to Prioritize Data Quality

Design for Failure

Balance Business Context with Technical Choices

Optimize for Clarity, Then Speed

Think Beyond the Tool

Master Automation

Summary

Appendix Sample Interview Questions

SQL

Data Modeling

Data Pipelines

Apache Spark

System Design

Data Engineering Glossary

Index


CHISOM NWOKWU is a big data engineer, multi-published author, and creator specialising in the design and development of scalable data platforms for teams. She’s an Azure Certified Data Engineer Associate who has worked with large international firms, including Microsoft and Bank of America.


