E-book, English, 421 pages
Swinbank Azure Data Factory by Example
2nd edition, 2024
ISBN: 979-8-8688-0218-8
Publisher: APRESS
Format: PDF
Copy protection: 1 - PDF Watermark
Practical Implementation for Data Engineers
This edition, updated for 2024, includes the latest developments to the Azure Data Factory service:
- Enhancements to existing pipeline activities such as Execute Pipeline, along with the introduction of new activities such as Script, and activities designed specifically to interact with Azure Synapse Analytics.
- Improvements to flow control provided by activity deactivation and the Fail activity.
- The introduction of reusable data flow components such as user-defined functions and flowlets.
- Extensions to integration runtime capabilities including Managed VNet support.
- The ability to trigger pipelines in response to custom events.
- Tools for implementing boilerplate processes such as change data capture and metadata-driven data copying.
What You Will Learn
- Create pipelines, activities, datasets, and linked services
- Build reusable components using variables, parameters, and expressions
- Move data into and around Azure services automatically
- Transform data natively using ADF data flows and Power Query data wrangling
- Master flow-of-control and triggers for tightly orchestrated pipeline execution
- Publish and monitor pipelines easily and with confidence
Who This Book Is For
Data engineers and ETL developers taking their first steps in Azure Data Factory, SQL Server Integration Services users making the transition toward doing ETL in Microsoft’s Azure cloud, and SQL Server database administrators involved in data warehousing and ETL operations.
Target audience
Professional/practitioner
Authors/Editors
Further information & material
Chapter 1: Creating an Azure Data Factory Instance
- Get Started in Azure
- Create a Free Azure Account
- Explore the Azure Portal
- Create a Resource Group
- Create an Azure Data Factory
- Explore the Azure Data Factory User Experience
- Navigation Header Bar
- Navigation Sidebar
- Link to a Git Repository
- Create a Git Repository in Azure Repos
- Link the Data Factory to the Git Repository
- The ADF UX as a Web-Based IDE
- Chapter Review
- Key Concepts
- For SSIS Developers
Chapter 2: Your First Pipeline
- Work with Azure Storage
- Create an Azure Storage Account
- Explore Azure Storage
- Upload Sample Data
- Use the Copy Data Tool
- Explore Your Pipeline
- Linked Services
- Datasets
- Pipelines
- Activities
- Integration Runtimes
- Factory Resources in Git
- Debug Your Pipeline
- Run the Pipeline in Debug Mode
- Inspect Execution Results
- Chapter Review
- Key Concepts
- For SSIS Developers
Chapter 3: The Copy Data Activity
- Prepare an Azure SQL Database
- Create the Database
- Create Database Objects
- Import Structured Data into Azure SQL DB
- Create the Basic Pipeline
- Process Multiple Files
- Truncate Before Load
- Map Source and Sink Schemas
- Create a New Source Dataset
- Create a New Pipeline
- Configure Schema Mapping
- Import Semi-structured Data into Azure SQL DB
- Create a JSON File Dataset
- Create the Pipeline
- Configure Schema Mapping
- Set the Collection Reference
- The Effect of Schema Drift
- Understanding Type Conversion
- Transform JSON Files into Parquet
- Create a New JSON Dataset
- Create a Parquet Dataset
- Create and Run the Transformation Pipeline
- Performance Settings
- Data Integration Unit
- Degree of Copy Parallelism
- Chapter Review
- Key Concepts
- Azure Data Factory User Experience (ADF UX)
- For SSIS Developers
Chapter 4: Expressions
- Explore the Expression Builder
- Use System Variables
- Enable Storage of Audit Information
- Create a New Pipeline
- Add New Source Columns
- Run the Pipeline
- Access Activity Run Properties
- Create Database Objects
- Add Stored Procedure Activity
- Run the Pipeline
- Use the Lookup Activity
- Create Database Objects
- Configure the Lookup Activity
- Use Breakpoints
- Use the Lookup Value
- Update the Stored Procedure Activity
- Run the Pipeline
- User Variables
- Create a Variable
- Set a Variable
- Use the Variable
- Array Variables
- Concatenate Strings
- Infix Operators
- String Interpolation
- Escaping @
- Chapter Review
- Key Concepts
- For SSIS Developers
Chapter 5: Parameters
- Set Up an Azure Key Vault
- Create a Key Vault
- Create a Key Vault Secret
- Grant Access to the Key Vault
- Create a Key Vault ADF Linked Service
- Create a New Storage Account Linked Service
- Use Dataset Parameters
- Create a Parameterized Dataset
- Use the Parameterized Dataset
- Reuse the Parameterized Dataset
Chapter 6: Controlling Flow
- Create a Per-File Pipeline
- Use Activity Dependency Conditions
- Explore Dependency Condition Interactions
- Understand Pipeline Outcome
- Raise Errors
- Use Conditional Activities
- Divert Error Rows
- Load Error Rows
- Understand the Switch Activity
- Use Iteration Activities
- Use the Get Metadata Activity
- Use the ForEach Activity
- Ensure Parallelizability
- Understand the Until Activity
Chapter 7: Data Flows
- Build a Data Flow
- Enable Data Flow Debugging
- Add a Data Flow Transformation
- Use the Filter Transformation
- Use the Lookup Transformation
- Use the Derived Column Transformation
- Use the Select Transformation
- Use the Sink Transformation
- Execute the Data Flow
- Maintain a Product Dimension
- Create a Dimension Table
- Create Supporting Datasets
- Build the Product Maintenance Data Flow
- Execute the Dimension Data Flow
- Chapter Review
- Key Concepts
- For SSIS Developers
Chapter 8: Integration Runtimes
- Azure Integration Runtime
- Inspect the AutoResolveIntegrationRuntime
- Create a New Azure Integration Runtime
- Use the New Azure Integration Runtime
- Self-Hosted Integration Runtime
- Create a Shared Data Factory
- Create a Self-Hosted Integration Runtime
- Link to a Self-Hosted Integration Runtime
Chapter 9: Power Query in ADF
- Create a Power Query Mashup
- Explore the Power Query Editor
- Wrangle Data
- Run the Power Query Activity
- Chapter Review
Chapter 10: Publishing to ADF
- Publish to Your Factory Instance
- Trigger a Pipeline from the ADF UX
- Publish Factory Resources
- Inspect Published Pipeline Run Outcome
- Publish to Another Data Factory
- Prepare a Production Environment
- Export ARM Template from Your Development Factory
- Import ARM Template into Your Production Factory
- Understand Deployment Parameters
- Automate Publishing to Another Factory
Chapter 11: Triggers
- Use a Schedule Trigger
- Create a Schedule Trigger
- Reuse a Trigger
- Inspect Trigger Definitions
- Publish the Trigger
- Monitor Trigger Runs
- Deactivate the Trigger
- Advanced Recurrence Options
- Use an Event-Based Trigger
- Register the Event Grid Resource Provider
- Create an Event-Based Trigger
- Cause the Trigger to Run
- Trigger-Scoped System Variables
- Use a Tumbling Window Trigger
- Prepare Data
- Create a Windowed Copy Pipeline
- Create a Tumbling Window Trigger
- Monitor Trigger Runs
- Advanced Features
- Publishing Triggers Automatically
- Triggering Pipelines Programmatically
Chapter 12: Change Monitoring




