E-book, English, 374 pages
Gioia, Andrea / Scotti, Giulio Managing Data as a Product
1st edition, 2024
ISBN: 978-1-83546-937-8
Publisher: De Gruyter
Format: EPUB
Copy protection: 0 - No protection
Design and build data-product-centered socio-technical architectures
Traditional monolithic data platforms struggle with scalability and burden central data teams with excessive cognitive load, leading to challenges in managing technological debt. As maintenance costs escalate, these platforms lose their ability to provide sustained value over time. With two decades of hands-on experience implementing data solutions and his pioneering work in the Open Data Mesh Initiative, Andrea Gioia brings practical insights and proven strategies for transforming how organizations manage their data assets.
Managing Data as a Product introduces a modular and distributed approach to data platform development, centered on the concept of data products. In this book, you'll explore the rationale behind this shift, understand the core features and structure of data products, and learn how to identify, develop, and operate them in a production environment. The book guides you through designing and implementing an incremental, value-driven strategy for adopting data product-centered architectures, including strategies for securing buy-in from stakeholders. It also covers data modeling in distributed environments and its role in enabling modern generative AI.
By the end of this book, you'll understand product-centric data architecture and how to adopt it.
Further information and material
Preface
Hello, and welcome to Managing Data as a Product! I’m excited to share everything I’ve learned about managing data as a product and how this new paradigm can solve recurrent problems in data architectures that, despite huge investment, periodically collapse under the weight of their own complexity, making sustainable evolution a real challenge.
Ironically, the most successful data platforms, those that bring the greatest value to an organization, are often the first to struggle. Their success drives rapid growth in both the number of managed data assets and users, which leads to complexity. This complexity gradually slows down their growth until the platforms become too costly to maintain and too slow to evolve. However, this march toward self-destruction isn’t inevitable. We can rethink how we design data management solutions, so they don’t fall victim to their success but instead exploit it, multiplying the value they generate for the organization while growing.
Managing data as a product allows us to handle growing complexity by modularizing the data management architecture. Each data product is a modular unit that helps isolate complexity into smaller, manageable parts. Over time, the collection of developed data products forms a portfolio of building blocks that can be easily recombined to support new use cases. This way, while the platform’s complexity remains stable as it grows, the value derived from the managed data assets increases. Implementing new business cases becomes simpler, as existing data products can be reused rather than creating new ones from scratch.
However, managing data as a product is a profound paradigm shift from traditional monolithic data architectures, impacting not only technology but also, and especially, the organization. Throughout this book, chapter by chapter, we’ll explore practical, actionable steps to adopt this new paradigm, addressing all key aspects from both a technical and organizational perspective.
As we’ll see, adopting a data-as-a-product approach is challenging, but it’s well worth the effort. This book is a travel guide inspired by my experience, aimed at helping you find the best path for your unique context to successfully navigate this paradigm shift.
What this book covers
Chapter 1 shows how modularizing data architecture with data products solves recurring problems that make its sustainable evolution challenging over time.
Chapter 2 defines what a data product is, outlining its key characteristics and explaining the essential components that make it up, highlighting how each element contributes to its overall function and value.
Chapter 3 explores the foundational principles of a data product-centered architecture, analyzing the key operational and organizational capabilities required to manage it. We also compare other modern approaches such as data meshes and data fabrics with the data-as-product paradigm to highlight their similarities and key differences.
Chapter 4 explains how to identify and prioritize data products using a value-driven approach. It starts by identifying relevant business cases through domain-driven design and event storming, then shows how to define the data products needed to support those business cases.
Chapter 5 explores the process of designing a data product based on identified requirements, starting with techniques for defining scope, interfaces, and ecosystem relationships. It then examines the core components of a data product, their development process, and how to describe them with machine-readable documents. Finally, it analyzes the data flow, focusing on components responsible for sourcing, processing, and serving data.
Chapter 6 covers the entire lifecycle of a data product, from release to decommissioning. It introduces CI/CD methodologies, explores managing a data product in production with a focus on governance, observability, and access control, and discusses techniques for evolving and reusing data products in a distributed environment.
Chapter 7 explains how to speed up the adoption of a data product-centric paradigm by creating a self-serve platform to mobilize the entire data ecosystem. It covers the platform’s main features, how it improves the experience for developers, operators, and consumers, and the key factors in deciding whether to build, buy, or use a hybrid approach in implementing it.
Chapter 8 covers the adoption of the data-as-a-product paradigm. It outlines the key phases of the process, exploring objectives, challenges, and activities for each stage. Finally, it discusses how to create a flexible data strategy that evolves with each phase, building on previous learnings.
Chapter 9 explains how to design an organizational structure for managing data as a product. It introduces the team topologies framework, including team types and interaction modes, and explores how to organize teams for efficient data product delivery. Finally, it looks at how to integrate these teams into the organization and decide between a centralized and a decentralized data management model.
Chapter 10 examines data modeling in a decentralized, data product-centered architecture. It defines data models and emphasizes intentionality in modeling, then examines physical modeling techniques for distributed environments. Finally, it covers conceptual data modeling and its role in guiding the design and evolution of data products within a cohesive ecosystem.
Chapter 11 explores how to build an information architecture that maximizes the value of managed data, starting with developed data products. It covers how different planes of the information architecture add context to data and focuses especially on the knowledge plane, where shared conceptual models ensure semantic interoperability between data products. Finally, it explores how federated modeling teams can create and link conceptual models to physical data, forming an enterprise knowledge graph crucial for unlocking the potential of generative AI.
Chapter 12 revisits key concepts from earlier chapters, tying them to the core beliefs about data management that inspired this book. It wraps up with practical advice for becoming a more successful data management practitioner.
To get the most out of this book
In this book, both data products and the self-serve platform needed to support their development and operation are described at a logical level, without reference to any specific technology stack. Therefore, no prior knowledge of specific technologies is required to read and understand the content.
In some chapters, examples of metadata are provided to describe the components of a data product. This metadata is generally represented as JSON snippets. To use and modify them, we suggest a text editor that can recognize JSON syntax, such as Visual Studio Code.
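As a rough illustration of working with such snippets outside an editor, the following Python sketch parses a small JSON descriptor and reads the port names from it. The descriptor and all of its field names (name, version, outputPorts, promises) are hypothetical placeholders chosen for this example; they are not the schema used in the book.

```python
import json

# Hypothetical descriptor: every field name below is an illustrative
# assumption, not the metadata schema used in the book.
descriptor_text = """
{
  "name": "customer-orders",
  "version": "1.0.0",
  "outputPorts": [
    {"name": "orders-by-day", "promises": {"format": "parquet"}}
  ]
}
"""


def load_descriptor(text: str) -> dict:
    """Parse a JSON descriptor and fail early if it is not a JSON object."""
    data = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("descriptor must be a JSON object")
    return data


descriptor = load_descriptor(descriptor_text)

# Collect the names of the declared output ports (empty list if none).
port_names = [port["name"] for port in descriptor.get("outputPorts", [])]
print(port_names)
```

Parsing a snippet this way catches syntax slips, such as a missing comma or quote, before the metadata is used anywhere else.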
If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
Download the example code files
You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Managing-Data-as-a-Product. If there’s an update to the code, it will be updated in the GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Conventions used
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “The Promises object contains all the metadata through which the data product declares the intent of the port.”
A block of code is set as follows:
:purchases a rdf:Property ;
    rdfs:domain :Customer ;
    rdfs:range [
        a owl:Class ;
        owl:unionOf ( :Product :Service )
    ] .
Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or...




