1.1. In the beginning: how we got to where we are today
Those who cannot remember the past are condemned to repeat it.
- George Santayana (1863–1952)
To best understand the state of enterprise data management (EDM) today, it helps to trace how we arrived at this point: a journey that dates back nearly 50 years, to the days when enormous, expensive mainframe computers were the backbone of “data processing” (as Information Technology was commonly referred to in those days) and computing technology was still in its adolescence.
1.1.1. 1960s and 1970s
Many data processing textbooks of the 1960s and 1970s proposed a vision much like that depicted in
Figure 1.1.
Fig. 1.1 1960s/1970s vision of a common “data base.”
The simplified architecture envisioned by many prognosticators called for a single common “data base” that would serve as the primary store of data for core business applications such as accounting (general ledger, accounts payable, accounts receivable, payroll, etc.), finance, personnel, procurement, and others. One application might write a new record into the data base that would then be used by another application.
In many ways, this “single data base” vision is similar to the capabilities offered today by many enterprise systems vendors, in which a consolidated store of data underlies enterprise resource planning (ERP), customer relationship management (CRM), supply chain management (SCM), human capital management (HCM), and other applications that have touch-points with one another. Under this architecture, the typical company or governmental agency would face far fewer conflicting data definitions and semantics, conflicting business rules, unnecessary data duplication, and other hindrances than are found in today’s organizational data landscape.
Despite this vision of a highly ordered, quasi-utopian data management architecture, the result for most companies and governmental agencies looked far more like the diagram in
Figure 1.2, with each application “owning” its own file systems, tapes, and first-generation database management systems (DBMSs).
Fig. 1.2 The reality of most 1960s/1970s data environments.
Even when an organization’s portfolio of applications was housed on a single mainframe, the vision of a shared pool of data among those applications was typically nowhere in the picture. However, the various applications – many of which were custom-written in those days – still needed to share data among themselves. For example, Accounts Receivable and Accounts Payable applications needed to feed data into the General Ledger application. Most organizations found themselves rapidly slipping into the “spider’s web quagmire” of numerous one-by-one data exchange interfaces as depicted in
Figure 1.3.
Fig. 1.3 Ungoverned data integration via proliferating one-by-one interfaces.
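To see why this quagmire grows so quickly, consider a simplified, illustrative calculation: if each of n applications potentially exchanges data with every other, the number of distinct point-to-point interfaces approaches n × (n − 1) / 2 for bidirectional pairs, or n × (n − 1) if each direction requires its own feed. Ten applications can therefore require up to 45 separate interfaces, and fifty applications more than 1,200, each with its own file formats, schedules, and maintenance burden.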
By the time the 1970s drew to a close and computing was becoming more and more prevalent within business and government, any vision of managing one’s data assets at an enterprise level was far from a reality for most organizations. Instead, they were left with a world of uncoordinated, often conflicting data silos.
1.1.2. 1980s
As the 1980s progressed, the data silo problem actually began to worsen. Minicomputers had been introduced in the 1960s and had grown in popularity during the 1970s, led by vendors such as Digital Equipment Corporation (DEC) and Data General. Increasingly, the fragmentation of both applications and data moved from the realm of the mainframe into minicomputers as organizations began deploying core applications on these newer, smaller-scale platforms. Consequently, the one-by-one file transfers and other types of data exchange depicted in
Figure 1.3 were now increasingly occurring across hardware platforms, operating systems, and networks, many of which were only beginning to “talk” to one another. As the 1980s proceeded and personal computers (often called “microcomputers” at the time) grew wildly in popularity, the typical enterprise’s data architecture grew even more fragmented and chaotic.
Many organizations realized that they now were facing a serious problem with their fragmented data silos, as did many of the leading technology vendors. Throughout the 1980s, two major approaches took shape in an attempt to overcome the fragmentation problem:
• Enterprise data models
• Distributed database management systems (DDBMSs)
1.1.2.1. Enterprise Data Models
Companies and governmental agencies attempted to get their arms around their own data fragmentation problems by embarking on enterprise data model initiatives. Using conceptual and logical data modeling techniques that emerged in the 1970s, such as entity-relationship modeling, teams of data modelers would attempt to understand and document the enterprise’s existing data elements and attributes as well as the details of the relationships among those elements. The operating premise governing these efforts was that by investing the time and resources to analyze, understand, and document all of the enterprise’s data across any number of barriers – application, platform, and organizational, in particular – the “data chaos” would begin to dissipate and new systems could be built leveraging the data structures, relationships, and data-oriented business rules that already existed.
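As a simplified, hypothetical illustration of what such a model captured: a CUSTOMER entity (with attributes such as customer number, name, and billing address) might be related to an ORDER entity through a one-to-many “places” relationship, along with a business rule that every order must reference exactly one valid customer, documented once at the enterprise level rather than separately within each application’s billing, shipping, or accounting files.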
While many enterprise data modeling initiatives did produce a better understanding of an organization’s data assets than had existed before a given initiative began, these efforts largely withered over time and tended not to yield anywhere near the economies of scale originally envisioned at project inception. The application portfolio of the typical organization in the 1980s was both fast-growing and highly volatile, and an enterprise data modeling initiative almost inevitably fell behind the new and rapidly changing data under the control of any given application or system. The result: even before completion, most enterprise data models became “stale” and outdated, and were quietly mothballed.
(As most readers know, data modeling techniques are still widely used today, although primarily as part of the up-front analysis and design phase for a specific software development or package implementation project rather than as an attempt to document the entire breadth of an enterprise’s data assets.)
1.1.2.2. Distributed Database Management Systems (DDBMSs)
Enterprise data modeling efforts on the part of companies and governmental agencies were primarily an attempt to understand an organization’s highly fragmented data. The data models themselves did nothing to facilitate the integration of data across platforms, databases, organizational boundaries, etc.
To address the data fragmentation problem from an integration perspective, most of the leading computer companies and database vendors of the 1980s began work on
DDBMSs. The specific technical approaches from companies such as IBM (Information Warehouse), Digital Equipment Corporation (RdbStar), Ingres (Ingres Star), and others varied from one vendor to another, but the fundamental premise of most DDBMS efforts was as depicted in
Figure 1.4.
Fig. 1.4 The DDBMS concept.
The DDBMS story went like this: regardless of how scattered an organization’s data might be, a single, data model-driven interface could sit between applications and end-users on one side and the underlying databases on the other, including databases from other vendors operating under different DBMSs (#2 and #3 in Figure 1.4). The DDBMS engine would provide location and platform transparency, abstracting applications and users from the underlying data distribution and heterogeneity, as well as both read-write and read-only access to the...