E-Book, Englisch, 379 Seiten
Tillmann Usage-Driven Database Design
1. ed
ISBN: 978-1-4842-2722-0
Verlag: Apress
Format: PDF
Kopierschutz: 1 - PDF Watermark
From Logical Data Modeling through Physical Schema Definition
E-Book, Englisch, 379 Seiten
ISBN: 978-1-4842-2722-0
Verlag: Apress
Format: PDF
Kopierschutz: 1 - PDF Watermark
Design great databases-from logical data modeling through physical schema definition. You will learn a framework that finally cracks the problem of merging data and process models into a meaningful and unified design that accounts for how data is actually used in production systems.
Key to the framework is a method for taking the logical data model that is a static look at the definition of the data, and merging that static look with the process models describing how the data will be used in actual practice once a given system is implemented. The approach solves the disconnect between the static definition of data in the logical data model and the dynamic flow of the data in the logical process models.
The design framework in this book can be used to create operational databases for transaction processing systems, or for data warehouses in support of decision support systems. The information manager can be a flat file, Oracle Database, IMS, NoSQL, Cassandra, Hadoop, or any other DBMS.
Usage-Driven Database Design emphasizes practical aspects of design, and speaks to what works, what doesn't work, and what to avoid at all costs. Included in the book are lessons learned by the author over his 30+ years in the corporate trenches. Everything in the book is grounded on good theory, yet demonstrates a professional and pragmatic approach to design that can come only from decades of experience.Presents an end-to-end framework from logical data modeling through physical schema definition.
Includes lessons learned, techniques, and tricks that can turn a database disaster into a success.
Applies to all types of database management systems, including NoSQL such as Cassandra and Hadoop, and mainstream SQL databases such as Oracle and SQL ServerWhat You'll LearnCreate logical data models that accurately reflect the real world of the userCreate usage scenarios reflecting how applications will use a new database
Merge static data models with dynamic process models to create resilient yet flexible database designs
Support application requirements by creating responsive database schemas in any database architecture
Cope with big data and unstructured data for transaction processing and decision support systems
Recognize when relational approaches won't work, and when to turn toward NoSQL solutions such as Cassandra or Hadoop
Who This Book Is For
System developers, including business analysts, database designers, database administrators, and application designers and developers who must design or interact with database systems
George Tillmann is a retired Booz, Allen Hamilton partner; a former programmer, analyst, management consultant; and CIO who managed Booz Allen's global IT organization. He brings more than 30 years experience as a database administrator, database consultant, and database product designer. He has written two books, was a Computerworld columnist, and has articles published in CIO, Infoworld, Techworld, Data Base, The Standard, Database Programming & Design and is a former member of the ANSI/X3/SPARC Data Base Management Systems Study Group.
Autoren/Hrsg.
Weitere Infos & Material
1;Contents at a Glance;5
2;Contents;7
3;About the Author;18
4;Preface;19
5;Part I: Introduction;23
5.1;Chapter 1: Introduction to Usage-Driven Database Design;24
5.1.1;Database Design Principle 1: Separation Principle;26
5.1.2;Database Design Principle 2: Distinction Principle;27
5.1.2.1;The Difference Between Separation and Distinction;29
5.1.3;Database Design Principle 3: Convergence Principle;29
5.1.3.1;The Separation, Distinction, and Convergence Principles;30
5.1.4;Database Design Principle 4: Minimal Regression Principle;30
5.1.5;Usage-Driven Database Design;30
5.1.5.1;Logical Data Modeling;31
5.1.5.2;Physical Schema Definition;32
5.1.5.3;The Terminology Trap;32
5.1.6;Notes;33
6;Part II: Logical Data Modeling;34
6.1;Chapter 2: The E-R Approach;35
6.1.1;A Little Data Modeling History;37
6.1.2;Some Important Definitions;38
6.1.3;Logical Data Modeling Objects;39
6.1.3.1;Entities;39
6.1.3.2;Type-Instance Distinction;40
6.1.3.3;Relationships;40
6.1.3.4;Attributes;41
6.1.4;Notes;42
6.2;Chapter 3: More About the E-R Approach;43
6.2.1;More About Relationships;43
6.2.1.1;Membership Class;43
6.2.1.1.1;Cardinality;44
6.2.1.1.2;Modality;45
6.2.1.2;Degree;47
6.2.1.2.1;Binary Relationship;47
6.2.1.2.2;N-ary Relationships;48
6.2.1.2.3;Unary or Recursive Relationships;48
6.2.1.3;Relationship Constraints;49
6.2.1.3.1;Inclusion;49
6.2.1.3.2;Exclusion;50
6.2.1.3.3;Conjunction;50
6.2.1.3.4;Simple Conjunction;50
6.2.1.3.5;Conditional Conjunction;51
6.2.1.3.6;Recursive Modality Constraints;52
6.2.2;More About Entities;55
6.2.2.1;Attributive Entity;55
6.2.2.2;Associative Entities;56
6.2.2.3;Supertype and Subtype Entities (Generalization and Specialization);56
6.2.3;More About Attributes;58
6.2.3.1;Attribute Domain;58
6.2.3.2;Attribute Source: Primitive and Derived;59
6.2.3.3;Attribute Descriptor and Unique Identifier;59
6.2.3.4;Compound or Concatenated Unique Identifiers;60
6.2.3.5;Attribute Complexity: Simple and Group;60
6.2.3.6;Attribute Valuation: Single Value and Multivalue;61
6.2.3.7;Attribute Complexity and Valuation;61
6.3;Chapter 4: Building the Logical Data Model;64
6.3.1;The Interview Process;65
6.3.1.1;Gather Information and Review;66
6.3.1.1.1;1. Identify the Users Who Are Authorities or Experts on the Subject;66
6.3.1.1.2;2. Meet and Interview the Experts and Identify the Subject (Application) Entities;66
6.3.1.1.2.1;Preparation;66
6.3.1.1.2.2;The First Interview;67
6.3.1.1.3;3. Identify Relationships Between the Entities;67
6.3.1.1.4;4. Identify the Properties or Attributes of the Entities and Relationships;67
6.3.1.2;Analyze Information;67
6.3.1.3;Construct Model;68
6.3.1.4;Repeat as Necessary;68
6.3.2;Making Sense of the Interview;68
6.3.2.1;Modeling Rules;70
6.3.3;Verifying What You Have Heard;72
6.3.3.1;Immediate Interview Feedback;72
6.3.3.2;Formal Walk-Throughs;72
6.3.4;Increasing E-R Diagram Comprehension;74
6.3.4.1;Subject Areas;74
6.3.4.2;Entity Fragments;75
6.3.4.3;Neighborhood Diagrams;76
6.3.4.4;Relationship Bridges and Stubs;77
6.3.5;Some Model Building Best Practices;78
6.3.5.1;Getting Started;78
6.3.5.2;Don’t Lose Control of the Project to Users;79
6.3.5.3;Don’t Lose Control of the Project to Technical Staff;79
6.3.5.4;Don’t Become Dependent on Tools or Techniques;80
6.3.5.5;Don’t Get Bogged Down in Endless Analysis;80
6.3.5.6;The Players…and the Rules of Engagement;81
6.3.6;Deliverables;82
6.3.7;Examples of Deliverables;83
6.3.7.1;Sample Data Dictionary, Data Object Definitions;84
6.3.8;Notes;86
6.4;Chapter 5: LDM Best Practices;87
6.4.1;Abbreviations;88
6.4.2;Almost Unique Identifiers;89
6.4.3;Clarity;90
6.4.4;Compound Unique Identifiers;90
6.4.5;Conceptual Integrity;91
6.4.6;Conjunctive Relationships;93
6.4.7;Duplicate Super-Subtypes “Type” Data;93
6.4.7.1;Exclusive and Nonexclusive Generalization;94
6.4.7.2;Required and Nonrequired Participation;96
6.4.8;Exclusive Relationships;96
6.4.9;Group Attributes;97
6.4.10;Level of Abstraction;97
6.4.11;Many-to-Many Relationships;98
6.4.12;N-ary Relationships;100
6.4.12.1;N-ary Relationships and Membership Class;101
6.4.13;Naming Objects;104
6.4.13.1;Multiple Names;104
6.4.13.2;Naming Conventions;105
6.4.13.3;Name Uniqueness;105
6.4.13.4;Naming Convention Goals;106
6.4.14;Null Attributes;109
6.4.14.1;There Be Blanks in Them Thar Nulls;109
6.4.15;Optional Relationships (Optional-Optional Relationships);110
6.4.16;Subject Areas;110
6.4.17;Supertypes and Subtypes;111
6.4.18;Unique Identifiers;113
6.4.19;Note;115
6.5;Chapter 6: LDM Pitfalls;116
6.5.1;Circular Relationships;116
6.5.2;Data Values;117
6.5.3;Data Value–Differentiated Entities and Attributes;118
6.5.4;Derived Data;119
6.5.4.1;Three Poor Arguments Against Modeling Derived Data;119
6.5.4.2;Derived Data as Process;120
6.5.4.3;Derived Data and Physical Database Design;121
6.5.5;Discrete Attributes;122
6.5.6;Embedded Attributes;123
6.5.6.1;Uniqueness;123
6.5.6.2;Group Attributes;123
6.5.6.3;The Problem with Embedded Attributes;123
6.5.6.4;The Solution;124
6.5.6.5;The Moral of the Story;124
6.5.7;Entity Fragmentation;124
6.5.8;Foreign Keys;125
6.5.9;Junction Entities;126
6.5.10;Normalization;126
6.5.11;Presentation Data;127
6.5.12;Primary Keys;127
6.5.13;Process Data;128
6.5.14;Repeating Groups;129
6.5.14.1;Multivalue Attribute;129
6.5.14.2;Group Attribute;130
6.5.15;Single-Attribute Entities;130
6.5.15.1;Code;131
6.5.15.2;Multivalue Attribute (Repeating Group);131
6.5.15.3;Associative Entities;131
6.5.16;Substitution Data;132
6.5.17;Substitution Tables;132
6.5.18;Transient Data;132
6.5.18.1;Location-Dependent Data;133
6.6;Chapter 7: LDM Perils to Watch For;135
6.6.1;Associatives Related to Other Associatives;135
6.6.2;Diagrammable Objects;136
6.6.3;Disassociated Entity Clusters (“Islands”);137
6.6.4;Duplicate Unique Identifiers;138
6.6.4.1;One Entity, Two or More Identifiers;138
6.6.4.2;One Identifier, Two or More Entities;138
6.6.5;Multiple Relationships;139
6.6.5.1;One Relationship, Multiple Views;139
6.6.5.2;Multiple Different but Similar Relationships;139
6.6.6;One-of-a-Kind (OOAK) Entities;140
6.6.7;One-to-One Relationships;140
6.6.8;Rare Entity Relationships;141
6.6.9;Recursive Modality Constraints;142
6.6.9.1;Updating the Constraints;143
6.6.10;Spiderwebs;144
6.6.11;Too Many Blanks or Nulls;145
6.6.12;Too Many Recursives;146
6.6.13;Next U3D Phase: Physical Schema Definition;147
6.6.14;Notes;147
7;Part III: Physical Schema Definition;148
7.1;Chapter 8: Introduction to Physical Database Design;149
7.1.1;A Short Incondite History of Automated Information Management (or, a Sequential Look at Random Access);150
7.1.1.1;Information Management Era 1: Sequential Processing;150
7.1.1.2;Information Management Era 2: The First Random Access DBMS;151
7.1.1.2.1;A Small Digression: A Couple of Words About Database Access;156
7.1.1.2.2;Hashing;156
7.1.1.2.3;Inverted Indices;158
7.1.1.2.4;Database Pages;159
7.1.1.2.5;B-Trees;160
7.1.1.2.6;Bitmaps;162
7.1.1.2.7;Associative Arrays;163
7.1.1.3;Information Management Era 3: Inverted File Systems;164
7.1.1.4;Information Management Era 4: The Age of Relational;164
7.1.1.4.1;Problems with Relational;166
7.1.1.4.1.1;First—Performance Issues;166
7.1.1.4.1.2;Second—Not So Simple Simplicity;166
7.1.1.4.1.2.1;Data Types;166
7.1.1.4.1.2.2;Procedural Code;166
7.1.1.4.1.2.3;Groups;167
7.1.1.4.1.3;Third—Communication and Language;167
7.1.1.4.1.4;Fourth—Relational: Theory or DBMS?;169
7.1.1.4.1.5;Fifth—Where Are You Relational Model?;170
7.1.1.4.2;Just Because It Has Failings Doesn’t Mean It’s a Failure;170
7.1.1.5;Information Management Era 5: Object Technology;171
7.1.1.5.1;Object-Oriented Programming Led to Object-Oriented Analysis and Design, Which Eventually Led to the Object-Oriented Database Management Systems (OODBMSs);171
7.1.1.5.2;A Small Digression (Again): The ACID Test;172
7.1.1.6;Information Management Era 6: NoSQL;173
7.1.1.6.1;Key-Value;173
7.1.1.6.2;Graph;174
7.1.1.6.3;Document Management;174
7.1.1.6.4;Multimodal;174
7.1.1.6.5;Is That an ACID or a BASE?;174
7.1.2;And the Winner Is…;176
7.1.3;What’s to Come;177
7.1.4;References;177
7.1.5;Notes;177
7.2;Chapter 9: Introduction to Physical Schema Definition;179
7.2.1;Usage-Driven Database Design: Physical Schema Definition;181
7.2.2;Step 1: Transformation;182
7.2.2.1;Task 1.1: Translation;183
7.2.2.2;Task 1.2: Expansion;184
7.2.3;Step 2: Utilization;185
7.2.3.1;Task 2.1: Usage Analysis;185
7.2.3.2;Task 2.2: Path Rationalization;187
7.2.4;Step 3: Formalization;188
7.2.4.1;Task 3.1: Environment Designation;189
7.2.4.2;Task 3.2: Constraint Compliance;191
7.2.5;Step 4: Customization;193
7.2.5.1;Task 4.1: Resource Analysis;194
7.2.5.2;Task 4.2: Performance Enhancement;196
7.2.6;Summary;197
7.3;Chapter 10: Transformation: Creating the Physical Data Model;199
7.3.1;Task 1.1: Translation;200
7.3.1.1;Activity 1.1.1: Transform LDM Objects to PDM Objects;200
7.3.1.1.1;Entities to Record Types;201
7.3.1.1.2;Relationships to Linkages;204
7.3.1.1.2.1;Linkage Membership Class;204
7.3.1.1.2.2;Linkage Degree;205
7.3.1.1.2.3;Linkage Constraints;205
7.3.1.1.3;Attributes to Data Items;206
7.3.1.1.3.1;Data Item Domain;207
7.3.1.1.3.2;Data Item Source: Primitive and Derived;207
7.3.1.1.3.2.1;Primitive Data Item: Unique Identifiers and Descriptors;207
7.3.1.1.3.3;Data Item Complexity: Simple and Group;207
7.3.1.1.3.4;Data Item Valuation: Single Value and Multivalue;208
7.3.1.1.3.5;Other Data Item Information;208
7.3.1.2;Activity 1.1.2: Diagram the Objects;208
7.3.2;Task 1.2: Expansion;209
7.3.2.1;Activity 1.2.1: Assign Keys;209
7.3.2.2;Activity 1.2.2: Normalize the Model;211
7.3.2.2.1;Adjustments Needed for Normalization: Keys—Foreign and Domestic?;213
7.3.2.2.2;Zero Normal Form;214
7.3.2.2.3;First Normal Form;214
7.3.2.2.4;Before Getting to Second Normal Form, a Slight Digression;215
7.3.2.2.5;Second Normal Form;215
7.3.2.2.6;Third Normal Form;216
7.3.2.2.7;Post-Normalization—Retreat of Sally Forth?;217
7.3.2.2.8;Issues with Normalization;217
7.3.3;Tranformation Notes;219
7.3.4;Deliverables;220
7.3.5;Examples of Deliverables;220
7.4;Chapter 11: Utilization: Merging Data and Process;224
7.4.1;Task 2.1: Usage Analysis;225
7.4.1.1;Process Modeling;225
7.4.1.2;Logical Process Modeling;225
7.4.1.2.1;Natural-Language Logical Process Modeling Techniques;226
7.4.1.2.1.1;Plain English;226
7.4.1.2.1.2;Oy Vey, There Has Got to Be a Better English Translation;226
7.4.1.2.1.3;Structured English;226
7.4.1.2.2;Graphical Logical Process Modeling Techniques;228
7.4.1.3;Physical Process Modeling;230
7.4.1.3.1;Natural-Language Physical Process Modeling Techniques;230
7.4.1.3.1.1;Plain English;230
7.4.1.3.1.2;Structured English;230
7.4.1.3.1.3;Pseudocode;230
7.4.1.3.2;Graphic Physical Process Modeling Techniques;231
7.4.1.3.2.1;Flow Charts;231
7.4.1.3.2.2;Structure Charts;232
7.4.1.4;Activity 2.1.1: Create Usage Scenarios;233
7.4.1.4.1;Clearing the Decks for Action;234
7.4.1.5;Putting a Usage Scenario Together;237
7.4.1.6;An Example;237
7.4.1.7;Activity 2.1.2: Map Usage Scenarios to the PDM;239
7.4.2;Task 2.2: Path Rationalization;240
7.4.2.1;Activity 2.2.1: Reduce to Simplest Paths;240
7.4.2.2;Activity 2.2.2: Simplify Model;241
7.4.3;Utilization Notes;242
7.4.4;Deliverables;242
7.4.5;Example of Deliverables;243
7.4.6;Further Reading;243
7.4.6.1;Structured English;243
7.4.6.2;Data Flow Diagramming;244
7.4.6.3;Flow Charts;244
7.4.6.4;Pseudocode;244
7.4.6.5;Structure Charts;244
7.5;Chapter 12: Formalization: Creating a Schema;245
7.5.1;Task 3.1: Environment Designation;246
7.5.1.1;Hierarchical Systems;249
7.5.1.2;Network Systems;249
7.5.1.3;Relational Systems;249
7.5.1.4;Object-Oriented;250
7.5.1.5;NoSQL;250
7.5.1.6;DBMS Product and Version Selection;250
7.5.2;Task 3.2: Constraint Compliance;251
7.5.2.1;Pseudocode…Again;251
7.5.2.2;Activity 3.2.1: Map Rationalized Physical Data Model to the Data Architecture;253
7.5.2.2.1;Record Types;253
7.5.2.2.1.1;Proper;253
7.5.2.2.1.2;Associative;253
7.5.2.2.1.3;Attributive;254
7.5.2.2.1.4;S-Type;255
7.5.2.2.2;Links;255
7.5.2.2.2.1;Membership Class: Cardinality;255
7.5.2.2.2.1.1;One-to-One;255
7.5.2.2.2.1.2;One-to-Many;256
7.5.2.2.2.1.3;Many-to-Many;256
7.5.2.2.2.2;Membership Class: Modality;256
7.5.2.2.2.2.1;Mandatory;257
7.5.2.2.2.2.2;Optional;257
7.5.2.2.2.3;Degree;257
7.5.2.2.2.3.1;Unary;257
7.5.2.2.2.3.2;Binary;258
7.5.2.2.2.3.3;N-ary;258
7.5.2.2.2.4;Constraints;259
7.5.2.2.2.4.1;Inclusion;259
7.5.2.2.2.4.2;Exclusion;259
7.5.2.2.2.4.3;Conjunction;259
7.5.2.3;Data Items;260
7.5.2.3.1;Domains;260
7.5.2.3.1.1;Source: Primitive and Derived;260
7.5.2.3.1.1.1;Primitive Data Items;260
7.5.2.3.1.1.2;Derived Data Items;261
7.5.2.3.1.2;Complexity: Simple and Group;261
7.5.2.3.1.3;Valuation: Single Value and Multivalue;262
7.5.2.4;Activity 3.2.2: Create a DBMS Product/Version-Specific Functional Physical Database Design;263
7.5.2.4.1;Subschema Creation;265
7.5.3;Formalization Notes;267
7.5.4;Deliverables;267
7.5.5;Example of Deliverables;268
7.6;Chapter 13: Customization: Enhancing Performance;269
7.6.1;Task 4.1: Resource Analysis;270
7.6.1.1;The Trade-Off Triangle;271
7.6.2;Task 4.2: Performance Enhancements;274
7.6.2.1;Activity 4.2.1: Customize Hardware;275
7.6.2.1.1;A Few Words About Secondary Storage;275
7.6.2.1.1.1;Add Disk;277
7.6.2.1.1.2;Faster Disk;277
7.6.2.1.1.3;Main Memory;277
7.6.2.2;Activity 4.2.2: Customize Software;278
7.6.2.2.1;Indices (B-Tree, Hash, Bitmap);278
7.6.2.2.2;Clustering;278
7.6.2.2.3;Example Using Indices and Clusters;281
7.6.2.2.3.1;Question 1: Should Order Be Indexed?;283
7.6.2.2.3.2;Question 2: Should Product Be Indexed?;283
7.6.2.2.3.3;Question 3: Should Line Item Be Indexed?;284
7.6.2.2.3.4;Question 4: Should Line Item Be Clustered?;284
7.6.2.2.3.5;Question 5: Should Line Item Be Clustered Around Order or Product?;284
7.6.2.2.3.6;Alternative 1: Line Item Clustered Around Order;284
7.6.2.2.3.7;Alternative Two: Line Item Clustered Around Product;285
7.6.2.2.4;Partitioning;286
7.6.2.2.5;Derived and Duplicate Data;287
7.6.2.2.6;Denormalization;288
7.6.2.2.7;Get Rid of ACID;288
7.6.2.2.8;Big Data, Big Problems, Big Solution;289
7.6.2.2.9;To Plunge or Not to Plunge;290
7.6.2.2.10;NoSQL;291
7.6.2.2.11;Modeling Big Data U3D Style;292
7.6.3;Customization Notes;294
7.6.4;Deliverables;294
7.6.5;Examples of Deliverables;295
7.7;Chapter 14: The Data Warehouse;298
7.7.1;The Data Warehouse;299
7.7.1.1;Data Warehouse Architecture;301
7.7.2;Using U3D to Develop a Data Warehouse;302
7.7.2.1;Step 1: Transformation;302
7.7.2.2;Step 2: Utilization;303
7.7.2.2.1;The Time Dimension;305
7.7.2.3;Step 3: Formalization;306
7.7.2.4;Step 4: Customization;306
7.7.2.4.1;Streamlining;306
7.7.2.4.2;Duplication;306
7.7.2.4.3;Denormalization;306
7.7.2.4.4;Indices and Hashing;307
7.7.2.4.5;Bitmaps;307
7.7.2.4.6;Bulk Loading;308
7.7.2.4.7;Clustering;309
7.7.2.4.8;Partitioning;309
7.7.2.4.9;Distributed Processing;309
7.7.2.4.10;All Together Now…;310
7.7.2.4.11;Oops…;310
7.7.3;Customization Notes;311
7.7.4;Note;311
7.8;Chapter 15: The Big Data Decision Support System;312
7.8.1;Structured, Unstructured, and Semistructured Data—Another Small Digression;313
7.8.2;DSS and Big Data;316
7.8.3;Using U3D to Develop a Big Data Decision Support System;317
7.8.3.1;Step 1: Transformation;318
7.8.3.2;Step 2: Utililization;318
7.8.3.3;Step 3: Formalization;319
7.8.3.4;Step 4: Customization;321
7.8.3.4.1;A Little About Hadoop;321
7.8.3.4.2;Putting It All Together;324
7.8.4;Deliverables;324
8;Part IV: Where from Here?;326
8.1;Chapter 16: A Look Ahead;327
8.1.1;We Need to Ask the Awkward Questions;328
8.1.2;Tools Need to Take Usage into Account;330
8.1.3;The One and Only DBMS;330
8.1.4;Better Training;334
8.1.5;Notes;336
9;Part V: Appendixes;337
9.1;Appendix A: Glossary;338
9.2;Appendix B: Logical Data Modeling Definitions;357
9.2.1;Entity;357
9.2.2;Relationship;358
9.2.3;Attribute;358
9.2.4;Domain;359
9.3;Appendix C: Physical Schema Definition Object Definitions;360
9.3.1;Record;360
9.3.2;Linkage;361
9.3.3;Data Item;362
9.3.4;Domain;362
9.3.5;Cluster;363
9.3.6;Partition;363
9.3.7;Index;364
9.4;Appendix D: Formulas Used in This Book;365
9.5;Appendix E: List of U3D Deliverables;367
10;Index;370




