E-book, English, Volume 1, 185 pages
Series: Web Information Systems Engineering and Internet Technologies Book Series
Ling / Dobbie Semistructured Database Design
1st edition 2006
ISBN: 978-0-387-23568-4
Publisher: Springer US
Format: PDF
Copy protection: 1 - PDF Watermark
Semistructured Database Design provides an essential reference for anyone interested in the effective management of semistructured data. Since many new and advanced web applications consume a huge amount of such data, there is a growing need to properly design efficient databases. This volume responds to that need by describing a semantically rich data model for semistructured data, called the Object-Relationship-Attribute model for Semistructured data (ORA-SS). Focusing on this new model, the book discusses problems and presents solutions for a number of topics, including schema extraction, the design of non-redundant storage organizations for semistructured data, and physical semistructured database design, among others. Semistructured Database Design presents researchers and professionals with the most complete and up-to-date research in this fast-growing field.
Further Information & Material
Contents
List of Figures
List of Tables
Preface
1 INTRODUCTION
  1.1 Chapter Overview
2 DATA MODELS FOR SEMISTRUCTURED DATA
  2.1 Document Type Definition
  2.2 DOM, OEM and DataGuide
  2.3 S3-graph
  2.4 CM Hypergraph and Scheme Tree
  2.5 EER and XGrammar
  2.6 AL-DTD and XML Tree
  2.7 ORA-SS
  2.8 Discussion
3 ORA-SS
  3.1 ORA-SS Schema Diagram
  3.2 ORA-SS Data Instance Diagram
  3.3 ORA-SS Functional Dependency Diagram
  3.4 ORA-SS Inheritance Hierarchy Diagram
  3.5 Discussion
4 SCHEMA EXTRACTION
  4.1 Basic Extraction Rules
  4.2 Schema Extraction Algorithm
  4.3 Example
  4.4 Discussion
  4.5 Summary
5 NORMALIZATION
  5.1 Motivating Example
  5.2 Background
  5.3 A Normal Form For Semistructured Schemas
  5.4 Converting Schemas into the Normal Form
  5.5 Discussion
6 VIEWS
  6.1 Motivating Example
  6.2 The Select Operator
  6.3 The Drop Operator
  6.4 The Join Operator
  6.5 The Swap Operator
  6.6 Design Rules for IDentifier Dependency Relationship
  6.7 Example of Designing View
  6.8 Related Work
  6.9 Summary
7 PHYSICAL DATABASE DESIGN
  7.1 Relational Database Physical Design
  7.2 IMS Database Physical Design
  7.3 Redundancy in ORA-SS Schema Diagram
  7.4 Replicated NF in ORA-SS
  7.5 Controlled Pairing in ORA-SS Schema Diagrams
  7.6 Measure of Data Replication
  7.7 Guidelines for Physical Semistructured Database Design
  7.8 Storage of Documents in an Object Relational Database
  7.9 Summary
8 CONCLUSION
Appendix
References
Index
About the Authors
Chapter 1 INTRODUCTION (p. 1-2)
Today, many computer systems produce and consume large amounts of data. Consider a library catalogue system that stores the details of the holdings in a library and allows users to query information and perhaps even request books, or an accounting system that reads data from files, transforms it and prints reports. In the past much of the data has been stored in relational database systems and the designers of the computer systems have paid special attention to the organization or structure of this data. We have since moved to the age of the World Wide Web (or web) where many new technologies and applications have emerged.
Many of the applications built today are web based, and the corresponding technologies that are used have been specifically designed for the web. Let us consider how data was stored before the advent of the web. Data was stored in files or in databases. For the former, the entire file is read from and written to disk when data is needed. This works well for applications that do not use large amounts of data, that is, applications that can read the entire file into memory, manipulate the data and write the file back out to disk. However, this approach is inadequate for systems that require more data than can fit in main memory. For these kinds of applications, a database is required.
The use of databases leads to new problems, including how to maintain the consistency of the data with respect to real world constraints. For example, suppose we have a database that stores details of students. Is it possible to ensure that a student’s address appears in the database only once? If the address appears multiple times, then how can we guarantee the consistency of the repeated data? It is necessary to model the constraints in the database if we want the database system to enforce these constraints. Some constraints can be enforced by the organization or structure of the data, while others must be programmed as general constraints.
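As a rough illustration of the two kinds of constraint, the following minimal sketch uses SQLite through Python's standard `sqlite3` module. The table names, column names and the mark range are hypothetical and are not taken from the book. Storing the address in a table keyed by student means it can appear only once per student (a constraint enforced by structure), while the permitted range of a mark is stated explicitly as a `CHECK` (a general constraint).

```python
import sqlite3


def build_db():
    """Create an in-memory database with a hypothetical student schema."""
    conn = sqlite3.connect(":memory:")
    # Structural constraint: one row per student, so one address per student.
    conn.execute(
        "CREATE TABLE student ("
        " student_id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " address TEXT NOT NULL)"
    )
    # General constraint: a mark must lie in an assumed range of 0 to 100.
    conn.execute(
        "CREATE TABLE enrolment ("
        " student_id INTEGER NOT NULL REFERENCES student(student_id),"
        " course TEXT NOT NULL,"
        " mark INTEGER CHECK (mark BETWEEN 0 AND 100),"
        " PRIMARY KEY (student_id, course))"
    )
    return conn


def try_insert(conn, sql, params):
    """Run one insert; return False if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_db()
ok_first = try_insert(
    conn, "INSERT INTO student VALUES (?, ?, ?)", (1, "Ann", "1 Main St")
)
# A second row for the same student would duplicate the address: rejected.
ok_duplicate = try_insert(
    conn, "INSERT INTO student VALUES (?, ?, ?)", (1, "Ann", "9 Other Rd")
)
# A mark outside the assumed range violates the general constraint: rejected.
ok_bad_mark = try_insert(
    conn, "INSERT INTO enrolment VALUES (?, ?, ?)", (1, "DB101", 150)
)
```

In both rejected cases the database system, rather than the application, detects the violation.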
Yet another problem that arises from the use of database systems is how the constraints from the real world should be captured during the design process. Typically they are recorded in a conceptual model such as an Entity-Relationship diagram. Such constraints contain semantic information, that is, they provide some meaning to the underlying data. It is important that these constraints are enforced by the database. When data is manipulated, the database system checks that none of the constraints are violated. In other words, the semantics from the real world still hold in the result of the manipulation.
Traditional relational databases, which assume that data is structured, are no longer suitable for the new web applications because the data on which these applications are based lacks structure and may be incomplete. Thus, many of the techniques that were previously used may not be applicable. This less structured data, also known as semistructured data, is usually represented as a tree of elements, where the children are sub-elements of their parent element. Elements can in turn have attributes. Queries over the trees are represented as path expressions.
The eXtensible Markup Language (XML) [Bray et al., 2000] is a language that is used to express semistructured data. XML is self-describing since each element has a tag which gives a name for the content. More recently, however, various schema languages have been defined to specify the structure of the underlying XML data and the constraints that are expected to hold in instances of the XML data. The schemas are descriptive rather than prescriptive. Like traditional data, XML data may be stored in files or in a database. The database can have an underlying relational engine or it can be specifically designed for XML data. The former are called XML-enabled databases and the latter are called native XML databases. Like the entity relationship diagram for relational databases, a diagrammatic representation that reflects real world constraints could be used for requirements gathering, and for the design of schemas for semistructured documents.
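To make the tree-of-elements view and path expressions described above more concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The catalogue document, its element names and its attribute names are invented for illustration and do not come from the book. Note that the second book has no author, which is the kind of incompleteness that semistructured data permits.

```python
import xml.etree.ElementTree as ET

# Hypothetical library catalogue; all names and values are invented.
CATALOGUE = """
<library>
  <book isbn="111">
    <title>Database Basics</title>
    <author>A. Writer</author>
  </book>
  <book isbn="222">
    <title>Web Data</title>
  </book>
</library>
"""

# Parse the text into a tree whose root is the <library> element.
root = ET.fromstring(CATALOGUE.strip())


def titles(tree):
    """Evaluate the path expression book/title from the root element."""
    return [t.text for t in tree.findall("./book/title")]


def books_without_author(tree):
    """Return the isbn attribute of each book lacking an author sub-element."""
    return [
        book.get("isbn")
        for book in tree.findall("./book")
        if book.find("author") is None
    ]
```

Calling `titles(root)` walks from the root through each `book` child to its `title` sub-element, and `books_without_author(root)` reads the `isbn` attribute of the books whose structure is incomplete.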




