E-book, English, Volume 1, 185 pages
Series: Web Information Systems Engineering and Internet Technologies Book Series
Ling / Dobbie Semistructured Database Design
1st edition, 2006
ISBN: 978-0-387-23568-4
Publisher: Springer US
Format: PDF
Copy protection: 1 - PDF Watermark
Semistructured Database Design provides an essential reference for anyone interested in the effective management of semistructured data. Since many new and advanced web applications consume a huge amount of such data, there is a growing need to properly design efficient databases.
This volume responds to that need by describing a semantically rich data model for semistructured data, called Object-Relationship-Attribute model for Semistructured data (ORA-SS). Focusing on this new model, the book discusses problems and presents solutions for a number of topics, including schema extraction, the design of non-redundant storage organizations for semistructured data, and physical semistructured database design, among others.
Semistructured Database Design presents researchers and professionals with the most complete and up-to-date research in this fast-growing field.
Target audience
Research
Authors/Editors
Further information and material
Data Models for Semistructured Data.- ORA-SS.- Schema Extraction.- Normalization.- Views.- Physical Database Design.- Conclusion.
Chapter 1 INTRODUCTION (p. 1-2)
Today, many computer systems produce and consume large amounts of data. Consider a library catalogue system that stores the details of the holdings in a library and allows users to query information and perhaps even request books, or an accounting system that reads data from files, transforms it and prints reports. In the past much of the data has been stored in relational database systems and the designers of the computer systems have paid special attention to the organization or structure of this data. We have since moved to the age of the World Wide Web (or web) where many new technologies and applications have emerged.
Many of the applications built today are web based, and the corresponding technologies that are used have been specifically designed for the web. Let us consider how data was stored before the advent of the web. Data was stored in files or in databases. For the former, the entire file is read from and written to disk when data is needed. This works well for applications that do not use large amounts of data, that is, applications that can read the entire file into memory, manipulate the data and write the file back out to disk. However, this approach is inadequate for systems that require more data than can fit in main memory. For these kinds of applications, a database is required.
The use of databases leads to new problems, including how to maintain the consistency of the data with respect to real world constraints. For example, suppose we have a database that stores details of students. Is it possible to ensure that a student’s address appears in the database only once? If the address appears multiple times, then how can we guarantee the consistency of the repeated data? It is necessary to model the constraints in the database if we want the database system to enforce these constraints. Some constraints can be enforced by the organization or structure of the data while others must be programmed as general constraints.
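The student example can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table layout, column names and sample rows are assumptions made for illustration and are not taken from the book. The primary key stands for a constraint enforced by the structure of the data (one row, and hence one address, per student), while the CHECK clause stands for a general constraint that has to be stated explicitly.

```python
import sqlite3


def build_student_db():
    """Create an in-memory database with a hypothetical student table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE student ("
        # Structural constraint: one row, so one address, per student id.
        " student_id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        # General constraint: an address must not be empty.
        " address TEXT NOT NULL CHECK (length(address) > 0))"
    )
    return conn


def add_student(conn, student_id, name, address):
    """Insert a student; return False if the database rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO student VALUES (?, ?, ?)",
                (student_id, name, address),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_student_db()
first = add_student(conn, 1, "Ann", "12 High St")   # accepted
second = add_student(conn, 1, "Ann", "99 Low Rd")   # rejected: second address for student 1
third = add_student(conn, 2, "Bob", "")             # rejected: empty address
row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

Because each student occupies exactly one row, the address cannot be repeated, so the question of keeping repeated copies consistent does not arise in this layout.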
Yet another problem that arises from the use of database systems is how the constraints from the real world should be captured during the design process. Typically they are recorded in a conceptual model such as an Entity-Relationship diagram. Such constraints contain semantic information, that is, they provide some meaning to the underlying data. It is important that these constraints are enforced by the database. When data is manipulated, the database system checks that none of the constraints are violated. In other words, the semantics from the real world still hold in the result of the manipulation.
Traditional relational databases, which assume that data is structured, are no longer suitable for the new web applications because the data on which these applications are based lacks structure and may be incomplete. Thus, many of the techniques that were previously used may not be applicable. This less structured data, also known as semistructured data, is usually represented as a tree of elements, where the children are sub-elements of their parent element. Elements can in turn have attributes. Queries over the trees are represented as path expressions.
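The tree view and path expressions can be illustrated with a small sketch using Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath. The document below is a made-up example (the element names, the year attribute and the second book are assumptions for illustration). The second book has no author and no year, which shows the kind of missing data that semistructured sources tolerate.

```python
import xml.etree.ElementTree as ET

# Hypothetical semistructured document: the two books do not share the same shape.
DOC = """
<library>
  <book year="2006">
    <title>Semistructured Database Design</title>
    <author>Ling</author>
    <author>Dobbie</author>
  </book>
  <book>
    <title>Untitled Draft</title>
  </book>
</library>
""".strip()

root = ET.fromstring(DOC)

# Path expression: every title that is a child of a book under the root.
titles = [t.text for t in root.findall("./book/title")]

# Path expression: every author of every book; books without authors contribute nothing.
authors = [a.text for a in root.findall("./book/author")]

# Path expression with an attribute predicate: only books that carry a year attribute.
dated_titles = [b.find("title").text for b in root.findall("./book[@year]")]
```

Each path expression walks the tree from the root element downwards, and a query simply returns fewer results where part of the structure is absent instead of failing.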
The eXtensible Markup Language (XML) [Bray et al., 2000] is a language that is used to express semistructured data. XML is self-describing since each element has a tag which gives a name for the content. More recently, however, various schema languages have been defined to specify the structure of the underlying XML data and the constraints that are expected to hold in instances of the XML data. The schemas are descriptive rather than prescriptive. Like traditional data, XML data may be stored in files or in a database. The database can have an underlying relational engine or it can be specifically designed for XML data. The former are called XML-enabled databases and the latter are called native XML databases. Like the Entity-Relationship diagram for relational databases, a diagrammatic representation that reflects real world constraints could be used for requirements gathering, and for the design of schemas for semistructured documents.