E-book, English, 326 pages
Gladney Preserving Digital Information
1st edition, 2007
ISBN: 978-3-540-37887-7
Publisher: Springer Berlin Heidelberg
Format: PDF
Copy protection: 1 - PDF Watermark
Cultural history enthusiasts have asserted the urgent need to protect digital information from imminent loss. This book describes a methodology for long-term preservation of all kinds of digital documents. It justifies this methodology using 20th-century theory of knowledge communication, and outlines the requirements and architecture for the software needed. The author emphasizes attention to the perspectives and needs of end users.
Henry M. Gladney is an industry consultant for digital preservation and document management. In 2001, he founded his own company, HMG Consulting, based in Saratoga, CA, after having worked for IBM Research for decades, designing - among other systems - a digital library service that is the core of today's IBM Content Manager®. He is a regular author in the top ACM periodicals, holds eleven patents, and produces the 'Digital Document Quarterly', an online newsletter that has discussed preservation extensively.
Authors/Eds.
Further Information & Material
Preface
Summary Table of Contents
Detailed Table of Contents
Figures
Tables
Part I: Why We Need Long-term Digital Preservation
  1 State of the Art
    1.1 What is Digital Information Preservation?
    1.2 What Would a Preservation Solution Provide?
    1.3 Why Do Digital Data Seem to Present Difficulties?
    1.4 Characteristics of Preservation Solutions
    1.5 Technical Objectives and Scope Limitations
    1.6 Summary
  2 Economic Trends and Social Issues
    2.1 The Information Revolution
    2.2 Economic and Technical Trends
    2.3 Democratization of Information
    2.4 Social Issues
    2.5 Documents as Social Instruments
    2.6 Why So Slow Toward Practical Preservation?
    2.7 Selection Criteria: What is Worth Saving?
    2.8 Summary
Part II: Information Object Structure
  3 Introduction to Knowledge Theory
    3.1 Conceptual Objects: Values and Patterns
    3.2 Ostensive Definition and Names
    3.3 Objective and Subjective: Not a Technological Issue
    3.4 Facts and Values: How Can We Distinguish?
    3.5 Representation Theory: Signs and Sentence Meanings
    3.6 Documents and Libraries: Collections, Sets, and Classes
    3.7 Syntax, Semantics, and Rules
    3.8 Summary
  4 Lessons from Scientific Philosophy
    4.1 Intentional and Accidental Information
    4.2 Distinctions Sought and Avoided
    4.3 Information and Knowledge: Tacit and Human Aspects
    4.4 Trusted and Trustworthy
    4.5 Relationships and Ontologies
    4.6 What Copyright Protection Teaches
    4.7 Summary
  5 Trust and Authenticity
    5.1 What Can We Trust?
    5.2 What Do We Mean by ‘Authentic’?
    5.3 Authenticity for Different Information Genres
    5.4 How Can We Preserve Dynamic Resources?
    5.5 Summary
  6 Describing Information Structure
    6.1 Testable Archived Information
    6.2 Syntax Specification with Formal Languages
    6.3 Monographs and Collections
    6.4 Digital Object Schema
    6.5 From Ontology to Architecture and Design
    6.6 Metadata
    6.7 Summary
Part III: Distributed Content Management
  7 Digital Object Formats
    7.1 Character Sets and Fonts
    7.2 File Formats
    7.3 Perpetually Unique Resource Identifiers
    7.4 Summary
  8 Archiving Practices
    8.1 Security
    8.2 Recordkeeping Standards
    8.3 Archival Best Practices
    8.4 Repository Audit and Certification
    8.5 Summary
  9 Everyday Digital Content Management
    9.1 Software Layering
    9.2 A Model of Storage Stack Development
    9.3 Repository Architecture
    9.4 Archival Collection Types
    9.5 Summary
Part IV: Digital Object Architecture for the Long Term
  10 Durable Bit-Strings and Catalogs
    10.1 Media Longevity
    10.2 Replication to Protect Bit-Strings
    10.3 Repository Catalog f Collection Consistency
    10.4 Collection Ingestion and Sharing
    10.5 Summary
  11 Durable Evidence
    11.1 Structure of Each Trustworthy Digital Object
    11.2 Infrastructure for Trustworthy Digital Objects
    11.3 Other Ways to Make Documents Trustworthy
    11.4 Summary
  12 Durable Representation
    12.1 Representation Alternatives
    12.2 Design of a Durable Encoding Environment
    12.3 Summary
Part V: Peroration
  13 Assessment and the Future
    13.1 Preservation Based on Trustworthy Digital Objects
    13.2 Open Challenges of Metadata Creation
    13.3 Applied Knowledge Theory
    13.4 Assessment of the TDO Methodology
    13.5 Summary and Conclusion
Appendices
  Appendix A: Acronyms and Glossary
  Appendix B: Uniform Resource Identifier Syntax
  Appendix C: Repository Requirements
  Appendix D: Assessment with Independent Criteria
  Appendix E: Universal Virtual Computer Specification
    E.1 Memory Model
    E.2 Machine Status Registers
    E.3 Machine Instruction Codes
    E.4 Organization of an Archived Module
    E.5 Application Example
  Appendix F: Software Modules Wanted
Bibliography
12 Durable Representation (p. 235-236)
We want unambiguous communication with future generations with whom dialog is impossible, without restricting what today’s authors can communicate. For this, we need language that we can confidently expect our descendants to understand easily. This challenge is the kind of language problem that has been central to computer science since it emerged as a discipline in the 1960s. Its core can be restated as, "ensure that an arbitrary computer program will execute correctly on a machine whose architecture is unknown when the program is saved."
The English logician A. M. Turing showed in 1937 (and various computing machine experts have put this into practice since then in various particular ways) that it is possible to develop code instruction systems for a computing machine which cause it to behave as if it were another, specified, computing machine. …
A code, which according to Turing's schema is supposed to make one machine behave as if it were another specific machine … must do the following things. It must contain, in terms that the machine will understand (and purposively obey), instructions … that will cause the machine to examine every order it gets and determine whether this order has the structure appropriate to an order of the second machine. It must then contain, in terms of the order system of the first machine, sufficient orders to make the machine cause the actions to be taken that the second machine would have taken under the influence of the order in question.
The important result of Turing's is that in this way the first machine can be caused to imitate the behavior of any other machine.
von Neumann 1956, The Computer and the Brain, pp. 70–71
Durable encoding, described in this chapter, represents difficult content types with the aid of programs written in virtual machine code - the code of a machine we call a UVC (Universal Virtual Computer). This Turing-Machine-equivalent virtual machine is simple compared to the designs of practical hardware. Its design can be specified completely, concisely, and unambiguously for future interpretation.
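The emulation idea in the von Neumann passage can be made concrete with a toy interpreter. The following is only a minimal sketch under stated assumptions: the five opcodes (LOADC, ADD, SUB, JNZ, HALT), the register numbering, and the function name `run` are hypothetical and are not the UVC instruction set specified in Appendix E. It shows a host machine examining each order of a second, much simpler machine and carrying out the corresponding action in its own terms.

```python
from typing import Dict, List, Tuple

# Hypothetical instruction layout (an assumption for illustration only):
# (opcode, operand_a, operand_b)
Instruction = Tuple[str, int, int]


def run(program: List[Instruction], registers: Dict[int, int]) -> Dict[int, int]:
    """Interpret `program` one order at a time and return the final registers."""
    regs = dict(registers)  # copy so the caller's initial state is untouched
    pc = 0                  # index of the next order to examine
    while pc < len(program):
        op, a, b = program[pc]
        pc += 1
        if op == "LOADC":    # regs[a] = constant b
            regs[a] = b
        elif op == "ADD":    # regs[a] = regs[a] + regs[b]
            regs[a] = regs.get(a, 0) + regs.get(b, 0)
        elif op == "SUB":    # regs[a] = regs[a] - regs[b]
            regs[a] = regs.get(a, 0) - regs.get(b, 0)
        elif op == "JNZ":    # continue at order b if regs[a] is non-zero
            if regs.get(a, 0) != 0:
                pc = b
        elif op == "HALT":
            break
        else:
            # The order does not have the structure of the emulated machine.
            raise ValueError(f"not an order of the emulated machine: {op!r}")
    return regs


# A program for the toy machine: multiply r1 by r2 through repeated addition.
# It assumes r2 starts at 1 or more.
MULTIPLY: List[Instruction] = [
    ("LOADC", 0, 0),  # 0: r0 = 0 (accumulator)
    ("LOADC", 3, 1),  # 1: r3 = 1 (constant used to count down)
    ("ADD", 0, 1),    # 2: r0 = r0 + r1
    ("SUB", 2, 3),    # 3: r2 = r2 - 1
    ("JNZ", 2, 2),    # 4: repeat from order 2 while r2 is non-zero
    ("HALT", 0, 0),   # 5: stop
]

result = run(MULTIPLY, {1: 6, 2: 7})
print(result[0])
```

The point of the sketch is that the whole machine fits in a few dozen lines: a future implementer would need only a short specification of this kind to re-create the interpreter on hardware that does not exist today, after which any program archived in the machine's code could run again.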
Objects to be preserved might consist of several source files, each represented as a bit-stream in a digital object collection (Fig. 32), with labeled links between parts of the complete package. Much of each TDO will be encoded using XML, relations, encryption algorithms, and identifiers. These are governed by relatively simple standards that are widely used - standards that we can be reasonably confident will be completely and correctly understood many years into the future. As described in §11.1, metadata can, and should, record the representation of each TDO component. The means for making each content blob of Fig. 32 interpretable forever remains to be provided. What follows describes how this can be accomplished for a single content blob.
12.1 Representation Alternatives
We want information representation methods that can be embodied in tools whose use would be practical for information producers and consumers who do not have specialized skills or equipment.




