Ferilli | Automatic Digital Document Processing and Management | E-Book | www.sack.de
E-book, English, 297 pages

Series: Advances in Computer Vision and Pattern Recognition

Ferilli Automatic Digital Document Processing and Management

Problems, Algorithms and Techniques
1st edition, 2011
ISBN: 978-0-85729-198-1
Publisher: Springer
Format: PDF
Copy protection: 1 - PDF Watermark




This text reviews the issues involved in handling and processing digital documents. Examining the full range of a document's lifetime, the book covers acquisition, representation, security, pre-processing, layout analysis, understanding, analysis of single components, information extraction, filing, indexing and retrieval.

Features: provides a list of acronyms and a glossary of technical terms; contains appendices covering key concepts in machine learning, and providing a case study on building an intelligent system for digital document and library management; discusses issues of security, and legal aspects of digital documents; examines core issues of document image analysis, and image processing techniques of particular relevance to digitized documents; reviews the resources available for natural language processing, in addition to techniques of linguistic analysis for content handling; investigates methods for extracting and retrieving data/information from a document.




Further Information & Material


1;Foreword;6
2;Preface;9
3;Acknowledgments;12
4;Contents;13
5;Acronyms;19
6;Digital Documents;23
6.1;Documents;25
6.1.1;A Juridic Perspective;25
6.1.2;History and Trends;26
6.1.3;Current Landscape;27
6.1.4;Types of Documents;29
6.1.5;Document-Based Environments;32
6.1.6;Document Processing Needs;33
6.1.7;References;34
6.2;Digital Formats;36
6.2.1;Compression Techniques;37
6.2.1.1;RLE (Run Length Encoding);37
6.2.1.2;Huffman Encoding;37
6.2.1.3;LZ77 and LZ78 (Lempel-Ziv);39
6.2.1.4;LZW (Lempel-Ziv-Welch);40
6.2.1.5;DEFLATE;42
6.2.2;Non-structured Formats;42
6.2.2.1;Plain Text;43
6.2.2.1.1;ASCII;44
6.2.2.1.2;ISO Latin;44
6.2.2.1.3;UNICODE;45
6.2.2.1.4;UTF;45
6.2.2.2;Images;49
6.2.2.2.1;Color Spaces;49
6.2.2.2.1.1;RGB;50
6.2.2.2.1.2;YUV/YCbCr;50
6.2.2.2.1.3;CMY(K);51
6.2.2.2.1.4;HSV/HSB and HLS;51
6.2.2.2.1.5;Comparison among Color Spaces;51
6.2.2.2.2;Raster Graphics;52
6.2.2.2.2.1;BMP (BitMaP);53
6.2.2.2.2.2;GIF (Graphics Interchange Format);55
6.2.2.2.2.3;TIFF (Tagged Image File Format);57
6.2.2.2.2.4;JPEG (Joint Photographic Experts Group);58
6.2.2.2.2.5;PNG (Portable Network Graphics);60
6.2.2.2.2.6;DjVu (DejaVu);62
6.2.2.2.3;Vector Graphic;64
6.2.2.2.3.1;SVG (Scalable Vector Graphic);64
6.2.3;Layout-Based Formats;66
6.2.3.1;PS (PostScript);66
6.2.3.2;PDF (Portable Document Format);77
6.2.4;Content-Oriented Formats;80
6.2.4.1;Tag-Based Formats;81
6.2.4.1.1;HTML (HyperText Markup Language);82
6.2.4.1.2;XML (eXtensible Markup Language);87
6.2.4.2;Office Formats;90
6.2.4.2.1;ODF (OpenDocument Format);90
6.2.5;References;91
6.3;Legal and Security Aspects;93
6.3.1;Cryptography;94
6.3.1.1;Basics;94
6.3.1.2;Short History;96
6.3.1.3;Digital Cryptography;97
6.3.1.3.1;DES (Data Encryption Standard);99
6.3.1.3.2;IDEA (International Data Encryption Algorithm);100
6.3.1.3.3;Key Exchange Method;101
6.3.1.3.4;RSA (Rivest, Shamir, Adleman);102
6.3.1.3.5;DSA (Digital Signature Algorithm);105
6.3.2;Message Fingerprint;105
6.3.2.1;SHA (Secure Hash Algorithm);106
6.3.3;Digital Signature;108
6.3.3.1;Management;110
6.3.3.1.1;DSS (Digital Signature Standard);112
6.3.3.1.2;OpenPGP Standard;113
6.3.3.2;Trusting and Certificates;114
6.3.4;Legal Aspects;117
6.3.4.1;A Law Approach;118
6.3.4.2;Public Administration Initiatives;121
6.3.4.2.1;Digital Signature;121
6.3.4.2.2;Certified e-mail;123
6.3.4.2.3;Electronic Identity Card & National Services Card;124
6.3.4.2.4;Telematic Civil Proceedings;124
6.3.5;References;128
7;Document Analysis;130
7.1;Image Processing;132
7.1.1;Basics;133
7.1.1.1;Convolution and Correlation;133
7.1.2;Color Representation;135
7.1.2.1;Color Space Conversions;136
7.1.2.1.1;RGB-YUV;136
7.1.2.1.2;RGB-YCbCr;136
7.1.2.1.3;RGB-CMY(K);137
7.1.2.1.4;RGB-HSV;137
7.1.2.1.5;RGB-HLS;138
7.1.2.2;Colorimetric Color Spaces;139
7.1.2.2.1;XYZ;139
7.1.2.2.2;L*a*b*;140
7.1.3;Color Depth Reduction;141
7.1.3.1;Desaturation;141
7.1.3.2;Grayscale (Luminance);142
7.1.3.3;Black&White (Binarization);142
7.1.3.3.1;Otsu Thresholding;142
7.1.4;Content Processing;143
7.1.4.1;Geometrical Transformations;144
7.1.4.2;Edge Enhancement;145
7.1.4.2.1;Derivative Filters;146
7.1.4.3;Connectivity;148
7.1.4.3.1;Flood Filling;149
7.1.4.3.2;Border Following;150
7.1.4.3.3;Dilation and Erosion;151
7.1.4.3.4;Opening and Closing;152
7.1.5;Edge Detection;153
7.1.5.1;Canny;154
7.1.5.2;Hough Transform;156
7.1.5.3;Polygonal Approximation;158
7.1.5.4;Snakes;160
7.1.6;References;162
7.2;Document Image Analysis;163
7.2.1;Document Structures;163
7.2.1.1;Spatial Description;165
7.2.1.1.1;4-Intersection Model;166
7.2.1.1.2;Minimum Bounding Rectangles;168
7.2.1.2;Logical Structure Description;169
7.2.1.2.1;DOM (Document Object Model);169
7.2.2;Pre-processing for Digitized Documents;172
7.2.2.1;Document Image Defect Models;173
7.2.2.2;Deskewing;174
7.2.2.3;Dewarping;175
7.2.2.3.1;Segmentation-Based Dewarping;176
7.2.2.4;Content Identification;178
7.2.2.5;Optical Character Recognition;179
7.2.2.5.1;Tesseract;181
7.2.2.5.2;JTOCR;183
7.2.3;Segmentation;184
7.2.3.1;Classification of Segmentation Techniques;185
7.2.3.2;Pixel-Based Segmentation;187
7.2.3.2.1;RLSA (Run Length Smoothing Algorithm);187
7.2.3.2.2;RLSO (Run-Length Smoothing with OR);189
7.2.3.2.3;X-Y Trees;191
7.2.3.3;Block-Based Segmentation;193
7.2.3.3.1;The DOCSTRUM;193
7.2.3.3.2;The CLiDE (Chemical Literature Data Extraction) Approach;195
7.2.3.3.3;Background Analysis;197
7.2.3.3.4;RLSO on Born-Digital Documents;201
7.2.4;Document Image Understanding;202
7.2.4.1;Relational Approach;204
7.2.4.1.1;INTHELEX (INcremental THEory Learner from EXamples);206
7.2.4.2;Description;208
7.2.4.2.1;DCMI (Dublin Core Metadata Initiative);209
7.2.5;References;211
8;Content Processing;215
8.1;Natural Language Processing;217
8.1.1;Resources-Lexical Taxonomies;218
8.1.1.1;WordNet;219
8.1.1.2;WordNet Domains;220
8.1.1.3;Senso Comune;223
8.1.2;Tools;224
8.1.2.1;Tokenization;225
8.1.2.2;Language Recognition;226
8.1.2.3;Stopword Removal;227
8.1.2.4;Stemming;228
8.1.2.4.1;Suffix Stripping;229
8.1.2.5;Part-of-Speech Tagging;231
8.1.2.5.1;Rule-Based Approach;231
8.1.2.6;Word Sense Disambiguation;233
8.1.2.6.1;Lesk's Algorithm;235
8.1.2.6.2;Yarowsky's Algorithm;235
8.1.2.7;Parsing;236
8.1.2.7.1;Link Grammar;237
8.1.3;References;239
8.2;Information Management;241
8.2.1;Information Retrieval;241
8.2.1.1;Performance Evaluation;242
8.2.1.2;Indexing Techniques;244
8.2.1.2.1;Vector Space Model;244
8.2.1.3;Query Evaluation;247
8.2.1.3.1;Relevance Feedback;248
8.2.1.4;Dimensionality Reduction;249
8.2.1.4.1;Latent Semantic Analysis and Indexing;250
8.2.1.4.2;Concept Indexing;253
8.2.1.5;Image Retrieval;255
8.2.2;Keyword Extraction;257
8.2.2.1;TF-ITP;259
8.2.2.2;Naive Bayes;259
8.2.2.3;Co-occurrence;260
8.2.3;Text Categorization;262
8.2.3.1;A Semantic Approach Based on WordNet Domains;264
8.2.4;Information Extraction;265
8.2.4.1;WHISK;267
8.2.4.2;A Multistrategy Approach;269
8.2.5;The Semantic Web;271
8.2.6;References;272
9;Appendix A A Case Study: DOMINUS;274
9.1;General Framework;274
9.1.1;Actors and Workflow;274
9.1.2;Architecture;276
9.2;Functionality;278
9.2.1;Input Document Normalization;278
9.2.2;Layout Analysis;279
9.2.2.1;Kernel-Based Basic Blocks Grouping;280
9.2.3;Document Image Understanding;281
9.2.4;Categorization, Filing and Indexing;281
9.3;Prototype Implementation;282
9.4;Exploitation for Scientific Conference Management;285
9.4.1;GRAPE;286
10;Appendix B Machine Learning Notions;288
10.1;Categorization of Techniques;288
10.2;Noteworthy Techniques;289
10.2.1;Artificial Neural Networks;289
10.2.2;Decision Trees;290
10.2.3;k-Nearest Neighbor;290
10.2.4;Inductive Logic Programming;290
10.2.5;Naive Bayes;291
10.2.6;Hidden Markov Models;291
10.2.7;Clustering;291
10.3;Experimental Strategies;292
10.3.1;k-Fold Cross-Validation;292
10.3.2;Leave-One-Out;293
10.3.3;Random Split;293
11;Glossary;294
11.1;Bounding box;294
11.2;Byte ordering;294
11.3;Ceiling function;294
11.4;Chunk;294
11.5;Connected component;294
11.6;Heaviside unit function;294
11.7;Heterarchy;295
11.8;KL-divergence;295
11.9;Linear regression;295
11.10;Run;295
11.11;Scanline;295
12;References;296
13;Index;305


