Biemann / Mehler | Text Mining | E-Book | www.sack.de

E-book, English, 243 pages

Series: Theory and Applications of Natural Language Processing

Biemann / Mehler: Text Mining

From Ontology Learning to Automated Text Processing Applications
2014
ISBN: 978-3-319-12655-5
Publisher: Springer International Publishing
Format: PDF
Copy protection: 1 - PDF watermark




This book comprises a set of articles that specify the methodology of text mining, describe the creation of lexical resources within the framework of text mining, and use text mining for various tasks in natural language processing (NLP). The analysis of large amounts of textual data is a prerequisite for building lexical resources such as dictionaries and ontologies, and it also has direct applications in automated text processing in fields such as history, healthcare and mobile applications, to name just a few. This volume provides an update on recent advances in text mining methods and reflects the latest achievements in the automatic construction of large lexical resources. It addresses researchers who already perform text mining and those who want to enrich their battery of methods. Selected articles can be used to support graduate-level teaching.

The book is suitable for all readers who have completed undergraduate studies in computational linguistics, quantitative linguistics, computer science or computational humanities. It assumes basic knowledge of computer science and corpus processing as well as of statistics.

After completing his doctoral dissertation with Gerhard Heyer at the University of Leipzig (Germany), Chris Biemann joined the semantic search startup Powerset (San Francisco) in 2008, which was acquired that same year and became part of Microsoft's Bing. In 2011, he joined TU Darmstadt (Germany) as an assistant professor (W1) for Language Technology. His interests lie in statistical semantics, unsupervised and knowledge-free natural language processing, and in leveraging the wisdom of the crowd for language data acquisition.

Alexander Mehler is professor (W3) for Computational Humanities / Text Technology at the Goethe University Frankfurt am Main, where he heads the Text Technology Lab as part of the Institute of Informatics. His research interests focus on the empirical analysis and simulative synthesis of discourse units in spoken and written communication. He aims at a quantitative theory of networking in linguistic systems to enable multi-agent simulations of their life cycle. Alexander Mehler integrates models of semantic spaces with simulation models of language evolution and topological models of network theory to capture the complexity of linguistic information systems. He currently heads several research projects on the analysis of linguistic networks in historical semantics. Most recently, he started a research project on kinetic text technologies that integrates the games-with-a-purpose paradigm with the wiki way of collaborative writing and kinetic HCI.




Foreword
List of Reviewers
Contents

Part I: Text Mining Techniques and Methodologies

  Building Large Resources for Text Mining: The Leipzig Corpora Collection
    1 Introduction: The Need for Large Resources
      1.1 What is the Right Size of a Corpus?
      1.2 How Much Text is There for a Certain Language?
    2 Standardization and Availability
      2.1 Standardized Processing
        2.1.1 Crawling
        2.1.2 Pre-processing
      2.2 Standardization in Distributed Infrastructures
    3 The Leipzig Corpora Collection
      3.1 Evolution of the LCC
      3.2 Deep Processing
        3.2.1 Word Co-occurrences
        3.2.2 POS Tagging
        3.2.3 Word Similarities
        3.2.4 Sentence Similarities
      3.3 Language and Corpus Statistics
        3.3.1 Quality
        3.3.2 Corpus Timeline
        3.3.3 Language Description
        3.3.4 Application to Typology
      3.4 Multiword Units
      3.5 Recent Developments and Future Trends
    References

  Learning Textologies: Networks of Linked Word Clusters
    1 Introduction
    2 Related Work
    3 Building Textologies
      3.1 Word Association Graph
      3.2 Algorithms
        3.2.1 The Cluster Expansion Algorithm
        3.2.2 Semantic Context Learning
        3.2.3 Link Detection Algorithm
    4 Using Textologies
      4.1 From Textologies to Ontologies
      4.2 Grammar Generation
    5 Experiments and Evaluation
      5.1 Building a Textology
      5.2 Generating Grammars
    6 Conclusion
    References

  Simple, Fast and Accurate Taxonomy Learning
    1 Introduction
    2 Related Work
    3 Taxonomy Term Extraction
      3.1 Hyponym Extraction and Filtering
      3.2 Hypernym Extraction and Filtering
      3.3 Concept Positioning Test
    4 Taxonomy Induction
      4.1 Positioning Intermediate Concepts
      4.2 Graph-Based Concept Reordering
    5 Taxonomy Enrichment with Verb-Based Relations
      5.1 Problem Formulation
      5.2 Learning Verb Relations
      5.3 Learning Verb–Preposition Relations
    6 Data Collection and Experimental Set Up
      6.1 Experiment 1: Hyponym Extraction
      6.2 Experiment 2: Hypernym Extraction
      6.3 Experiment 3: IS-A Taxonomic Relations
      6.4 Experiment 4: Reconstructing WordNet's Taxonomy
      6.5 Experiment 5: Taxonomy Verb-Based Enrichment
    7 Conclusion
    References

  A Topology-Based Approach to Visualize the Thematic Composition of Document Collections
    1 Introduction
    2 Related Work
      2.1 Visualization of High-Dimensional Point Data
      2.2 Representation and Visualization of Textual Data
    3 Pitfalls of Distance-Based Analysis and Projective Visualization
      3.1 Distance-Based Analysis in High-Dimensional Spaces
      3.2 Projections to Visualize High-Dimensional Clusterings
      3.3 Rethinking: How to Present What to the User
    4 Topological Representation of Clustering Structure
      4.1 From Point Data to a High-Dimensional Density Function
      4.2 The Topology of the Density Function
      4.3 Cluster Properties and Algorithm Parameters
    5 Visualization of High-Dimensional Point Cloud Structure
      5.1 Topological Landscape Metaphor
        5.1.1 Atoll-Like Flattened Topological Landscape
      5.2 Topological Landscape Profile
      5.3 Feature Selection and Local Data Analysis
      5.4 Parameter Widgets
    6 Conclusion
    References

  Towards a Network Model of the Coreness of Texts: An Experiment in Classifying Latin Texts Using the TTLab Latin Tagger
    1 Introduction
    2 Processing Latin Texts with the TTLab Latin Tagger
      2.1 Linguistic Rules
      2.2 Statistical PoS-Tagging with Conditional Random Fields
      2.3 Evaluation
    3 Extending the Frankfurt Latin Lexicon (FLL)
    4 From Tagging Latin Texts to Lexical Text Networks
      4.1 Approaching Lexical Text Structures by Means of k-Cores
    5 Experimentation
    6 Conclusion
    References

Part II: Text Mining Applications

  A Structuralist Approach for Personal Knowledge Exploration Systems on Mobile Devices
    1 Introduction
      1.1 The Structuralist Approach and Personal Data
      1.2 Mobile Devices
    2 Our Solution
      2.1 Pledge for Additional "Language" Layers
    3 Text Similarity Measurement
      3.1 Evaluation Method
      3.2 Experiments on News and Email Text Collections
      3.3 (Unofficial) Semantic Text Similarity Experiments
    4 Information Extraction
      4.1 (Named) Entity Recognition
      4.2 Integration of Personal Resources
        4.2.1 Address Book
        4.2.2 Exploiting the Personal Corpus
        4.2.3 Combining Precomputed NER Models with Personal Models
    5 Conclusions
    References

  Natural Language Processing Supporting Interoperability in Healthcare
    1 Introduction
    2 Methods
      2.1 The Interoperability Challenge
      2.2 The Language Challenge
      2.3 The Natural Language Processing Challenge
    3 Towards NLP in Healthcare
      3.1 Round Trip
      3.2 Phases of NLP
      3.3 From Speech to Text Fields
      3.4 From Text to Codes
      3.5 From Codes to Structured Data
      3.6 Importing Data
      3.7 Data Exchange
      3.8 Semantic Translations
    4 Discussion
    References

  Deception Detection Within and Across Cultures
    1 Introduction
      1.1 Related Work
    2 Datasets
    3 Experiments
      3.1 What is the Performance for Deception Classifiers Built for Different Cultures?
      3.2 Can We Use Information Drawn from One Culture to Build a Deception Classifier in Another Culture?
      3.3 What are the Psycholinguistic Classes Most Strongly Associated with Deception/Truth?
    4 Deception Detection Using Short Sentences
    5 Conclusions
    References

  Sentiment Analysis: What's Your Opinion?
    1 Introduction
    2 The Counterpart of Sentiment in Linguistics and Psychology
      2.1 Subjectivity
        2.1.1 The 'Private State'
        2.1.2 Emotions and Their Reflection in Language
        2.1.3 Intersubjectivity
      2.2 Factuality
        2.2.1 The Semantic Viewpoint: Evidentiality and Veridicity
        2.2.2 Interpretation
    3 Sentiment Analysis in Computational Linguistics
      3.1 Resources: Lexicons and Corpora
      3.2 Rule-Based Approaches
      3.3 Aspect Analysis
      3.4 Machine Learning Approaches
    4 What Is Your Opinion, What Is Ours?
      4.1 Terminology
      4.2 Issues (1): Polarity and Lexicons
      4.3 Issues (2): Context
    5 Summary
    References

  Multi-perspective Event Detection in Texts Documenting the 1944 Battle of Arnhem
    1 Introduction
    2 Synthesizing Computational and Historical Research Practices
    3 About MERIT
      3.1 Proof of Concept Study: The Battle of Arnhem
      3.2 Methodology
    4 A Pilot Study
      4.1 Step 1: Text Selection
      4.2 Step 2a: Named Entity Recognition
      4.3 Step 2b: Regular Expressions for Street Names
      4.4 Step 3: Visualization of Relations Between Texts
    5 Step 4: Information Processing
    6 Conclusion
    References

  Towards a Historical Text Re-use Detection
    1 Introduction
    2 Data: Investigated Corpus and Initial Setup
    3 Related Work
    4 Algorithms: Text Re-use Techniques
    5 Initial Setup
    6 Results
      6.1 Evaluation of Text Re-use Techniques for Paraphrase Detection
      6.2 Extraction and Typing of Paradigmatic Relations
    7 Further Work
    8 Conclusion
    References


