Ontology Learning and Population from Text: Algorithms, Evaluation and Applications
Author: Cimiano
E-book, English, 362 pages
1st edition 2006
ISBN: 978-0-387-39252-3
Publisher: Springer US
Format: PDF
Copy protection: PDF watermark
In the last decade, ontologies have received much attention within computer science and related disciplines, most often in the context of the Semantic Web. Ontology Learning and Population from Text: Algorithms, Evaluation and Applications discusses ontologies for the Semantic Web as well as for knowledge management, information retrieval, text clustering and classification, and natural language processing. The book is structured for research scientists and practitioners in industry, and it is also suitable for graduate-level students in computer science.
Further Information & Material
Contents
List of Figures
List of Tables
Foreword
Preface
Acknowledgements
Abbreviations
Mathematical Notation
Part I Preliminaries
  1 Introduction
  2 Ontologies
  3 Ontology Learning from Text
    3.1 Ontology Learning Tasks
  4 Basics
    4.1 Natural Language Processing
    4.2 Formal Concept Analysis
    4.3 Machine Learning
  5 Datasets
    5.1 Corpora
    5.2 Concept Hierarchies
    5.3 Population Gold Standard
Part II Methods and Applications
  6 Concept Hierarchy Induction
    6.1 Common Approaches
    6.2 Learning Concept Hierarchies with FCA
    6.3 Guided Clustering
    6.4 Learning from Heterogeneous Sources of Evidence
    6.5 Related Work
    6.6 Conclusion and Open Issues
  7 Learning Attributes and Relations
    7.1 Common Approaches
    7.2 Learning Attributes
    7.3 Learning Relations from Corpora
    7.4 Learning Qualia Structures from the Web
    7.5 Related Work
    7.6 Conclusion and Open Issues
  8 Population
    8.1 Common Approaches
    8.2 Corpus-based Population
    8.3 Learning by Googling
    8.4 Related Work
    8.5 Conclusion and Open Issues
  9 Applications
    9.1 Text Clustering and Classification
    9.2 Information Highlighting for Supporting Search
    9.3 Related Work
    9.4 Contribution and Open Issues
Part III Conclusion
  10 Contribution and Outlook
  11 Concluding Remarks
Appendix
  A.1 Learning Accuracy
  A.2 Mutually Similar Words for the tourism domain
  A.3 Mutually Similar Words for the finance domain
  A.4 The Penn Treebank Tag Set
References
Index
10 Contribution and Outlook (p. 309-310)
This book contributes to the state of the art in ontology learning in several ways. First, we have provided a formal definition of ontology learning tasks with respect to a well-defined ontology model. The ontology learning layer cake, a model for representing the diverse subtasks in ontology learning, has been introduced. In addition, evaluation measures for the concept hierarchy induction, relation learning and ontology population tasks have been defined. These evaluation measures provide a basis for comparing different approaches to the same task. Most importantly, several original approaches to these tasks have been presented and compared to other state-of-the-art approaches from the literature using the defined evaluation measures.
Concerning the concept hierarchy induction task, we have presented a novel approach based on Formal Concept Analysis, an original guided agglomerative clustering method, and a combination approach for the induction of concept hierarchies from text. All of these approaches have been evaluated and shown to outperform current state-of-the-art methods. We have further introduced and discussed several approaches to learning attributes and relations. In particular, we have presented approaches to learning (i) attributes, (ii) the appropriate domain and range for relations, and (iii) specific relations using a pattern-based approach. Several approaches to automatically populating an ontology with instances have also been described; in particular, we have examined a similarity-based approach and introduced the original approach of Learning by Googling. Corresponding evaluations have also been provided. Finally, we have discussed applications of ontology learning approaches and demonstrated for two concrete applications that the techniques developed in this book are indeed useful. Throughout the book, we have also provided a thorough overview of related work.
Fortunately, there are a number of open issues that require further research. First, although we have taken an initial step towards combining different ontology learning paradigms via a machine learning approach, further research is needed to unveil the full potential of such a combination. In particular, paradigms other than our classification-based approach could be explored. One could imagine training classifiers for each type of basic ontological relation (e.g. isa, part-of) using different methods and then using a calculus, as envisioned by Heyer and colleagues [Heyer et al., 2001] and by Ogata and Collier [Ogata and Collier, 2004], to combine the results of these classifiers and to reason over the different types of extracted ontological relations. Such post-extraction reasoning is crucial because different approaches can produce contradictory information, so producing a consistent ontology requires some form of contradiction resolution. Indeed, an important problem is to generate, from a given set of possibly contradictory relations, the ontology that is optimal with respect to a certain criterion. Initial blueprints for such an approach can be found, for example, in the work of Haase and Völker [Haase and Völker, 2005]. However, much more research is needed in this direction.
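To make the idea of post-extraction contradiction resolution more concrete, the following Python sketch combines the confidence scores of several hypothetical relation classifiers by simple averaging. It then greedily keeps the highest-scoring isa pairs, skipping any pair that would close a cycle in the taxonomy. The function names (`combine_scores`, `resolve`), the averaging scheme, the 0.5 threshold and the toy scores are all assumptions made for illustration. This is not the method of the works cited above or of this book. The greedy pass is also only a heuristic: it does not guarantee the optimal ontology.

```python
from collections import defaultdict

def combine_scores(classifier_outputs):
    """Average the confidence each classifier assigns to a candidate isa pair.

    classifier_outputs: list of dicts mapping (child, parent) -> confidence in [0, 1].
    A pair missing from a classifier's output counts as confidence 0 for it.
    """
    totals = defaultdict(float)
    for output in classifier_outputs:
        for pair, conf in output.items():
            totals[pair] += conf
    n = len(classifier_outputs)
    return {pair: total / n for pair, total in totals.items()}

def reaches(graph, start, goal):
    """Return True if goal is reachable from start via isa edges (iterative DFS)."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph[node])
    return False

def resolve(scores, threshold=0.5):
    """Greedily accept high-scoring isa pairs, skipping any that would create a cycle."""
    graph = defaultdict(set)
    accepted = []
    # Highest combined confidence first; ties broken by the pair itself for determinism.
    for (child, parent), score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        if score < threshold:
            break
        # Adding child -> parent closes a cycle iff child is already reachable from parent.
        if reaches(graph, parent, child):
            continue
        graph[child].add(parent)
        accepted.append((child, parent))
    return accepted

# Hypothetical outputs of two classifiers that disagree on hotel/accommodation.
hearst_patterns = {("hotel", "accommodation"): 0.9, ("accommodation", "hotel"): 0.4,
                   ("campsite", "accommodation"): 0.6}
clustering = {("hotel", "accommodation"): 0.7, ("accommodation", "hotel"): 0.8,
              ("campsite", "accommodation"): 0.6, ("accommodation", "campsite"): 0.5}
scores = combine_scores([hearst_patterns, clustering])
print(resolve(scores))
```

With these toy scores, the sketch keeps `hotel isa accommodation` and `campsite isa accommodation`. It rejects the contradictory `accommodation isa hotel` because that pair would close a cycle, and it drops `accommodation isa campsite` because its averaged score of 0.25 falls below the threshold. Treating cycles as the only kind of inconsistency keeps the example small. A realistic resolution step would also have to handle other contradictions, for instance relations declared to be mutually exclusive.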
Another important issue to be clarified is which similarity measures, weighting measures and features work best for the task of clustering words. Although we have provided some insights in the present book, much more work is needed to settle these issues. In the same vein, further experiments are necessary to clarify the relation between syntactic similarity and semantic similarity as perceived by humans. These issues can only be approached experimentally. Although there has already been a lot of work on this topic, much further research can be expected.
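Even a toy example shows how much the choice of similarity measure matters. The Python sketch below compares cosine similarity, which uses the feature counts, with the Jaccard coefficient, which only considers which features occur. The verb-object context counts and the function names are invented for illustration and are not data or code from this book.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse count vectors (dicts feature -> count)."""
    dot = sum(c * v.get(f, 0) for f, c in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def jaccard(u, v):
    """Jaccard overlap of the two feature sets, ignoring the counts entirely."""
    a, b = set(u), set(v)
    return len(a & b) / len(a | b) if a | b else 0.0

def most_similar(word, vectors, measure):
    """Return the other word whose context vector is closest to word's under measure."""
    others = [w for w in vectors if w != word]
    return max(others, key=lambda w: measure(vectors[word], vectors[w]))

# Hypothetical verb-object context counts (invented for illustration, not corpus data).
contexts = {
    "hotel": {"book": 10, "stay_in": 8, "rent": 1},
    "apartment": {"rent": 9, "stay_in": 5, "book": 2},
    "car": {"rent": 10, "drive": 7},
}

for name, measure in [("cosine", cosine), ("jaccard", jaccard)]:
    print(name, most_similar("apartment", contexts, measure))
```

With these counts, cosine similarity pairs apartment with car (about 0.70 versus 0.51 for hotel), because both vectors are dominated by the rent feature. Jaccard pairs apartment with hotel instead, since the two words occur with exactly the same set of features. A weighting scheme applied to the counts before comparison could change the outcome again, which is why these choices have to be settled empirically.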
In general, from a theoretical perspective, it would be necessary to clarify which types of ontologies we can actually learn: domain ontologies, lexical ontologies, upper-level ontologies, application ontologies, etc. Work in this direction has been presented by Bateman [Bateman, 1991], for instance. Along these lines, we should also ask ourselves what the limits of ontology learning techniques are. Furthermore, an integration of ontology learning techniques with linguistic theories, in particular with lexicon theories such as the Generative Lexicon [Pustejovsky, 1991], is definitely desirable. It also seems desirable to clarify the relation between ontological and lexical semantics. In the long term, it would be interesting to acquire more complex relationships between concepts and relations in the form of rules or axioms. Last but not least, ontology learning approaches should have sensible applications. We have argued that it is far from straightforward to devise applications that make reasonable use of automatically learned ontologies. The problem is that the success of using an ontology depends on a number of parameters that need to be tuned. Nevertheless, the quest for applications is necessary and crucial. Future research should thus further examine the usefulness of automatically derived knowledge structures for specific applications.




