Book, English, 348 pages, Format (W × H): 161 mm x 240 mm, Weight: 690 g
ISBN: 978-0-8493-2801-5
Publisher: CRC Press
Covering theory, algorithms, and methodologies, as well as data mining technologies, Data Mining for Bioinformatics provides a comprehensive discussion of data-intensive computations used in data mining with applications in bioinformatics. It supplies a broad, yet in-depth, overview of the application domains of data mining for bioinformatics to help readers from both biology and computer science backgrounds gain an enhanced understanding of this cross-disciplinary field.
The book offers authoritative coverage of data mining techniques, technologies, and frameworks used for storing, analyzing, and extracting knowledge from large databases in the bioinformatics domains, including genomics and proteomics. It begins by describing the evolution of bioinformatics and highlighting the challenges that can be addressed using data mining techniques. Introducing the various data mining techniques that can be employed in biological databases, the text is organized into four sections:
- Supplies a complete overview of the evolution of the field and its intersection with computational learning
- Describes the role of data mining in analyzing large biological databases, explaining the breadth of the various feature selection and feature extraction techniques that data mining has to offer
- Focuses on concepts of unsupervised learning using clustering techniques and their application to large biological data
- Covers supervised learning using classification techniques most commonly used in bioinformatics—addressing the need for validation and benchmarking of inferences derived using either clustering or classification
The book describes the various biological databases prominently referred to in bioinformatics and includes a detailed list of the applications of advanced clustering algorithms used in bioinformatics. Highlighting the challenges encountered during the application of classification on biological databases, it considers systems of both single and ensemble classifiers and shares effort-saving tips for model selection and performance estimation strategies.
Target audience
bioinformatics software engineers/developers; bioinformaticians; bioinformatics scientists and support specialists; data mining analysts/architects; data mining engineers and support specialists; faculty; postdoctoral researchers
Further information & material
Introduction to Bioinformatics
Introduction
Transcription and Translation
The Central Dogma of Molecular Biology
The Human Genome Project
Beyond the Human Genome Project
Sequencing Technology
Dideoxy Sequencing
Cyclic Array Sequencing
Sequencing by Hybridization
Microelectrophoresis
Mass Spectrometry
Nanopore Sequencing
Next-Generation Sequencing
Challenges of Handling NGS Data
Sequence Variation Studies
Kinds of Genomic Variations
SNP Characterization
Functional Genomics
Splicing and Alternative Splicing
Microarray-Based Functional Genomics
Comparative Genomics
Functional Annotation
Function Prediction Aspects
Conclusion
References
Biological Databases and Integration
Introduction: Scientific Work Flows and Knowledge Discovery
Biological Data Storage and Analysis
Challenges of Biological Data
Classification of Bioscience Databases
Primary versus Secondary Databases
Deep versus Broad Databases
Point Solution versus General Solution Databases
Gene Expression Omnibus (GEO) Database
The Protein Data Bank (PDB)
The Curse of Dimensionality
Data Cleaning
Problems of Data Cleaning
Challenges of Handling Evolving Databases
Problems Associated with Single-Source Techniques
Problems Associated with Multisource Integration
Data Argumentation: Cleaning at the Schema Level
Knowledge-Based Framework: Cleaning at the Instance Level
Data Integration
Ensembl
Sequence Retrieval System (SRS)
IBM’s DiscoveryLink
Wrappers: Customizable Database Software
Data Warehousing: Data Management with Query Optimization
Data Integration in the PDB
Conclusion
References
Knowledge Discovery in Databases
Introduction
Analysis of Data Using Large Databases
Distance Metrics
Data Cleaning and Data Preprocessing
Challenges in Data Cleaning
Models of Data Cleaning
Proximity-Based Techniques
Parametric Methods
Nonparametric Methods
Semiparametric Methods