E-book, English, 490 pages
Frishman / Valencia Modern Genome Annotation
1st edition, 2009
ISBN: 978-3-211-75123-7
Publisher: Springer Vienna
Format: PDF
Copy protection: 1 - PDF watermark
The BioSapiens Network
Written by researchers of the BioSapiens Network of Excellence, this book presents an accurate description of current scientific developments in bioinformatics and its computational implementation. Bioinformatics is essential for annotating the structure and function of genes and proteins, for analysing complete genomes, and for molecular biology and biochemistry. The book includes an overview of bioinformatics and the full spectrum of genome annotation approaches: genome analysis and gene prediction, gene regulation analysis and expression, genome variation and QTL analysis, large-scale protein annotation of function and structure, annotation and prediction of protein interactions, and the organization and annotation of molecular networks and biochemical pathways. Also covered are a technical framework for organizing and representing genome data using the DAS technology, and work on the annotation of two large genomic sets: HIV/HCV viral genomes and splicing alternatives potentially encoded in 1% of the human genome.
Further information and material
1;CONTENTS;6
2;INTRODUCTION BIOSAPIENS: A European Network of Excellence to develop genome annotation resources;18
3;SECTION 1 Gene definition;21
3.1;CHAPTER 1.1 State of the art in eukaryotic gene prediction;22
3.1.1;1 Introduction;22
3.1.2;2 Classes of information;25
3.1.3;3 Frameworks for integration of information;32
3.1.4;4 Training;41
3.1.5;5 Evaluation of gene prediction methods;42
3.1.6;6 Discussion;47
3.2;CHAPTER 1.2 Quality control of gene predictions;55
3.2.1;1 Introduction;55
3.2.2;2 Quality control of gene predictions;56
3.2.3;3 Results;58
3.2.4;4 Alternative interpretations of the results of MisPred analyses;64
3.2.5;5 Conclusions;66
4;SECTION 2 Gene regulation and expression;67
4.1;CHAPTER 2.1 Evaluating the prediction of cis-acting regulatory elements in genome sequences;68
4.1.1;1 Introduction;68
4.1.2;2 Transcription factor binding sites and motifs;71
4.1.3;3 Scanning a sequence with a position-specific scoring matrix;72
4.1.4;4 Evaluating pattern matching results;77
4.1.5;5 Discovering motifs in promoter sequences;82
4.1.6;6 Methodological issues for evaluating pattern discovery;96
4.1.7;7 Good practices for evaluating predictive tools;97
4.1.8;8 What has not been covered in this chapter;99
4.1.9;9 Materials;100
4.1.10;Abbreviations;100
4.2;CHAPTER 2.2 A biophysical approach to large-scale protein-DNA binding data;103
4.2.1;1 Binding site predictions;104
4.2.2;2 Affinity model;107
4.2.3;3 Affinity statistics;111
4.2.4;4 Applications;113
4.2.5;5 Summary;114
4.3;CHAPTER 2.3 From gene expression profiling to gene regulation;116
4.3.1;1 Introduction;116
4.3.2;2 Generating sets of co-expressed genes;117
4.3.3;3 Finding putative regulatory regions using comparative genomics;120
4.3.4;4 Detecting common transcription factors for co-expressed gene sets;122
4.3.5;5 Combining transcription factor information;125
4.3.6;6 “De novo” prediction of transcription factor binding motifs;126
5;SECTION 3 Annotation and genetics;131
5.1;CHAPTER 3 Annotation, genetics and transcriptomics;132
5.1.1;1 Introduction;132
5.1.2;2 Genetics and gene function;134
5.1.3;3 Use of animal models;137
5.1.4;4 Transcriptomics: gene expression microarrays;139
5.1.5;5 Gene annotation;141
6;SECTION 4 Functional annotation of proteins;146
6.1;CHAPTER 4.1 Resources for functional annotation;147
6.1.1;1 Introduction;147
6.1.2;2 Resources for functional annotation – protein sequence databases;148
6.1.3;3 UniProt – The Universal Protein Resource;149
6.1.4;4 The UniProt Knowledgebase (UniProtKB);150
6.1.5;5 Protein family classification for functional annotation;160
6.1.6;6 From genes and proteins to genomes and proteomes;168
6.1.7;7 Summary;169
6.2;CHAPTER 4.2 Annotating bacterial genomes;173
6.2.1;1 Background;173
6.2.2;2 Global sequence properties;178
6.2.3;3 Identifying genomic objects;180
6.2.4;4 Functional annotation;182
6.2.5;5 A recursive view of genome annotation;184
6.2.6;6 Improving annotation: parallel analysis and comparison of multiple bacterial genomes;186
6.2.7;7 Perspectives: new developments for the construction of genome databases, metagenome analyses and user-friendly platforms;188
6.2.8;8 Annex: databases and platforms for annotating bacterial genomes;190
6.3;CHAPTER 4.3 Data mining in genome annotation;199
6.3.1;1 Introduction;199
6.3.2;2 An overview of large biological databases;201
6.3.3;3 Data mining in genome annotation;208
6.3.4;4 Applying association rule mining to the Swiss-Prot database;213
6.3.5;5 Applying association rule mining to the PEDANT database;215
6.3.6;6 Conclusion;218
6.4;CHAPTER 4.4 Modern genome annotation: the BioSapiens network;221
6.4.1;1 Homologous and non-homologous sequence methods for assigning protein functions;221
6.5;CHAPTER 4.5 Structure to function;247
6.5.1;1 Introduction to protein structure and function;247
6.5.2;2 FireDB and firestar – the prediction of functionally important residues;249
6.5.3;3 Modelling local function conservation in sequence and structure space for predicting molecular function;254
6.5.4;4 Structural templates for functional characterization;257
6.5.5;5 An integrated pipeline for functional prediction;260
6.6;CHAPTER 4.6 Harvesting the information from a family of proteins;271
6.6.1;1 Introduction;271
6.6.2;2 Molecular class-specific information systems;273
6.6.3;3 Extracting information from sequences;275
6.6.4;4 Correlation studies on GPCRs;277
6.6.5;5 Discussion;282
7;SECTION 5 Protein structure prediction;288
7.1;CHAPTER 5.1 Structure prediction of globular proteins;289
7.1.1;1 The folding problem;289
7.1.2;2 The evolution of protein structures and its implications for protein structure prediction;292
7.1.3;3 Template based modelling;293
7.1.4;4 Template-free protein structure prediction;299
7.1.5;5 Automated structure prediction;306
7.1.6;6 Conclusions and future outlook;310
7.2;CHAPTER 5.2 The state of the art of membrane protein structure prediction: from sequence to 3D structure;314
7.2.1;1 Why membrane proteins?;314
7.2.2;2 Many functions;316
7.2.3;3 Bioinformatics and membrane proteins: is it feasible to predict the 3D structure of a membrane protein?;316
7.2.4;4 Predicting the topology of membrane proteins;317
7.2.5;5 How many methods to predict membrane protein topology?;319
7.2.6;6 Benchmarking the predictors of transmembrane topology;321
7.2.7;7 How many membrane proteins in the Human genome?;324
7.2.8;8 Membrane proteins and genetic diseases: PhD-SNP at work;325
7.2.9;9 Last but not least: 3D MODELLING of membrane proteins;327
7.2.10;10 What can currently be done in practice?;328
7.2.11;11 Can we improve?;329
8;SECTION 6 Protein–protein complexes, pathways and networks;332
8.1;CHAPTER 6.1 Computational analysis of metabolic networks;333
8.1.1;1 Introduction;333
8.1.2;2 Computational resources on metabolism;335
8.1.3;3 Basic notions of graph theory;339
8.1.4;4 Topological analysis of metabolic networks;340
8.1.5;5 Assessing reconstructed metabolic networks against physiological data;346
8.1.6;Conclusion;352
8.2;CHAPTER 6.2 Protein–protein interactions: analysis and prediction;356
8.2.1;1 Introduction;356
8.2.2;2 Experimental methods;357
8.2.3;3 Protein interaction databases;359
8.2.4;4 Data standards for molecular interactions;359
8.2.5;5 The IntAct molecular interaction database;363
8.2.6;6 Interaction networks;365
8.2.7;7 Visualization software for molecular networks;368
8.2.8;8 Estimates of the number of protein interactions;374
8.2.9;9 Multi-protein complexes;375
8.2.10;10 Network modules;376
8.2.11;11 Diseases and protein interaction networks;379
8.2.12;12 Sequence-based prediction of protein interactions;383
8.2.13;13 Integration of experimentally determined and predicted interactions;388
8.2.14;14 Domain–domain interactions;392
8.2.15;15 Biomolecular docking;398
9;SECTION 7 Infrastructure for distributed protein annotation;414
9.1;CHAPTER 7 Infrastructure for distributed protein annotation;415
9.1.1;1 Introduction;415
9.1.2;2 The Distributed Annotation System (DAS);417
9.1.3;3 DAS infrastructure;417
9.1.4;4 The protein feature ontology;424
9.1.5;5 Conclusion;427
10;SECTION 8 Applications;429
10.1;CHAPTER 8.1 Viral bioinformatics;430
10.1.1;1 Introduction;430
10.1.2;2 Viral evolution in the human population;431
10.1.3;3 Interaction between the virus and the human immune system;435
10.1.4;4 Viral evolution in the human host;443
10.1.5;5 Perspectives;451
10.2;CHAPTER 8.2 Alternative splicing in the ENCODE protein complement;454
10.2.1;1 Introduction;454
10.2.2;2 Prediction of variant location;456
10.2.3;3 Prediction of variant function – analysis of the role of alternative splicing in changing function by modulation of functional residues;459
10.2.4;4 Prediction of variant structure;464
10.2.5;5 Summary of effects of alternative splicing;468
10.2.6;6 Prediction of principal isoforms;473
10.2.7;7 The ENCODE pipeline – an automated workflow for analysis of human splice isoforms;478
11;CONTRIBUTORS;486
CHAPTER 3 Annotation, genetics and transcriptomics (p. 123-124)
R. Mott
Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, UK
1 Introduction
This chapter discusses how to combine genome annotations of the type described elsewhere in this book with genetic and functional genomics data to find the genes associated with a phenotype, and in particular with a complex disease. This problem is of fundamental importance: the promise that understanding the molecular basis of common diseases would lead to effective treatments helped motivate and fund the Human Genome Project.
Complex diseases such as cancer, diabetes, cardiovascular disease and depression are defined as conditions with multiple causes, both genetic (due to mutations in the genome) and environmental (everything else). By contrast, a Mendelian disease is caused by mutations in a single gene, with minimal environmental contribution. With a few exceptions such as cystic fibrosis in Caucasians and sickle-cell anaemia in parts of equatorial Africa, most Mendelian diseases are rare and do not impose a major health care burden on society. Most common diseases are complex, the exceptions being those caused by infectious agents such as HIV and tuberculosis, and even in these cases there is a genetic contribution to resistance to infection.
In general, most complex diseases have a significant genetic component, which we can estimate by comparing the co-prevalence of a disease in genetically identical (monozygotic) twins with that in non-identical (dizygotic) twins, who share on average only 50% of their DNA by descent. Because the average effect due to shared environment should be the same in the two groups, any excess in co-prevalence among monozygotic twins is likely to be genetic. Thus it is possible to estimate the extent of the genetic contribution to a disease without identifying the causative genes and polymorphisms (Mather and Jinks 1982).
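The twin comparison can be made concrete with a small calculation. The sketch below uses Falconer's classical approximation, h² = 2(r_MZ − r_DZ), which is not given in the excerpt itself and is shown here only as one common way to turn the comparison into a number. It assumes that shared environment affects both twin types equally and that genetic effects are additive. The function name and the input correlations are hypothetical, chosen purely for illustration.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Approximate heritability as h^2 = 2 * (r_mz - r_dz).

    r_mz: trait correlation between monozygotic twins (share ~100% of DNA)
    r_dz: trait correlation between dizygotic twins (share ~50% by descent)

    Assumption: shared-environment effects are equal in both groups,
    so they cancel in the difference; doubling scales the 50% gap in
    genetic sharing up to the full genetic contribution.
    """
    for name, r in (("r_mz", r_mz), ("r_dz", r_dz)):
        if not -1.0 <= r <= 1.0:
            raise ValueError(f"{name} must be a correlation in [-1, 1]")
    return 2.0 * (r_mz - r_dz)


# Hypothetical twin correlations, for illustration only (not real data).
h2 = falconer_heritability(r_mz=0.75, r_dz=0.5)
print(f"Estimated heritability: {h2:.2f}")
```

With these made-up inputs, the excess correlation in monozygotic twins is 0.25, giving an estimated heritability of 0.5. Estimates outside the range 0 to 1 indicate that the assumptions behind the approximation do not hold for the data at hand.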
The ultimate aim of gene annotation is to describe the function of every segment of the genome, including protein coding genes as well as micro-RNAs, transcription-factor binding sites and other cryptic functional elements. In addition we want to annotate the functional consequence of every polymorphism observed in a population. If we had a perfectly annotated genome then we could predict which genes are relevant to each disease, and there would be no need for further work. However, in fact we have only begun to scratch the surface of the annotation problem, and we will need to be able to integrate data from multiple sources in order to make progress.
Before going further, it is important to clarify what is meant by the phrase “gene function”. This turns out to be a surprisingly difficult concept, depending on the context in which the question is being asked. Gene function may be defined at a number of levels. For example, for protein-coding genes, it is important to know in which tissues and at which developmental stages the protein is expressed, and in which splice variants or isoforms. Next, the interactants of the protein are important, as they define the pathways in which the protein functions. Finally, we wish to understand the consequences of perturbations to the gene's DNA sequence, as these may give rise to genetic disease.




