E-book, English, 214 pages
Lin / Johnson Methods of Microarray Data Analysis II
1st edition 2007
ISBN: 978-0-306-47598-6
Publisher: Springer US
Format: PDF
Copy protection: 1 - PDF Watermark
Papers from CAMDA '01
Microarray technology is a major experimental tool for functional genomic explorations, and will continue to be a major tool throughout this decade and beyond. The recent explosion of this technology threatens to overwhelm the scientific community with massive quantities of data. Because microarray data analysis is an emerging field, very few analytical models currently exist. Methods of Microarray Data Analysis II is the second book in this pioneering series dedicated to this exciting new field. In a single reference, readers can learn about the most up-to-date methods, ranging from data normalization, feature selection, and discriminative analysis to machine learning techniques.
Currently, there are no standard procedures for the design and analysis of microarray experiments. Methods of Microarray Data Analysis II focuses on a single data set, using a different method of analysis in each chapter. Real examples expose the strengths and weaknesses of each method for a given situation, aimed at helping readers choose appropriate protocols and utilize them for their own data set. In addition, web links are provided to the programs and tools discussed in several chapters. This book is an excellent reference not only for academic and industrial researchers, but also for core bioinformatics/genomics courses in undergraduate and graduate programs.
Written for: Academic and industrial researchers
Further Information & Material
Contents
Contributors
Acknowledgements
Preface
INTRODUCTION
CAMDA 2001 Data Sets
Feature Selection and Extraction
Clustering Strategies
Modeling Complex Systems
Ontologies, Semantic Understanding, and Functional Genomics
A standard protocol?
Web Companion
1 AN INTRODUCTION TO DNA MICROARRAYS
  1. INTRODUCTION TO FUNCTIONAL GENOMICS
  2. MICROARRAY TECHNOLOGY
  3. MICROARRAY DATA
  4. MICROARRAY EXPERIMENT GOALS
  5. MICROARRAY EXPERIMENTAL DESIGN
  6. MICROARRAY DATA ANALYSIS
  7. RESULT VALIDATION
    7.1 Sample and Data Triage
    7.2 Statistical Validation
    7.3 Biological Validation
  8. CONCLUSION
  REFERENCES
2 EXPERIMENTAL DESIGN FOR GENE MICROARRAY EXPERIMENTS AND DIFFERENTIAL EXPRESSION ANALYSIS
  1. INTRODUCTION
  2. DESIGN OF MICROARRAY EXPERIMENTS
    2.1 Biological variation
    2.2 Technological variations
    2.3 Microarray quality checklist
  3. EXPERIMENTAL DESIGNS THAT INCORPORATE BIOLOGICAL AND TECHNOLOGICAL VARIATION
    3.1 Block designs
    3.2 Randomization
    3.3 Loop designs
    3.4 Split plot designs
    3.5 Optimal designs
  4. DESIGN OF MICROARRAYS
  5. NORMALIZATION MODELS
    5.1 Data transformation and background removal
    5.2 Linear vs. non-linear effects
    5.3 Random vs. fixed effects
    5.4 Ordinary least squares vs. orthogonal regression
    5.5 Means vs. medians
    5.6 Self-consistency
    5.7 Flagging outliers
  6. DIFFERENTIAL EXPRESSION
    6.1 Error models
    6.2 Bayesian approach
    6.3 Adjustment for multiple comparisons and power considerations
  7. FINAL REMARKS
  ACKNOWLEDGEMENTS
  REFERENCES
3 MICROARRAY DATA PROCESSING AND ANALYSIS
  1. INTRODUCTION
  2. DESIGN OF THE ARRAY
  3. DATA ACQUISITION AND IMAGE ANALYSIS
  4. NORMALISATION AND FILTERING
  5. DATA STORAGE
  6. ADDRESSING BIOLOGICAL QUESTIONS
  7. DATA ANALYSIS
    7.1 Two conditions comparison
    7.2 Multiple conditions comparison
    7.3 Gene networks
  8. CONCLUSIONS AND FUTURE PROSPECTS
  REFERENCES
4 BIOLOGY-DRIVEN CLUSTERING OF MICROARRAY DATA
  1. INTRODUCTION
  2. THE ANNOTATION PROBLEM
    2.1 Reannotating the Spots
    2.2 Finding Functional Categories
  3. PRELIMINARY ANALYSIS
    3.1 Data Preprocessing
    3.2 Updating Cell Line Classifications
    3.3 Choosing a Distance Metric
  4. CHROMOSOMAL CLUSTERING
  5. FUNCTIONAL CLUSTERING
  6. CONCLUSIONS
  ACKNOWLEDGEMENTS
  REFERENCES
5 EXTRACTING GLOBAL STRUCTURE FROM GENE EXPRESSION PROFILES
  1. INTRODUCTION
  2. CLUSTERING WITH NORMALIZED CUT
    2.1 The NCut Criterion
    2.2 K-Way Partitioning
    2.3 Clustering Large Datasets
  3. RESULTS
  4. CONCLUSIONS
  REFERENCES
6 SUPERVISED NEURAL NETWORKS FOR CLUSTERING CONDITIONS IN DNA ARRAY DATA AFTER REDUCING NOISE BY CLUSTERING GENE EXPRESSION PROFILES
  1. INTRODUCTION
  2. COMPARATIVE PERFORMANCES OF CLUSTERING METHODS
    2.1 Data set used
    2.2 Comparative runtimes
    2.3 Comparative accuracy
    2.4 Conclusions on comparative performances
  3. CLUSTERING OF CONDITIONS
    3.1 The problem of noisy patterns
    3.2 Clustering of conditions and noise reduction
  4. CONCLUSIONS
  ACKNOWLEDGEMENTS
  REFERENCES
7 BAYESIAN DECOMPOSITION ANALYSIS OF GENE EXPRESSION IN YEAST DELETION MUTANTS
  1. INTRODUCTION
    1.1 The Development of Cancer
    1.2 Microarray Measurements and Analysis
  2. METHODS
    2.1 Bayesian Decomposition
    2.2 Issues in the Application of Bayesian Decomposition
    2.3 Application to the Rosetta Compendium
  3. RESULTS
    3.1 Identification of the Patterns
    3.2 Validation of a Pattern
  4. CONCLUSIONS
  ACKNOWLEDGEMENTS
  REFERENCES
8 USING FUNCTIONAL GENOMIC UNITS TO CORROBORATE USER EXPERIMENTS WITH THE ROSETTA COMPENDIUM
  1. INTRODUCTION
  2. METHODS
    2.1 GO Browser
    2.2 ICA Model of the DNA Microarray
    2.3 Profiling the yeast cells transfected with constitutive active human Rac1 gene
  3. RESULTS
    3.1 GO mapping of yeast genes
    3.2 ICA Results
    3.3 Using the Rosetta data set to corroborate the Rac1 Experiment
  4. DISCUSSION
  REFERENCES
9 FISHING EXPEDITION - A SUPERVISED APPROACH TO EXTRACT PATTERNS FROM A COMPENDIUM OF EXPRESSION PROFILES
  1. OBJECTIVES
  2. METHODS
    2.1 Data Sets
    2.2 The Algorithms
    2.3 The approach
  3. RESULTS
  4. CONCLUSIONS
  REFERENCES
10 MODELING PHARMACOGENOMICS OF THE NCI-60 ANTICANCER DATA SET: UTILIZING KERNEL PLS TO CORRELATE THE MICROARRAY DATA TO THERAPEUTIC RESPONSES
  1. INTRODUCTION
  2. MOTIVATION
  3. METHODOLOGY
  4. PERFORMANCE ANALYSES
  5. DISCUSSION
  ACKNOWLEDGEMENTS
  REFERENCES
11 ANALYSIS OF GENE EXPRESSION PROFILES AND DRUG ACTIVITY PATTERNS BY CLUSTERING AND BAYESIAN NETWORK LEARNING
  1. INTRODUCTION
  2. CLUSTER ANALYSIS OF THE NCI60 DATASET
    2.1 Soft Topographic Vector Quantization
    2.2 Clustering of the NCI60 Cell Lines Using STVQ
  3. DEPENDENCY ANALYSIS USING BAYESIAN NETWORK LEARNING
    3.1 Bayesian Networks
    3.2 Applying Bayesian Networks to the Analysis of NCI60 Dataset
    3.3 Experimental Results
  4. CONCLUSION AND FUTURE WORK
  ACKNOWLEDGEMENTS
  REFERENCES
12 EVALUATION OF CURRENT METHODS OF TESTING DIFFERENTIAL GENE EXPRESSION AND BEYOND
  1. INTRODUCTION
  2. MATERIALS AND METHODS
  3. RESULTS
  4. DISCUSSION
  ACKNOWLEDGEMENTS
  REFERENCES
13 EXTRACTING KNOWLEDGE FROM GENOMIC EXPERIMENTS BY INCORPORATING THE BIOMEDICAL LITERATURE
  1. OBJECTIVE
  2. ANALYTICAL METHODS
    2.1 Data Sets
    2.2 Software
  3. RESULTS
  4. DISCUSSION
    4.1 Title Proximity
    4.2 Genes Linked to the Disease
    4.3 Genes That Cannot Be Linked to the Disease
    4.4 Terms That Cannot Be Linked to Any Other Term
    4.5 Types of Errors
    4.6 Other Uses for the "Pharma Sentences"
    4.7 Comparison To Other Tools
  5. CONCLUSIONS
  REFERENCES
Glossary
Index
5 EXTRACTING GLOBAL STRUCTURE FROM GENE EXPRESSION PROFILES (p. 81-82)
Charless Fowlkes¹, Qun Shan², Serge Belongie³, and Jitendra Malik¹
Departments of Computer Science¹ and Molecular Cell Biology², University of California at Berkeley; Department of Computer Science and Engineering, University of California at San Diego³
Abstract: We have developed a program, GENECUT, for analyzing datasets from gene expression profiling. GENECUT is based on a pairwise clustering method known as Normalized Cut (Shi and Malik, 1997). GENECUT extracts global structures by progressively partitioning datasets into well-balanced groups, performing an intuitive k-way partitioning at each stage in contrast to commonly used 2-way partitioning schemes. By making use of the Nyström approximation, it is possible to perform clustering on very large genomic datasets.
Key words: gene expression profiles, clustering analysis, spectral partitioning
1. INTRODUCTION
DNA microarray technology empowers biologists to analyze thousands of mRNA transcripts in parallel, providing insights into the cellular states of tumor cells, the effects of mutations and knockouts, progression of the cell cycle, and reactions to environmental stresses or drug treatments. Gene expression profiles also provide the raw data needed to interrogate cellular transcription regulation networks. Efforts have been made to identify cis-acting elements based on the assumption that co-regulated genes have a higher probability of sharing transcription factor binding sites. There is a well-recognized need for tools that allow biologists to explore public-domain microarray datasets and integrate the insights gained into their own research. One important approach to structuring the exploration of gene expression data is to find coherent clusters of both genes and experimental conditions. The association of unknown genes with functionally well-characterized genes will guide the formation of hypotheses and suggest experiments to uncover the function of these unknown genes. Similarly, experimental conditions that cluster together may affect the same regulatory pathway.
Unsupervised clustering is a classical data analysis problem that remains an area of active research in the computer science and statistics communities (Ripley, 1996). Broadly speaking, the goal of clustering is to partition a set of feature vectors into k groups such that the partition is "good" according to some cost function. In the case of genes, the feature vector is usually the degree of induction or suppression over some set of experimental conditions. As yet, there is no clear consensus as to which algorithms are most suitable for gene expression data.
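To make the idea of a cost function concrete, the following is a minimal sketch rather than anything taken from the chapter. It assumes one common choice of cost, the within-cluster sum of squared distances to each cluster mean, and uses made-up expression profiles; the function name `within_cluster_cost` and the data are hypothetical.

```python
def within_cluster_cost(vectors, labels):
    """Sum of squared distances from each vector to the mean of its cluster.

    This is only one possible cost function (an assumption for illustration);
    lower values indicate tighter clusters.
    """
    # Group the feature vectors by their assigned cluster label.
    groups = {}
    for vec, label in zip(vectors, labels):
        groups.setdefault(label, []).append(vec)

    cost = 0.0
    for members in groups.values():
        dim = len(members[0])
        # Component-wise mean of the cluster.
        mean = [sum(v[j] for v in members) / len(members) for j in range(dim)]
        cost += sum((v[j] - mean[j]) ** 2 for v in members for j in range(dim))
    return cost


# Hypothetical profiles: induction/suppression of four genes over three conditions.
profiles = [
    [2.0, 1.5, 1.8],
    [1.9, 1.6, 2.1],
    [-1.2, -0.8, -1.0],
    [-1.0, -1.1, -0.9],
]

good = within_cluster_cost(profiles, [0, 0, 1, 1])  # induced vs. suppressed genes
bad = within_cluster_cost(profiles, [0, 1, 0, 1])   # mixes the two groups
print(good < bad)
```

Under this particular cost, the partition that separates induced from suppressed genes scores lower than the one that mixes them. A different cost function could rank partitions differently, which is part of why no single algorithm is agreed upon.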
Clustering methods generally fall into one of two categories: central or pairwise (Buhmann, 1995). Central clustering is based on the idea of prototypes, wherein one finds a small number of prototypical feature vectors to serve as "cluster centers". Feature vectors are then assigned to the most similar cluster center. Pairwise methods are based directly on the distances between all pairs of feature vectors in the data set. Pairwise methods do not require one to solve for prototypes, which provides certain advantages over central methods. For example, when the clusters do not form simple, compact clouds in feature space, central methods are ill-suited, while pairwise methods perform well because similarity is allowed to propagate in a transitive fashion from neighbor to neighbor. A family of genes related by a series of small mutations might well exhibit this sort of structure, particularly when features are based on sequence data. Clustering algorithms can also often be characterized as greedy or global in nature. The agglomerative clustering method used by Eisen et al. (1998) to order microarray data is an example of a greedy pairwise method: it starts with a full matrix of pairwise distances, locates the smallest value, merges the corresponding pair, and repeats until the whole dataset has been merged into a single cluster. Because this type of process considers only the closest pair at each step, global structure present in the data may not be handled properly.
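The greedy pairwise procedure described above can be sketched as follows. This is an illustrative sketch, not the implementation used by Eisen et al. It assumes Euclidean distance and average linkage for the distance between merged clusters, and the names `euclidean` and `agglomerate` are hypothetical.

```python
from itertools import combinations
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(vectors, distance=euclidean):
    """Greedy agglomerative clustering.

    Repeatedly merges the two closest clusters until one cluster remains and
    returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of row indices into `vectors`.
    """
    # Start with every vector in its own cluster.
    clusters = [(i,) for i in range(len(vectors))]
    # Full matrix of pairwise distances between the individual vectors.
    d = [[distance(u, v) for v in vectors] for u in vectors]

    def linkage(a, b):
        # Average linkage: mean distance over all cross-cluster pairs
        # (an assumption; other linkage rules are possible).
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

    merges = []
    while len(clusters) > 1:
        # Greedy step: look only at the currently closest pair of clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical two-condition profiles for four genes.
history = agglomerate([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 7.0]])
for a, b, dist in history:
    print(a, b, round(dist, 2))
```

Each iteration commits to the locally best merge and never revisits it, which is what makes the method greedy. A global method such as the Normalized Cut approach named in the abstract instead scores a whole partition at once.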




