E-book, English, 265 pages
Garzon / Yang / Venugopal: Dimensionality Reduction in Data Science
1st edition, 2022
ISBN: 978-3-031-05371-9
Publisher: Springer International Publishing
Format: PDF
Copy protection: PDF watermark
The ability to generate, gather, and store data on the order of terabytes to exabytes daily has far outpaced our ability, in many domains, to derive useful information from it with the available computational resources.
This book focuses on data science and problem definition; data cleansing; feature selection and extraction; and statistical, geometric, information-theoretic, biomolecular, and machine learning methods for dimensionality reduction of big datasets and for problem solving, together with a comparative assessment of the solutions in real-world settings.
This book targets professionals working in related fields who hold an undergraduate degree in any science area, particularly a quantitative one. Each method or technique is introduced through motivating examples that readers should be able to follow. These examples are followed by precise definitions of the required technical concepts and a presentation of the results in general settings. The degree of abstraction these concepts require is made accessible by reinterpreting them in terms of the original example(s). Finally, each section closes with solutions to the original problem(s) afforded by the techniques, often in several ways, so that their advantages and disadvantages can be compared and contrasted with other solutions.
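As a small taste of the conventional statistical approaches the book surveys (Chapter 4), the following sketch reduces a toy dataset from three dimensions to two with principal component analysis computed via the singular value decomposition. The data and variable names are hypothetical illustrations, not taken from the book:

```python
import numpy as np

# Toy dataset: 6 samples in 3 dimensions (hypothetical values).
X = np.array([
    [2.5, 2.4, 0.5],
    [0.5, 0.7, 1.1],
    [2.2, 2.9, 0.4],
    [1.9, 2.2, 0.6],
    [3.1, 3.0, 0.3],
    [2.3, 2.7, 0.5],
])

# Center the data, then take the SVD; the rows of Vt are the
# principal directions, ordered by explained variance.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Project onto the first k principal components (k = 2 here),
# reducing the dimensionality from 3 to 2.
k = 2
X_reduced = Xc @ Vt[:k].T

# Fraction of total variance captured by each component.
explained = (s ** 2) / (s ** 2).sum()
print(X_reduced.shape)  # (6, 2)
```

The same idea, in the book's framing, extends from this toy case to big datasets where the reduction from thousands of features to a handful is what makes downstream classification, prediction, or clustering tractable.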
Target audience
Professional/practitioner
Authors/Editors
Further Information & Material
1. What is Data Science (DS)?
   1.1 Major Families of Data Science Problems
       1.1.1 Classification Problems
       1.1.2 Prediction Problems
       1.1.3 Clustering Problems
   1.2 Data, Big Data and Pre-processing
       1.2.1 What is Data?
       1.2.2 Big Data
       1.2.3 Data Cleansing
       1.2.4 Data Visualization
       1.2.5 Data Understanding
   1.3 Populations and Data Sampling
       1.3.1 Sampling
       1.3.2 Training, Testing and Validation
   1.4 Overview and Scope
       1.4.1 Prerequisites and Layout
       1.4.2 Data Science Methodology
       1.4.3 Scope of the Book
2. Solutions to Data Science Problems
   2.1 Conventional Statistical Solutions
       2.1.1 Linear Multiple Regression Model: Continuous Response
       2.1.2 Logistic Regression: Categorical Response
       2.1.3 Variable Selection and Model Building
       2.1.4 Generalized Linear Model (GLM)
       2.1.5 Decision Trees
       2.1.6 Bayesian Learning
   2.2 Machine Learning Solutions: Supervised
       2.2.1 k-Nearest Neighbors (kNN)
       2.2.2 Ensemble Methods
       2.2.3 Support Vector Machines (SVMs)
       2.2.4 Neural Networks (NNs)
   2.3 Machine Learning Solutions: Unsupervised
       2.3.1 Hard Clustering
       2.3.2 Soft Clustering
   2.4 Controls, Evaluation and Assessment
       2.4.1 Evaluation Methods
       2.4.2 Metrics for Assessment
3. What is Dimensionality Reduction (DR)?
   3.1 Dimensionality Reduction
   3.2 Major Approaches to Dimensionality Reduction
       3.2.1 Conventional Statistical Approaches
       3.2.2 Geometric Approaches
       3.2.3 Information-theoretic Approaches
       3.2.4 Molecular Computing Approaches
   3.3 The Blessings of Dimensionality
4. Conventional Statistical Approaches
   4.1 Principal Component Analysis (PCA)
       4.1.1 Obtaining the Principal Components
       4.1.2 Singular Value Decomposition (SVD)
   4.2 Nonlinear PCA
       4.2.1 Kernel PCA
       4.2.2 Independent Component Analysis (ICA)
   4.3 Nonnegative Matrix Factorization (NMF)
       4.3.1 Approximate Solutions
       4.3.2 Clustering and Other Applications
   4.4 Discriminant Analysis
       4.4.1 Linear Discriminant Analysis (LDA)
       4.4.2 Quadratic Discriminant Analysis (QDA)
   4.5 Sliced Inverse Regression (SIR)
5. Geometric Approaches
   5.1 Introduction to Manifolds
   5.2 Manifold Learning Methods
       5.2.1 Multi-Dimensional Scaling (MDS)
       5.2.2 Isometric Mapping (ISOMAP)
       5.2.3 t-Stochastic Neighbor Embedding (t-SNE)
   5.3 Exploiting Randomness (RND)
6. Information-theoretic Approaches
   6.1 Shannon Entropy (H)
   6.2 Reduction by Conditional Entropy
   6.3 Reduction by Iterated Conditional Entropy
   6.4 Reduction by Conditional Entropy on Targets
   6.5 Other Variations
7. Molecular Computing Approaches
   7.1 Encoding Abiotic Data into DNA
   7.2 Deep Structure of DNA Spaces
       7.2.1 Structural Properties of DNA Spaces
       7.2.2 Noncrosshybridizing (nxh) Bases
   7.3 Reduction by Genomic Signatures
       7.3.1 Background
       7.3.2 Genomic Signatures
   7.4 Reduction by Pmeric Signatures
8. Statistical Learning Approaches
   8.1 Reduction by Multiple Regression
   8.2 Reduction by Ridge Regression
   8.3 Reduction by Lasso Regression
   8.4 Selection versus Shrinkage
   8.5 Further Refinements
9. Machine Learning Approaches
   9.1 Autoassociative Feature Encoders
       9.1.1 Undercomplete Autoencoders
       9.1.2 Sparse Autoencoders
       9.1.3 Variational Autoencoders
       9.1.4 Dimensionality Reduction in MNIST Images
   9.2 Neural Feature Selection
       9.2.1 Facial Features, Expressions and Displays
       9.2.2 The Cohn-Kanade Dataset
       9.2.3 Primary and Derived Features
   9.3 Other Methods
10. Metaheuristics of DR Methods
   10.1 Exploiting Feature Grouping
   10.2 Exploiting Domain Knowledge
       10.2.1 What is Domain Knowledge?
       10.2.2 Domain Knowledge for Dimensionality Reduction
   10.3 Heuristic Rules for Feature Selection, Extraction and Number
   10.4 About Explainability of Solutions
       10.4.1 What is Explainability?
       10.4.2 Explainability in Dimensionality Reduction
   10.5 Choosing Wisely
   10.6 About the Curse of Dimensionality
   10.7 About the No-Free-Lunch Theorem (NFL)
11. Appendices
   11.1 Statistics and Probability Background
       11.1.1 Commonly Used Discrete Distributions
       11.1.2 Commonly Used Continuous Distributions
       11.1.3 Major Results in Probability and Statistics
   11.2 Linear Algebra Background
       11.2.1 Fields, Vector Spaces and Subspaces
       11.2.2 Linear Independence, Bases and Dimension
       11.2.3 Linear Transformations and Matrices
       11.2.4 Eigenvalues and Spectral Decomposition
   11.3 Computer Science Background
       11.3.1 Computational Science and Complexity
       11.3.2 Machine Learning
   11.4 Typical Data Science Problems
   11.5 A Sample of Common and Big Datasets
   11.6 Computing Platforms
       11.6.1 The Environment R
       11.6.2 Python Environments
References




