E-book, English, 776 pages
Kruschke, Doing Bayesian Data Analysis
2nd edition, 2014
ISBN: 978-0-12-405916-0
Publisher: Elsevier Science & Technology
Format: EPUB
Copy protection: ePub watermark
A Tutorial with R, JAGS, and Stan
Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan, Second Edition provides an accessible approach to conducting Bayesian data analysis, with material explained clearly through concrete examples. It includes step-by-step instructions for carrying out Bayesian data analyses in the popular and free software R and WinBUGS, as well as new programs in JAGS and Stan. The new programs are designed to be much easier to use than the scripts in the first edition. In particular, there are now compact high-level scripts that make it easy to run the programs on your own data sets.

The book is divided into three parts and begins with the basics: models, probability, Bayes' rule, and the R programming language. The discussion then moves to the fundamentals applied to inferring a binomial probability, before concluding with chapters on the generalized linear model. Topics include a metric predicted variable on one or two groups; a metric predicted variable with one metric predictor; a metric predicted variable with multiple metric predictors; a metric predicted variable with one nominal predictor; and a metric predicted variable with multiple nominal predictors. The exercises in the text have explicit purposes and guidelines for accomplishment.

This book is intended for first-year graduate students or advanced undergraduates in statistics, data analysis, psychology, cognitive science, social sciences, clinical sciences, and consumer sciences in business.

- Accessible, including the basics of essential concepts of probability and random sampling
- Examples with the R programming language and JAGS software
- Comprehensive coverage of all scenarios addressed by non-Bayesian textbooks: t-tests, analysis of variance (ANOVA) and comparisons in ANOVA, multiple regression, and chi-square (contingency table analysis)
- Coverage of experiment planning
- R and JAGS computer programming code on website
- Exercises with explicit purposes and guidelines for accomplishment
- Step-by-step instructions on how to conduct Bayesian data analyses in the popular and free software R and WinBUGS
John K. Kruschke is Professor of Psychological and Brain Sciences, and Adjunct Professor of Statistics, at Indiana University in Bloomington, Indiana, USA. He is an eight-time winner of Teaching Excellence Recognition Awards from Indiana University. He won the Troland Research Award from the National Academy of Sciences (USA) and the Remak Distinguished Scholar Award from Indiana University. He has served on the editorial boards of various scientific journals, including Psychological Review, the Journal of Experimental Psychology: General, and the Journal of Mathematical Psychology, among others.

After attending the Summer Science Program as a high school student and considering a career in astronomy, Kruschke earned a bachelor's degree in mathematics (with high distinction in general scholarship) from the University of California at Berkeley. As an undergraduate, he taught self-designed tutoring sessions for many math courses at the Student Learning Center. During graduate school he attended the 1988 Connectionist Models Summer School and earned a doctorate in psychology, also from U.C. Berkeley. He joined the faculty of Indiana University in 1989. Professor Kruschke's publications can be found on his Google Scholar page. His current research interests focus on moral psychology.

Professor Kruschke taught traditional statistical methods for many years until reaching a point, circa 2003, when he could no longer teach corrections for multiple comparisons with a clear conscience. The perils of p values provoked him to find a better way, and after only several thousand hours of relentless effort, the first and second editions of Doing Bayesian Data Analysis emerged.
Authors/Editors
Further Information & Material
1;Front Cover;1
2;Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan;4
3;Copyright;5
4;Dedication;6
5;Contents;8
6;Chapter 1: What's in This Book (Read This First!);14
6.1;1.1 Real People Can Read This Book;14
6.1.1;1.1.1 Prerequisites;15
6.2;1.2 What's in This Book;16
6.2.1;1.2.1 You're busy. What's the least you can read?;16
6.2.2;1.2.2 You're really busy! Isn't there even less you can read?;17
6.2.3;1.2.3 You want to enjoy the view a little longer. But not too much longer;17
6.2.4;1.2.4 If you just gotta reject a null hypothesis…;18
6.2.5;1.2.5 Where's the equivalent of traditional test X in this book?;18
6.3;1.3 What's New in the Second Edition?;19
6.4;1.4 Gimme Feedback (Be Polite);21
6.5;1.5 Thank You!;21
7;Part I: The Basics: Models, Probability, Bayes' Rule, and R;26
7.1;Chapter 2: Introduction: Credibility, Models, and Parameters;28
7.1.1;2.1 Bayesian Inference Is Reallocation of Credibility Across Possibilities;29
7.1.1.1;2.1.1 Data are noisy and inferences are probabilistic;32
7.1.2;2.2 Possibilities Are Parameter Values in Descriptive Models;35
7.1.3;2.3 The Steps of Bayesian Data Analysis;38
7.1.3.1;2.3.1 Data analysis without parametric models?;43
7.1.4;2.4 Exercises;44
7.2;Chapter 3: The R Programming Language;46
7.2.1;3.1 Get the Software;48
7.2.1.1;3.1.1 A look at RStudio;48
7.2.2;3.2 A Simple Example of R in Action;49
7.2.2.1;3.2.1 Get the programs used with this book;51
7.2.3;3.3 Basic Commands and Operators in R;51
7.2.3.1;3.3.1 Getting help in R;52
7.2.3.2;3.3.2 Arithmetic and logical operators;52
7.2.3.3;3.3.3 Assignment, relational operators, and tests of equality;53
7.2.4;3.4 Variable Types;55
7.2.4.1;3.4.1 Vector;55
7.2.4.1.1;3.4.1.1 The combine function;55
7.2.4.1.2;3.4.1.2 Component-by-component vector operations;55
7.2.4.1.3;3.4.1.3 The colon operator and sequence function;56
7.2.4.1.4;3.4.1.4 The replicate function;57
7.2.4.1.5;3.4.1.5 Getting at elements of a vector;58
7.2.4.2;3.4.2 Factor;59
7.2.4.3;3.4.3 Matrix and array;61
7.2.4.4;3.4.4 List and data frame;64
7.2.5;3.5 Loading and Saving Data;66
7.2.5.1;3.5.1 The read.csv and read.table functions;66
7.2.5.2;3.5.2 Saving data from R;68
7.2.6;3.6 Some Utility Functions;69
7.2.7;3.7 Programming in R;74
7.2.7.1;3.7.1 Variable names in R;74
7.2.7.2;3.7.2 Running a program;75
7.2.7.3;3.7.3 Programming a function;77
7.2.7.4;3.7.4 Conditions and loops;78
7.2.7.5;3.7.5 Measuring processing time;79
7.2.7.6;3.7.6 Debugging;80
7.2.8;3.8 Graphical Plots: Opening and Saving;82
7.2.9;3.9 Conclusion;82
7.2.10;3.10 Exercises;83
7.3;Chapter 4: What Is This Stuff Called Probability?;84
7.3.1;4.1 The Set of All Possible Events;85
7.3.1.1;4.1.1 Coin flips: Why you should care;86
7.3.2;4.2 Probability: Outside or Inside the Head;86
7.3.2.1;4.2.1 Outside the head: Long-run relative frequency;87
7.3.2.1.1;4.2.1.1 Simulating a long-run relative frequency;87
7.3.2.1.2;4.2.1.2 Deriving a long-run relative frequency;89
7.3.2.2;4.2.2 Inside the head: Subjective belief;89
7.3.2.2.1;4.2.2.1 Calibrating a subjective belief by preferences;89
7.3.2.2.2;4.2.2.2 Describing a subjective belief mathematically;90
7.3.2.3;4.2.3 Probabilities assign numbers to possibilities;90
7.3.3;4.3 Probability Distributions;91
7.3.3.1;4.3.1 Discrete distributions: Probability mass;91
7.3.3.2;4.3.2 Continuous distributions: Rendezvous with density;93
7.3.3.2.1;4.3.2.1 Properties of probability density functions;95
7.3.3.2.2;4.3.2.2 The normal probability density function;96
7.3.3.3;4.3.3 Mean and variance of a distribution;97
7.3.3.3.1;4.3.3.1 Mean as minimized variance;99
7.3.3.4;4.3.4 Highest density interval (HDI);100
7.3.4;4.4 Two-Way Distributions;102
7.3.4.1;4.4.1 Conditional probability;104
7.3.4.2;4.4.2 Independence of attributes;105
7.3.5;4.5 Appendix: R Code for Figure 4.1;106
7.3.6;4.6 Exercises;108
7.4;Chapter 5: Bayes' Rule;112
7.4.1;5.1 Bayes' Rule;113
7.4.1.1;5.1.1 Derived from definitions of conditional probability;113
7.4.1.2;5.1.2 Bayes' rule intuited from a two-way discrete table;114
7.4.2;5.2 Applied to Parameters and Data;118
7.4.2.1;5.2.1 Data-order invariance;120
7.4.3;5.3 Complete Examples: Estimating Bias in a Coin;121
7.4.3.1;5.3.1 Influence of sample size on the posterior;125
7.4.3.2;5.3.2 Influence of the prior on the posterior;126
7.4.4;5.4 Why Bayesian Inference Can Be Difficult;128
7.4.5;5.5 Appendix: R Code for Figures 5.1, 5.2, etc.;129
7.4.6;5.6 Exercises;131
8;Part II: All the Fundamentals Applied to Inferring a Binomial Probability;134
8.1;Chapter 6: Inferring a Binomial Probability via Exact Mathematical Analysis;136
8.1.1;6.1 The Likelihood Function: Bernoulli Distribution;137
8.1.2;6.2 A Description of Credibilities: The Beta Distribution;139
8.1.2.1;6.2.1 Specifying a beta prior;140
8.1.3;6.3 The Posterior Beta;145
8.1.3.1;6.3.1 Posterior is compromise of prior and likelihood;146
8.1.4;6.4 Examples;147
8.1.4.1;6.4.1 Prior knowledge expressed as a beta distribution;147
8.1.4.2;6.4.2 Prior knowledge that cannot be expressed as a beta distribution;149
8.1.5;6.5 Summary;151
8.1.6;6.6 Appendix: R Code for Figure 6.4;151
8.1.7;6.7 Exercises;152
8.2;Chapter 7: Markov Chain Monte Carlo;156
8.2.1;7.1 Approximating a Distribution with a Large Sample;158
8.2.2;7.2 A Simple Case of the Metropolis Algorithm;159
8.2.2.1;7.2.1 A politician stumbles upon the Metropolis algorithm;159
8.2.2.2;7.2.2 A random walk;160
8.2.2.3;7.2.3 General properties of a random walk;162
8.2.2.4;7.2.4 Why we care;165
8.2.2.5;7.2.5 Why it works;165
8.2.3;7.3 The Metropolis Algorithm More Generally;169
8.2.3.1;7.3.1 Metropolis algorithm applied to Bernoulli likelihood and beta prior;170
8.2.3.2;7.3.2 Summary of Metropolis algorithm;174
8.2.4;7.4 Toward Gibbs Sampling: Estimating Two Coin Biases;175
8.2.4.1;7.4.1 Prior, likelihood and posterior for two biases;176
8.2.4.2;7.4.2 The posterior via exact formal analysis;178
8.2.4.3;7.4.3 The posterior via the Metropolis algorithm;181
8.2.4.4;7.4.4 Gibbs sampling;183
8.2.4.5;7.4.5 Is there a difference between biases?;189
8.2.4.6;7.4.6 Terminology: MCMC;190
8.2.5;7.5 MCMC Representativeness, Accuracy, and Efficiency;191
8.2.5.1;7.5.1 MCMC representativeness;191
8.2.5.2;7.5.2 MCMC accuracy;195
8.2.5.3;7.5.3 MCMC efficiency;200
8.2.6;7.6 Summary;201
8.2.7;7.7 Exercises;202
8.3;Chapter 8: JAGS;206
8.3.1;8.1 JAGS and its Relation to R;206
8.3.2;8.2 A Complete Example;208
8.3.2.1;8.2.1 Load data;210
8.3.2.2;8.2.2 Specify model;211
8.3.2.3;8.2.3 Initialize chains;213
8.3.2.4;8.2.4 Generate chains;215
8.3.2.5;8.2.5 Examine chains;216
8.3.2.5.1;8.2.5.1 The plotPost function;218
8.3.3;8.3 Simplified Scripts for Frequently Used Analyses;219
8.3.4;8.4 Example: Difference of Biases;221
8.3.5;8.5 Sampling from the Prior Distribution in JAGS;224
8.3.6;8.6 Probability Distributions Available in JAGS;226
8.3.6.1;8.6.1 Defining new likelihood functions;227
8.3.7;8.7 Faster Sampling with Parallel Processing in RunJAGS;228
8.3.8;8.8 Tips for Expanding JAGS Models;231
8.3.9;8.9 Exercises;231
8.4;Chapter 9: Hierarchical Models;234
8.4.1;9.1 A Single Coin from a Single Mint;236
8.4.1.1;9.1.1 Posterior via grid approximation;239
8.4.2;9.2 Multiple Coins from a Single Mint;243
8.4.2.1;9.2.1 Posterior via grid approximation;244
8.4.2.2;9.2.2 A realistic model with MCMC;248
8.4.2.3;9.2.3 Doing it with JAGS;252
8.4.2.4;9.2.4 Example: Therapeutic touch;253
8.4.3;9.3 Shrinkage in Hierarchical Models;258
8.4.4;9.4 Speeding up JAGS;262
8.4.5;9.5 Extending the Hierarchy: Subjects Within Categories;264
8.4.5.1;9.5.1 Example: Baseball batting abilities by position;266
8.4.6;9.6 Exercises;273
8.5;Chapter 10: Model Comparison and Hierarchical Modeling;278
8.5.1;10.1 General Formula and the Bayes Factor;279
8.5.2;10.2 Example: Two Factories of Coins;281
8.5.2.1;10.2.1 Solution by formal analysis;283
8.5.2.2;10.2.2 Solution by grid approximation;284
8.5.3;10.3 Solution by MCMC;287
8.5.3.1;10.3.1 Nonhierarchical MCMC computation of each model's marginal likelihood;287
8.5.3.1.1;10.3.1.1 Implementation with JAGS;290
8.5.3.2;10.3.2 Hierarchical MCMC computation of relative model probability;291
8.5.3.2.1;10.3.2.1 Using pseudo-priors to reduce autocorrelation;292
8.5.3.3;10.3.3 Models with different "noise" distributions in JAGS;301
8.5.4;10.4 Prediction: Model Averaging;302
8.5.5;10.5 Model Complexity Naturally Accounted for;302
8.5.5.1;10.5.1 Caveats regarding nested model comparison;304
8.5.6;10.6 Extreme Sensitivity to Prior Distribution;305
8.5.6.1;10.6.1 Priors of different models should be equally informed;307
8.5.7;10.7 Exercises;308
8.6;Chapter 11: Null Hypothesis Significance Testing;310
8.6.1;11.1 Paved with Good Intentions;313
8.6.1.1;11.1.1 Definition of p value;313
8.6.1.2;11.1.2 With intention to fix N;315
8.6.1.3;11.1.3 With intention to fix z;318
8.6.1.4;11.1.4 With intention to fix duration;321
8.6.1.5;11.1.5 With intention to make multiple tests;323
8.6.1.6;11.1.6 Soul searching;326
8.6.1.7;11.1.7 Bayesian analysis;327
8.6.2;11.2 Prior Knowledge;328
8.6.2.1;11.2.1 NHST analysis;328
8.6.2.2;11.2.2 Bayesian analysis;328
8.6.2.2.1;11.2.2.1 Priors are overt and relevant;330
8.6.3;11.3 Confidence Interval and Highest Density Interval;330
8.6.3.1;11.3.1 CI depends on intention;331
8.6.3.1.1;11.3.1.1 CI is not a distribution;336
8.6.3.2;11.3.2 Bayesian HDI;337
8.6.4;11.4 Multiple Comparisons;338
8.6.4.1;11.4.1 NHST correction for experimentwise error;338
8.6.4.2;11.4.2 Just one Bayesian posterior no matter how you look at it;341
8.6.4.3;11.4.3 How Bayesian analysis mitigates false alarms;341
8.6.5;11.5 What a Sampling Distribution Is Good For;342
8.6.5.1;11.5.1 Planning an experiment;342
8.6.5.2;11.5.2 Exploring model predictions (posterior predictive check);343
8.6.6;11.6 Exercises;344
8.7;Chapter 12: Bayesian Approaches to Testing a Point ("Null") Hypothesis;348
8.7.1;12.1 The Estimation Approach;349
8.7.1.1;12.1.1 Region of practical equivalence;349
8.7.1.2;12.1.2 Some examples;353
8.7.1.2.1;12.1.2.1 Differences of correlated parameters;353
8.7.1.2.2;12.1.2.2 Why HDI and not equal-tailed interval?;355
8.7.2;12.2 The Model-Comparison Approach;356
8.7.2.1;12.2.1 Is a coin fair or not?;357
8.7.2.1.1;12.2.1.1 Bayes' factor can accept null with poor precision;360
8.7.2.2;12.2.2 Are different groups equal or not?;361
8.7.2.2.1;12.2.2.1 Model specification in JAGS;364
8.7.3;12.3 Relations of Parameter Estimation and Model Comparison;365
8.7.4;12.4 Estimation or Model Comparison?;367
8.7.5;12.5 Exercises;368
8.8;Chapter 13: Goals, Power, and Sample Size;372
8.8.1;13.1 The Will to Power;373
8.8.1.1;13.1.1 Goals and obstacles;373
8.8.1.2;13.1.2 Power;374
8.8.1.3;13.1.3 Sample size;377
8.8.1.4;13.1.4 Other expressions of goals;378
8.8.2;13.2 Computing Power and Sample Size;379
8.8.2.1;13.2.1 When the goal is to exclude a null value;379
8.8.2.2;13.2.2 Formal solution and implementation in R;381
8.8.2.3;13.2.3 When the goal is precision;383
8.8.2.4;13.2.4 Monte Carlo approximation of power;385
8.8.2.5;13.2.5 Power from idealized or actual data;389
8.8.3;13.3 Sequential Testing and the Goal of Precision;396
8.8.3.1;13.3.1 Examples of sequential tests;398
8.8.3.2;13.3.2 Average behavior of sequential tests;401
8.8.4;13.4 Discussion;406
8.8.4.1;13.4.1 Power and multiple comparisons;406
8.8.4.2;13.4.2 Power: prospective, retrospective, and replication;406
8.8.4.3;13.4.3 Power analysis requires verisimilitude of simulated data;407
8.8.4.4;13.4.4 The importance of planning;408
8.8.5;13.5 Exercises;409
8.9;Chapter 14: Stan;412
8.9.1;14.1 HMC Sampling;413
8.9.2;14.2 Installing Stan;420
8.9.3;14.3 A Complete Example;420
8.9.3.1;14.3.1 Reusing the compiled model;423
8.9.3.2;14.3.2 General structure of Stan model specification;423
8.9.3.3;14.3.3 Think log probability to think like Stan;424
8.9.3.4;14.3.4 Sampling the prior in Stan;425
8.9.3.5;14.3.5 Simplified scripts for frequently used analyses;426
8.9.4;14.4 Specify Models Top-Down in Stan;427
8.9.5;14.5 Limitations and Extras;428
8.9.6;14.6 Exercises;428
9;Part III: The Generalized Linear Model;430
9.1;Chapter 15: Overview of the Generalized Linear Model;432
9.1.1;15.1 Types of Variables;433
9.1.1.1;15.1.1 Predictor and predicted variables;433
9.1.1.2;15.1.2 Scale types: metric, ordinal, nominal, and count;434
9.1.2;15.2 Linear Combination of Predictors;436
9.1.2.1;15.2.1 Linear function of a single metric predictor;436
9.1.2.2;15.2.2 Additive combination of metric predictors;438
9.1.2.3;15.2.3 Nonadditive interaction of metric predictors;440
9.1.2.4;15.2.4 Nominal predictors;442
9.1.2.4.1;15.2.4.1 Linear model for a single nominal predictor;442
9.1.2.4.2;15.2.4.2 Additive combination of nominal predictors;443
9.1.2.4.3;15.2.4.3 Nonadditive interaction of nominal predictors;445
9.1.3;15.3 Linking from Combined Predictors to Noisy Predicted Data;448
9.1.3.1;15.3.1 From predictors to predicted central tendency;448
9.1.3.1.1;15.3.1.1 The logistic function;449
9.1.3.1.2;15.3.1.2 The cumulative normal function;452
9.1.3.2;15.3.2 From predicted central tendency to noisy data;453
9.1.4;15.4 Formal Expression of the GLM;457
9.1.4.1;15.4.1 Cases of the GLM;457
9.1.5;15.5 Exercises;459
9.2;Chapter 16: Metric-Predicted Variable on One or Two Groups;462
9.2.1;16.1 Estimating the Mean and Standard Deviation of a Normal Distribution;463
9.2.1.1;16.1.1 Solution by mathematical analysis;464
9.2.1.2;16.1.2 Approximation by MCMC in JAGS;468
9.2.2;16.2 Outliers and Robust Estimation: The t Distribution;471
9.2.2.1;16.2.1 Using the t distribution in JAGS;475
9.2.2.2;16.2.2 Using the t distribution in Stan;477
9.2.3;16.3 Two Groups;481
9.2.3.1;16.3.1 Analysis by NHST;483
9.2.4;16.4 Other Noise Distributions and Transforming Data;485
9.2.5;16.5 Exercises;486
9.3;Chapter 17: Metric Predicted Variable with One Metric Predictor;490
9.3.1;17.1 Simple Linear Regression;491
9.3.2;17.2 Robust Linear Regression;492
9.3.2.1;17.2.1 Robust linear regression in JAGS;496
9.3.2.1.1;17.2.1.1 Standardizing the data for MCMC sampling;497
9.3.2.2;17.2.2 Robust linear regression in Stan;500
9.3.2.2.1;17.2.2.1 Constants for vague priors;500
9.3.2.3;17.2.3 Stan or JAGS?;501
9.3.2.4;17.2.4 Interpreting the posterior distribution;502
9.3.3;17.3 Hierarchical Regression on Individuals Within Groups;503
9.3.3.1;17.3.1 The model and implementation in JAGS;504
9.3.3.2;17.3.2 The posterior distribution: Shrinkage and prediction;508
9.3.4;17.4 Quadratic Trend and Weighted Data;508
9.3.4.1;17.4.1 Results and interpretation;512
9.3.4.2;17.4.2 Further extensions;513
9.3.5;17.5 Procedure and Perils for Expanding a Model;514
9.3.5.1;17.5.1 Posterior predictive check;514
9.3.5.2;17.5.2 Steps to extend a JAGS or Stan model;515
9.3.5.3;17.5.3 Perils of adding parameters;516
9.3.6;17.6 Exercises;517
9.4;Chapter 18: Metric Predicted Variable with Multiple Metric Predictors;522
9.4.1;18.1 Multiple Linear Regression;523
9.4.1.1;18.1.1 The perils of correlated predictors;523
9.4.1.2;18.1.2 The model and implementation;527
9.4.1.3;18.1.3 The posterior distribution;530
9.4.1.4;18.1.4 Redundant predictors;532
9.4.1.5;18.1.5 Informative priors, sparse data, and correlated predictors;536
9.4.2;18.2 Multiplicative Interaction of Metric Predictors;538
9.4.2.1;18.2.1 An example;540
9.4.3;18.3 Shrinkage of Regression Coefficients;543
9.4.4;18.4 Variable Selection;549
9.4.4.1;18.4.1 Inclusion probability is strongly affected by vagueness of prior;552
9.4.4.2;18.4.2 Variable selection with hierarchical shrinkage;555
9.4.4.3;18.4.3 What to report and what to conclude;557
9.4.4.4;18.4.4 Caution: Computational methods;560
9.4.4.5;18.4.5 Caution: Interaction variables;561
9.4.5;18.5 Exercises;562
9.5;Chapter 19: Metric Predicted Variable with One Nominal Predictor;566
9.5.1;19.1 Describing Multiple Groups of Metric Data;567
9.5.2;19.2 Traditional Analysis of Variance;569
9.5.3;19.3 Hierarchical Bayesian Approach;570
9.5.3.1;19.3.1 Implementation in JAGS;573
9.5.3.2;19.3.2 Example: Sex and death;574
9.5.3.3;19.3.3 Contrasts;578
9.5.3.4;19.3.4 Multiple comparisons and shrinkage;580
9.5.3.5;19.3.5 The two-group case;581
9.5.4;19.4 Including a Metric Predictor;581
9.5.4.1;19.4.1 Example: Sex, death, and size;584
9.5.4.2;19.4.2 Analogous to traditional ANCOVA;584
9.5.4.3;19.4.3 Relation to hierarchical linear regression;586
9.5.5;19.5 Heterogeneous Variances and Robustness Against Outliers;586
9.5.5.1;19.5.1 Example: Contrast of means with different variances;588
9.5.6;19.6 Exercises;592
9.6;Chapter 20: Metric Predicted Variable with Multiple Nominal Predictors;596
9.6.1;20.1 Describing Groups of Metric Data with Multiple Nominal Predictors;597
9.6.1.1;20.1.1 Interaction;598
9.6.1.2;20.1.2 Traditional ANOVA;600
9.6.2;20.2 Hierarchical Bayesian Approach;601
9.6.2.1;20.2.1 Implementation in JAGS;602
9.6.2.2;20.2.2 Example: It's only money;603
9.6.2.3;20.2.3 Main effect contrasts;608
9.6.2.4;20.2.4 Interaction contrasts and simple effects;610
9.6.2.4.1;20.2.4.1 Interaction effects: High uncertainty and shrinkage;611
9.6.3;20.3 Rescaling Can Change Interactions, Homogeneity, and Normality;612
9.6.4;20.4 Heterogeneous Variances and Robustness Against Outliers;615
9.6.5;20.5 Within-Subject Designs;619
9.6.5.1;20.5.1 Why use a within-subject design? And why not?;621
9.6.5.2;20.5.2 Split-plot design;623
9.6.5.2.1;20.5.2.1 Example: Knee high by the fourth of July;624
9.6.5.2.2;20.5.2.2 The descriptive model;625
9.6.5.2.3;20.5.2.3 Implementation in JAGS;627
9.6.5.2.4;20.5.2.4 Results;627
9.6.6;20.6 Model Comparison Approach;629
9.6.7;20.7 Exercises;631
9.7;Chapter 21: Dichotomous Predicted Variable;634
9.7.1;21.1 Multiple Metric Predictors;635
9.7.1.1;21.1.1 The model and implementation in JAGS;635
9.7.1.2;21.1.2 Example: Height, weight, and gender;639
9.7.2;21.2 Interpreting the Regression Coefficients;642
9.7.2.1;21.2.1 Log odds;642
9.7.2.2;21.2.2 When there are few 1's or 0's in the data;644
9.7.2.3;21.2.3 Correlated predictors;645
9.7.2.4;21.2.4 Interaction of metric predictors;646
9.7.3;21.3 Robust Logistic Regression;648
9.7.4;21.4 Nominal Predictors;649
9.7.4.1;21.4.1 Single group;651
9.7.4.2;21.4.2 Multiple groups;654
9.7.4.2.1;21.4.2.1 Example: Baseball again;654
9.7.4.2.2;21.4.2.2 The model;655
9.7.4.2.3;21.4.2.3 Results;657
9.7.5;21.5 Exercises;659
9.8;Chapter 22: Nominal Predicted Variable;662
9.8.1;22.1 Softmax Regression;663
9.8.1.1;22.1.1 Softmax reduces to logistic for two outcomes;666
9.8.1.2;22.1.2 Independence from irrelevant attributes;667
9.8.2;22.2 Conditional logistic regression;668
9.8.3;22.3 Implementation in JAGS;672
9.8.3.1;22.3.1 Softmax model;672
9.8.3.2;22.3.2 Conditional logistic model;674
9.8.3.3;22.3.3 Results: Interpreting the regression coefficients;675
9.8.3.3.1;22.3.3.1 Softmax model;675
9.8.3.3.2;22.3.3.2 Conditional logistic model;677
9.8.4;22.4 Generalizations and Variations of the Models;680
9.8.5;22.5 Exercises;681
9.9;Chapter 23: Ordinal Predicted Variable;684
9.9.1;23.1 Modeling Ordinal Data with an Underlying Metric Variable;685
9.9.2;23.2 The Case of a Single Group;688
9.9.2.1;23.2.1 Implementation in JAGS;689
9.9.2.2;23.2.2 Examples: Bayesian estimation recovers true parameter values;690
9.9.2.2.1;23.2.2.1 Not the same results as pretending the data are metric;693
9.9.2.2.2;23.2.2.2 Ordinal outcomes versus Likert scales;694
9.9.3;23.3 The Case of Two Groups;695
9.9.3.1;23.3.1 Implementation in JAGS;696
9.9.3.2;23.3.2 Examples: Not funny;696
9.9.4;23.4 The Case of Metric Predictors;698
9.9.4.1;23.4.1 Implementation in JAGS;701
9.9.4.2;23.4.2 Example: Happiness and money;702
9.9.4.3;23.4.3 Example: Movies—They don't make 'em like they used to;706
9.9.4.4;23.4.4 Why are some thresholds outside the data?;708
9.9.5;23.5 Posterior Prediction;711
9.9.6;23.6 Generalizations and Extensions;712
9.9.7;23.7 Exercises;713
9.10;Chapter 24: Count Predicted Variable;716
9.10.1;24.1 Poisson Exponential Model;717
9.10.1.1;24.1.1 Data structure;717
9.10.1.2;24.1.2 Exponential link function;718
9.10.1.3;24.1.3 Poisson noise distribution;720
9.10.1.4;24.1.4 The complete model and implementation in JAGS;721
9.10.2;24.2 Example: Hair Eye Go Again;724
9.10.3;24.3 Example: Interaction Contrasts, Shrinkage, and Omnibus Test;726
9.10.4;24.4 Log-Linear Models for Contingency Tables;728
9.10.5;24.5 Exercises;728
9.11;Chapter 25: Tools in the Trunk;734
9.11.1;25.1 Reporting a Bayesian Analysis;734
9.11.1.1;25.1.1 Essential points;735
9.11.1.2;25.1.2 Optional points;737
9.11.1.3;25.1.3 Helpful points;737
9.11.2;25.2 Functions for Computing Highest Density Intervals;738
9.11.2.1;25.2.1 R code for computing HDI of a grid approximation;738
9.11.2.2;25.2.2 HDI of unimodal distribution is shortest interval;739
9.11.2.3;25.2.3 R code for computing HDI of an MCMC sample;740
9.11.2.4;25.2.4 R code for computing HDI of a function;741
9.11.3;25.3 Reparameterization;742
9.11.3.1;25.3.1 Examples;743
9.11.3.2;25.3.2 Reparameterization of two parameters;743
9.11.4;25.4 Censored Data in JAGS;745
9.11.5;25.5 What Next?;749
10;Bibliography;750
11;Index;760
Chapter 2 Introduction
Credibility, Models, and Parameters
Contents
2.1 Bayesian Inference Is Reallocation of Credibility Across Possibilities 16
2.1.1 Data are noisy and inferences are probabilistic 19
2.2 Possibilities Are Parameter Values in Descriptive Models 22
2.3 The Steps of Bayesian Data Analysis 25
2.3.1 Data analysis without parametric models? 30
2.4 Exercises 31

I just want someone who I can believe in,
Someone at home who will not leave me grievin'.
Show me a sign that you'll always be true,
and I'll be your model of faith and virtue.1

The goal of this chapter is to introduce the conceptual framework of Bayesian data analysis. Bayesian data analysis has two foundational ideas. The first idea is that Bayesian inference is reallocation of credibility across possibilities. The second foundational idea is that the possibilities, over which we allocate credibility, are parameter values in meaningful mathematical models. These two fundamental ideas form the conceptual foundation for every analysis in this book. Simple examples of these ideas are presented in this chapter. The rest of the book merely fills in the mathematical and computational details for specific applications of these two ideas. This chapter also explains the basic procedural steps shared by every Bayesian analysis.

2.1 Bayesian Inference Is Reallocation of Credibility Across Possibilities
Suppose we step outside one morning and notice that the sidewalk is wet, and wonder why. We consider all possible causes of the wetness, including possibilities such as recent rain, recent garden irrigation, a newly erupted underground spring, a broken sewage pipe, a passerby who spilled a drink, and so on. If all we know until this point is that some part of the sidewalk is wet, then all those possibilities will have some prior credibility based on previous knowledge. For example, recent rain may have greater prior probability than a spilled drink from a passerby. Continuing on our outside journey, we look around and collect new observations. If we observe that the sidewalk is wet for as far as we can see, as are the trees and parked cars, then we re-allocate credibility to the hypothetical cause of recent rain. The other possible causes, such as a passerby spilling a drink, would not account for the new observations. On the other hand, if instead we observed that the wetness was localized to a small area, and there was an empty drink cup a few feet away, then we would re-allocate credibility to the spilled-drink hypothesis, even though it had relatively low prior probability. This sort of reallocation of credibility across possibilities is the essence of Bayesian inference.

Another example of Bayesian inference has been immortalized in the words of the fictional detective Sherlock Holmes, who often said to his sidekick, Doctor Watson: “How often have I said to you that when you have eliminated the impossible, whatever remains, however improbable, must be the truth?” (Doyle, 1890, chap. 6) Although this reasoning was not described by Holmes or Watson or Doyle as Bayesian inference, it is. Holmes conceived of a set of possible causes for a crime. Some of the possibilities may have seemed very improbable, a priori. Holmes systematically gathered evidence that ruled out a number of the possible causes. If all possible causes but one were eliminated, then (Bayesian) reasoning forced him to conclude that the remaining possible cause was fully credible, even if it seemed improbable at the start.

Figure 2.1 illustrates Holmes' reasoning. For the purposes of illustration, we suppose that there are just four possible causes of the outcome to be explained. We label the causes A, B, C, and D. The heights of the bars in the graphs indicate the credibility of the candidate causes. (“Credibility” is synonymous with “probability”; here I use the everyday term “credibility” but later in the book, when mathematical formalisms are introduced, I will also use the term “probability.”) Credibility can range from zero to one. If the credibility of a candidate cause is zero, then the cause is definitely not responsible. If the credibility of a candidate cause is one, then the cause definitely is responsible. Because we assume that the candidate causes are mutually exclusive and exhaust all possible causes, the total credibility across causes sums to one.

Figure 2.1 The upper-left graph shows the credibilities of the four possible causes for an outcome. The causes, labeled A, B, C, and D, are mutually exclusive and exhaust all possibilities. The causes happen to be equally credible at the outset; hence all have prior credibility of 0.25. The lower-left graph shows the credibilities when one cause is learned to be impossible. The resulting posterior distribution is used as the prior distribution in the middle column, where another cause is learned to be impossible. The posterior distribution from the middle column is used as the prior distribution for the right column. The remaining possible cause is fully implicated by Bayesian reallocation of credibility.

The upper-left panel of Figure 2.1 shows that the prior credibilities of the four candidate causes are equal, all at 0.25. Unlike the case of the wet sidewalk, in which prior knowledge suggested that rain may be a more likely cause than a newly erupted underground spring, the present illustration assumes equal prior credibilities of the candidate causes. Suppose we make new observations that rule out candidate cause A. For example, if A is a suspect in a crime, we may learn that A was far from the crime scene at the time. Therefore, we must re-allocate credibility to the remaining candidate causes, B through D, as shown in the lower-left panel of Figure 2.1. The re-allocated distribution of credibility is called the posterior distribution because it is what we believe after taking into account the new observations. The posterior distribution gives zero credibility to cause A, and allocates credibilities of 0.33 (i.e., 1/3) to candidate causes B, C, and D. The posterior distribution then becomes the prior beliefs for subsequent observations. Thus, the prior distribution in the upper-middle of Figure 2.1 is the posterior distribution from the lower left. Suppose now that additional new evidence rules out candidate cause B. We now must re-allocate credibility to the remaining candidate causes, C and D, as shown in the lower-middle panel of Figure 2.1. This posterior distribution becomes the prior distribution for subsequent data collection, as shown in the upper-right panel of Figure 2.1. Finally, if new data rule out candidate cause C, then all credibility must fall on the remaining cause, D, as shown in the lower-right panel of Figure 2.1, just as Holmes declared. This reallocation of credibility is not only intuitive, it is also what the exact mathematics of Bayesian inference prescribe, as will be explained later in the book.

The complementary form of reasoning is also Bayesian, and can be called judicial exoneration. Suppose there are several possible culprits for a crime, and that these suspects are mutually unaffiliated and exhaust all possibilities. If evidence accrues that one suspect is definitely culpable, then the other suspects are exonerated. This form of exoneration is illustrated in Figure 2.2. The upper panel assumes that there are four possible causes for an outcome, labeled A, B, C, and D. We assume that the causes are mutually exclusive and exhaust all possibilities. In the context of suspects for a crime, the credibility of the hypothesis that suspect A committed the crime is the culpability of the suspect. So it might be easier in this context to think of culpability instead of credibility. The prior culpabilities of the four suspects are, for this illustration, set to be equal, so the four bars in the upper panel of Figure 2.2 are all of height 0.25. Suppose that new evidence firmly implicates suspect D as the culprit. Because the other suspects are known to be unaffiliated, they are exonerated, as shown in the lower panel of Figure 2.2. As in the situation of Holmesian deduction, this exoneration is not only intuitive, it is also what the exact mathematics of Bayesian inference prescribe, as will be explained later in the book.

Figure 2.2 The upper graph shows the credibilities of the four possible causes for an outcome. The causes, labeled A, B, C, and D, are mutually exclusive and exhaust all possibilities. The causes happen to be equally credible at the outset, hence all have prior credibility of 0.25. The lower graph shows the credibilities when one cause is learned to be responsible. The nonresponsible causes are “exonerated” (i.e., have zero credibility as causes) by Bayesian reallocation of credibility.
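The arithmetic behind this reallocation can be checked directly. What follows is a minimal R sketch, not taken from the book's accompanying scripts (the function names ruleOut and exonerateOthers are illustrative assumptions), that reproduces the credibilities shown in Figures 2.1 and 2.2: eliminating a cause zeroes its credibility and renormalizes the rest, while firmly implicating a cause exonerates all the others.

# Minimal R sketch (illustrative, not from the book's scripts) of Bayesian
# reallocation of credibility over four mutually exclusive causes.
credibility <- c(A = 0.25, B = 0.25, C = 0.25, D = 0.25)

# Eliminating a cause: set its credibility to zero and renormalize so that
# the remaining credibilities again sum to one.
ruleOut <- function(cred, cause) {
  cred[cause] <- 0
  cred / sum(cred)
}

# Holmesian elimination (Figure 2.1): each posterior serves as the next prior.
credibility <- ruleOut(credibility, "A")   # B, C, D each at 1/3
credibility <- ruleOut(credibility, "B")   # C, D each at 1/2
credibility <- ruleOut(credibility, "C")   # D at 1, however improbable a priori
print(credibility)

# Judicial exoneration (Figure 2.2): firmly implicating one suspect drives the
# credibility of the unaffiliated alternatives to zero.
exonerateOthers <- function(cred, culprit) {
  cred[] <- 0
  cred[culprit] <- 1
  cred
}
print(exonerateOthers(c(A = 0.25, B = 0.25, C = 0.25, D = 0.25), "D"))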
2.1.1 Data are noisy and inferences are probabilistic
The cases of Figures 2.1 and 2.2 assumed that observed data...