E-book, English, 549 pages
Series: Statistics and Computing
Muenchen / Hilbe R for Stata Users
1st edition, 2010
ISBN: 978-1-4419-1318-0
Publisher: Springer
Format: PDF
Copy protection: 1 - PDF Watermark
Stata is the most flexible and extensible data analysis package available from a commercial vendor. R is a similarly flexible free and open source package for data analysis, with over 3,000 add-on packages available. This book shows you how to extend the power of Stata through the use of R. It introduces R using Stata terminology with which you are already familiar. It steps through more than 30 programs written in both languages, comparing and contrasting the two packages' different approaches. When finished, you will be able to use R in conjunction with Stata, or separately, to import data, manage and transform it, create publication quality graphics, and perform basic statistical analyses. A glossary defines over 50 R terms using Stata jargon and again using more formal R terminology. The table of contents and index allow you to find equivalent R functions by looking up Stata commands and vice versa. The example programs and practice datasets for both R and Stata are available for download.
Further Information and Material
1;Preface;6
2;Contents;10
3;List of Tables;20
4;List of Figures;22
5;1 Introduction;26
5.1;1.1 Overview;26
5.2;1.2 Similarities Between R and Stata;27
5.3;1.3 Why Learn R?;28
5.4;1.4 Is R Accurate?;29
5.5;1.5 What About Tech Support?;29
5.6;1.6 Getting Started Quickly;30
5.7;1.7 Programming Conventions;30
5.8;1.8 Typographic Conventions;31
6;2 Installing and Updating R;33
6.1;2.1 Installing Add-on Packages;34
6.2;2.2 Loading an Add-on Package;34
6.3;2.3 Updating Your Installation;38
6.4;2.4 Uninstalling R;39
6.5;2.5 Choosing Repositories;39
6.6;2.6 Accessing Data in Packages;41
7;3 Running R;43
7.1;3.1 Running R Interactively on Windows;43
7.2;3.2 Running R Interactively on Macintosh;45
7.3;3.3 Running R Interactively on Linux or UNIX;47
7.4;3.4 Running Programs That Include Other Programs;49
7.5;3.5 Running R in Batch Mode;49
7.6;3.6 Graphical User Interfaces;50
7.6.1;3.6.1 R Commander;50
7.6.2;3.6.2 Rattle for Data Mining;53
7.6.3;3.6.3 JGR Java GUI for R;54
8;4 Help and Documentation;60
8.1;4.1 Introduction;60
8.2;4.2 Help Files;60
8.3;4.3 Starting Help;60
8.4;4.4 Help Examples;62
8.5;4.5 Help for Functions That Call Other Functions;63
8.6;4.6 Help for Packages;64
8.7;4.7 Help for Data Sets;65
8.8;4.8 Books and Manuals;65
8.9;4.9 E-mail Lists;65
8.10;4.10 Searching the Web;66
8.11;4.11 Vignettes;66
9;5 Programming Language Basics;68
9.1;5.1 Introduction;68
9.2;5.2 Simple Calculations;69
9.3;5.3 Data Structures;70
9.3.1;5.3.1 Vectors;70
9.3.2;5.3.2 Factors;74
9.3.3;5.3.3 Data Frames;79
9.3.4;5.3.4 Matrices;83
9.3.5;5.3.5 Arrays;86
9.3.6;5.3.6 Lists;86
9.4;5.4 Saving Your Work;90
9.5;5.5 Comments to Document Your Programs;92
9.6;5.6 Controlling Functions (Commands);93
9.6.1;5.6.1 Controlling Functions with Arguments;93
9.6.2;5.6.2 Controlling Functions with Formulas;95
9.6.3;5.6.3 Controlling Functions with an Object's Class;96
9.6.4;5.6.4 Controlling Functions with Extractor Functions;98
9.7;5.7 How Much Output is There?;100
9.8;5.8 Writing Your Own Functions (Macros);104
9.9;5.9 R Program Demonstrating Programming Basics;107
10;6 Data Acquisition;114
10.1;6.1 The R Data Editor;114
10.2;6.2 Reading Delimited Text Files;116
10.2.1;6.2.1 Reading Comma-Delimited Text Files;117
10.2.2;6.2.2 Reading Tab-Delimited Text Files;118
10.2.3;6.2.3 Missing Values for Character Variables;120
10.2.4;6.2.4 Trouble with Tabs;121
10.2.5;6.2.5 Skipping Variables in Delimited Files;122
10.2.6;6.2.6 Example Programs for Reading Delimited Text Files;123
10.3;6.3 Reading Text Data Within a Program;125
10.3.1;6.3.1 The Easy Approach;125
10.3.2;6.3.2 The More General Approach;127
10.3.3;6.3.3 Example Programs for Reading Text Data Within a Program;127
10.4;6.4 Reading Fixed-Width Text Files, One Record per Case;129
10.4.1;6.4.1 Macro Substitution;132
10.4.2;6.4.2 Example Programs for Reading Fixed-Width Text Files, One Record Per Case;133
10.5;6.5 Reading Fixed-Width Text Files, Two or More Records per Case;134
10.5.1;6.5.1 Example Programs to Read Fixed-Width Text Files with Two Records per Case;135
10.6;6.6 Importing Data from Stata into R;136
10.6.1;6.6.1 R Program to Import Data from Stata;137
10.7;6.7 Writing Data to a Comma-Delimited Text File;137
10.7.1;6.7.1 Example Programs for Writing a Comma-Delimited File;138
10.8;6.8 Exporting Data from R to Stata;139
11;7 Selecting Variables;141
11.1;7.1 Selecting Variables in Stata;141
11.2;7.2 Selecting All Variables;142
11.3;7.3 Selecting Variables Using Index Numbers;142
11.4;7.4 Selecting Variables Using Column Names;145
11.5;7.5 Selecting Variables Using Logic;146
11.6;7.6 Selecting Variables Using String Search;148
11.7;7.7 Selecting Variables Using $ Notation;150
11.8;7.8 Selecting Variables Using Component Names;151
11.8.1;7.8.1 The attach Function;151
11.8.2;7.8.2 The with Function;152
11.8.3;7.8.3 Using Component Names in Formulas;152
11.9;7.9 Selecting Variables with the subset Function;153
11.10;7.10 Selecting Variables Using List Index;154
11.11;7.11 Generating Indexes A to Z from Two Variable Names;154
11.12;7.12 Saving Selected Variables to a New Dataset;155
11.13;7.13 Example Programs for Variable Selection;156
11.13.1;7.13.1 Stata Program to Select Variables;156
11.13.2;7.13.2 R Program to Select Variables;156
12;8 Selecting Observations;161
12.1;8.1 Selecting Observations in Stata;161
12.2;8.2 Selecting All Observations;162
12.3;8.3 Selecting Observations Using Index Numbers;162
12.4;8.4 Selecting Observations Using Row Names;165
12.5;8.5 Selecting Observations Using Logic;167
12.6;8.6 Selecting Observations Using String Search;170
12.7;8.7 Selecting Observations Using the subset Function;172
12.8;8.8 Generating Indexes A to Z from Two Row Names;173
12.9;8.9 Variable Selection Methods with No Counterpart for Selecting Observations;174
12.10;8.10 Saving Selected Observations to a New Data Frame;174
12.11;8.11 Example Programs for Selecting Observations;174
12.11.1;8.11.1 Stata Program to Select Observations;175
12.11.2;8.11.2 R Program to Select Observations;175
13;9 Selecting Variables and Observations;179
13.1;9.1 The subset Function;179
13.2;9.2 Selecting Observations by Logic and Variables by Name;180
13.3;9.3 Using Names to Select Both Observations and Variables;181
13.4;9.4 Using Numeric Index Values to Select Both Observations and Variables;182
13.5;9.5 Using Logic to Select Both Observations and Variables;183
13.6;9.6 Saving and Loading Subsets;184
13.7;9.7 Example Programs for Selecting Variables and Observations;184
13.7.1;9.7.1 Stata Program for Selecting Variables and Observations;184
13.7.2;9.7.2 R Program for Selecting Variables and Observations;185
14;10 Data Management;189
14.1;10.1 Transforming Variables;189
14.1.1;10.1.1 Example Programs for Transforming Variables;193
14.2;10.2 Functions or Commands? The apply Function Decides;194
14.2.1;10.2.1 Applying the mean Function;195
14.2.2;10.2.2 Finding N or NVALID;198
14.2.3;10.2.3 Example Programs for Applying Statistical Functions;200
14.3;10.3 Conditional Transformations;202
14.3.1;10.3.1 Example Programs for Conditional Transformations;204
14.4;10.4 Multiple Conditional Transformations;205
14.4.1;10.4.1 Example Programs for Multiple Conditional Transformations;207
14.5;10.5 Missing Values;208
14.5.1;10.5.1 Substituting Means for Missing Values;210
14.5.2;10.5.2 Finding Complete Observations;211
14.5.3;10.5.3 When “99” Has Meaning;212
14.5.4;10.5.4 Example Programs to Assign Missing Values;214
14.6;10.6 Renaming Variables (and Observations);216
14.6.1;10.6.1 Renaming Variables: Advanced Examples;218
14.6.2;10.6.2 Renaming by Index;219
14.6.3;10.6.3 Renaming by Column Name;220
14.6.4;10.6.4 Renaming Many Sequentially Numbered Variable Names;221
14.6.5;10.6.5 Renaming Observations;222
14.6.6;10.6.6 Example Programs for Renaming Variables;222
14.7;10.7 Recoding Variables;226
14.7.1;10.7.1 Recoding a Few Variables;227
14.7.2;10.7.2 Recoding Many Variables;227
14.7.3;10.7.3 Example Programs for Recoding Variables;230
14.8;10.8 Keeping and Dropping Variables;231
14.8.1;10.8.1 Example Programs for Keeping and Dropping Variables;232
14.9;10.9 Stacking/Appending Data Sets;232
14.9.1;10.9.1 Example Programs for Stacking/Appending Data Sets;235
14.10;10.10 Joining/Merging Data Sets;236
14.10.1;10.10.1 Example Programs for Joining/Merging Data Sets;239
14.11;10.11 Creating Collapsed or Aggregated Data Sets;241
14.11.1;10.11.1 The aggregate Function;241
14.11.2;10.11.2 The tapply Function;243
14.11.3;10.11.3 Merging Aggregates with Original Data;244
14.11.4;10.11.4 Tabular Aggregation;246
14.11.5;10.11.5 The reshape Package;248
14.11.6;10.11.6 Example Programs for Collapsing/Aggregating Data;248
14.12;10.12 By or Split-File Processing;250
14.12.1;10.12.1 Comparing Summarization Methods;254
14.12.2;10.12.2 Example Programs for By or Split-file Processing;255
14.13;10.13 Removing Duplicate Observations;256
14.13.1;10.13.1 Example Programs for Removing Duplicate Observations;258
14.14;10.14 Selecting First or Last Observations per Group;259
14.14.1;10.14.1 Example Programs for Selecting Last Observation per Group;261
14.15;10.15 Reshaping Variables to Observations and Back;262
14.15.1;10.15.1 Example Programs for Reshaping Variables to Observations and Back;264
14.16;10.16 Sorting Data Frames;265
14.16.1;10.16.1 Example Programs for Sorting Data Sets;268
14.17;10.17 Converting Data Structures;269
14.17.1;10.17.1 Converting from Logical to Numeric Index and Back;272
15;11 Enhancing Your Output;274
15.1;11.1 Value Labels or Formats (and Measurement Level);274
15.1.1;11.1.1 Character Factors;275
15.1.2;11.1.2 Numeric Factors;277
15.1.3;11.1.3 Making Factors of Many Variables;279
15.1.4;11.1.4 Converting Factors into Numeric or Character Variables;281
15.1.5;11.1.5 Dropping Factor Levels;283
15.1.6;11.1.6 Example Programs for Value Labels or Formats;284
15.2;11.2 Variable Labels;287
15.2.1;11.2.1 Variable Labels in The Hmisc Package;287
15.2.2;11.2.2 Long Variable Names as Labels;288
15.2.3;11.2.3 Other Packages That Support Variable Labels;291
15.2.4;11.2.4 Example Programs for Variable Labels;291
15.3;11.3 Output for Word Processing and Web Pages;292
15.3.1;11.3.1 The xtable Package;293
15.3.2;11.3.2 Other Options for Formatting Output;295
15.3.3;11.3.3 Example Programs for Formatting Output;296
16;12 Generating Data;298
16.1;12.1 Generating Numeric Sequences;299
16.2;12.2 Generating Factors;300
16.3;12.3 Generating Repetitious Patterns (Not Factors);301
16.4;12.4 Generating Integer Measures;302
16.5;12.5 Generating Continuous Measures;304
16.6;12.6 Generating a Data Frame;306
16.7;12.7 Example Programs for Generating Data;306
16.7.1;12.7.1 Stata Program for Generating Data;306
16.7.2;12.7.2 R Program for Generating Data;307
17;13 Managing Your Files and Workspace;312
17.1;13.1 Loading and Listing Objects;312
17.2;13.2 Understanding Your Search Path;315
17.3;13.3 Attaching Data Frames;317
17.4;13.4 Attaching Files;319
17.5;13.5 Removing Objects from Your Workspace;320
17.6;13.6 Minimizing Your Workspace;322
17.7;13.7 Setting Your Working Directory;322
17.8;13.8 Saving Your Workspace;323
17.8.1;13.8.1 Saving Your Workspace Manually;323
17.8.2;13.8.2 Saving Your Workspace Automatically;324
17.9;13.9 Getting Operating Systems to Show You “.RData” Files;324
17.10;13.10 Organizing Projects with Windows Shortcuts;325
17.11;13.11 Saving Your Programs and Output;325
17.12;13.12 Saving Your History;326
17.13;13.13 Large Data Set Considerations;326
17.14;13.14 Example R Program for Managing Files and Workspace;328
18;14 Graphics Overview;332
18.1;14.1 Stata Graphics;333
18.2;14.2 R Graphics;333
18.3;14.3 The Grammar of Graphics;334
18.4;14.4 Other Graphics Packages;336
18.5;14.5 Graphics Procedures and Graphics Systems;336
18.6;14.6 Graphics Devices;337
18.7;14.7 Practice Data: mydata100;339
19;15 Traditional Graphics;340
19.1;15.1 Bar Plots;340
19.1.1;15.1.1 Bar Plots of Counts;340
19.1.2;15.1.2 Bar Plots for Subgroups of Counts;345
19.1.3;15.1.3 Bar Plots of Means;347
19.2;15.2 Adding Titles, Labels, Colors, and Legends;348
19.3;15.3 Graphics Parameters and Multiple Plots on a Page;351
19.4;15.4 Pie Charts;352
19.5;15.5 Dot Charts;354
19.6;15.6 Histograms;354
19.6.1;15.6.1 Basic Histograms;355
19.6.2;15.6.2 Histograms Stacked;357
19.6.3;15.6.3 Histograms Overlaid;358
19.7;15.7 Normal QQ Plots;362
19.8;15.8 Strip Charts;363
19.9;15.9 Scatter Plots and Line Plots;368
19.9.1;15.9.1 Scatter plots with Jitter;371
19.9.2;15.9.2 Scatter plots with Large Data Sets;371
19.9.3;15.9.3 Scatter plots with Lines;373
19.9.4;15.9.4 Scatter plots with Linear Fit by Group;374
19.9.5;15.9.5 Scatter plots by Group or Level (Coplots);375
19.9.6;15.9.6 Scatter plots with Confidence Ellipse;377
19.9.7;15.9.7 Scatter plots with Confidence and Prediction Intervals;378
19.9.8;15.9.8 Plotting Labels Instead of Points;383
19.9.9;15.9.9 Scatter plot Matrices;385
19.10;15.10 Dual-Axes Plots;387
19.11;15.11 Box Plots;389
19.12;15.12 Error Bar Plots;391
19.13;15.13 Interaction Plots;391
19.14;15.14 Adding Equations and Symbols to Graphs;392
19.15;15.15 Summary of Graphics Elements and Parameters;393
19.16;15.16 Plot Demonstrating Many Modifications;394
19.17;15.17 Example Program for Traditional Graphics;395
19.17.1;15.17.1 Stata Program for Traditional Graphics;396
19.17.2;15.17.2 R Program for Traditional Graphics;396
20;16 Graphics with ggplot2;406
20.1;16.1 Introduction;406
20.1.1;16.1.1 Overview qplot and ggplot;407
20.1.2;16.1.2 Missing Values;408
20.1.3;16.1.3 Typographic Conventions;409
20.2;16.2 Bar Plots;410
20.2.1;16.2.1 Pie Charts;413
20.2.2;16.2.2 Bar Charts for Groups;414
20.3;16.3 Plots by Group or Level;415
20.4;16.4 Presummarized Data;417
20.5;16.5 Dot Charts;418
20.6;16.6 Adding Titles and Labels;420
20.7;16.7 Histograms and Density Plots;421
20.7.1;16.7.1 Histograms;421
20.7.2;16.7.2 Density Plots;422
20.7.3;16.7.3 Histograms with Density Overlaid;422
20.7.4;16.7.4 Histograms for Groups, Stacked;424
20.7.5;16.7.5 Histograms for Groups, Overlaid;425
20.8;16.8 Normal QQ Plots;426
20.9;16.9 Strip Plots;426
20.10;16.10 Scatter Plots and Line Plots;429
20.10.1;16.10.1 Scatter Plots with Jitter;431
20.10.2;16.10.2 Scatter Plots for Large Data Sets;432
20.10.3;16.10.3 Hexbin Plots;435
20.10.4;16.10.4 Scatter Plots with Fit Lines;436
20.10.5;16.10.5 Scatter Plots with Reference Lines;437
20.10.6;16.10.6 Scatter Plots with Labels Instead of Points;441
20.10.7;16.10.7 Changing Plot Symbols;442
20.10.8;16.10.8 Scatter Plot with Linear Fits by Group;443
20.10.9;16.10.9 Scatter Plots Faceted for Groups;443
20.10.10;16.10.10 Scatter Plot Matrix;445
20.11;16.11 Box Plots;446
20.12;16.12 Error Bar Plots;449
20.13;16.13 Logarithmic Axes;451
20.14;16.14 Aspect Ratio;451
20.15;16.15 Multiple Plots on a Page;452
20.16;16.16 Saving ggplot2 Graphs to a File;454
20.17;16.17 An Example Specifying All Defaults;454
20.18;16.18 Summary of Graphic Elements and Parameters;456
20.19;16.19 Example Programs for ggplot2;457
21;17 Statistics;474
21.1;17.1 Scientific Notation;474
21.2;17.2 Descriptive Statistics;475
21.2.1;17.2.1 The Hmisc describe Function;475
21.2.2;17.2.2 The summary Function;477
21.2.3;17.2.3 The table Function and Its Relatives;478
21.2.4;17.2.4 The mean Function and Its Relatives;480
21.3;17.3 Cross-Tabulation;481
21.3.1;17.3.1 The CrossTable Function;481
21.3.2;17.3.2 The tables and chisq.test Functions;483
21.4;17.4 Correlation;486
21.4.1;17.4.1 The cor Function;489
21.5;17.5 Linear Regression;491
21.5.1;17.5.1 Plotting Diagnostics;494
21.5.2;17.5.2 Comparing Models;495
21.5.3;17.5.3 Making Predictions with New Data;496
21.6;17.6 t-Test: Independent Groups;497
21.7;17.7 Equality of Variance;498
21.8;17.8 t-Test: Paired or Repeated Measures;499
21.9;17.9 Wilcoxon Mann-Whitney Rank Sum Test: Independent Groups;500
21.10;17.10 Wilcoxon Signed-Rank Test: Paired Groups;501
21.11;17.11 Analysis of Variance;502
21.12;17.12 Sums of Squares;507
21.13;17.13 The Kruskal-Wallis Test;508
21.14;17.14 Example Programs for Statistical Tests;510
21.14.1;17.14.1 Stata Program for Statistical Tests;510
21.14.2;17.14.2 R Program for Statistical Tests;512
22;18 Conclusion;518
23;Glossary of R jargon;519
24;Comparison of Stata commands and R functions;525
25;Automating Your R Setup;527
25.1;C.1 Setting Options;527
25.2;C.2 Creating Objects;528
25.3;C.3 Loading Packages;528
25.4;C.4 Running Functions;528
25.5;C.5 Example .Rprofile;530
26;Example Simulation;531
26.1;D.1 Stata Example Simulation;531
26.2;D.2 R Example Simulation;532
27;References;533
28;Index;537
"5 Programming Language Basics (p. 45-46)
5.1 Introduction
R is an object-oriented language. Everything that exists in it, including variables, data sets, and functions (commands), is an object. Stata has limitations on command and variable name lengths, based on the version of the software being used. The limits are large, though, and rarely result in a problem for Stata users. In Stata, leading periods in names are not allowed and data set names cannot have periods at all. Object names in R can be any length and consist of letters, numbers, underscores “_,” or the period “.” and should begin with a letter. However, in R if you always put quotes around a variable or data set name (actually any object name), it can then contain any characters, including spaces.
Case matters in both R and Stata, so you can have two variables—one named myvar and another named MyVar—in the same data set, although that is not a good idea! Some add-on packages tweak function names like the capitalized “Save” to represent a compatible, but enhanced, version of a built-in function like the lowercased “save.” As in any statistics package, it is best to avoid names that match function names like “mean” or that match logical conditions like “TRUE.”
Commands can begin and end anywhere on a line and R will ignore any additional spaces. R will try to execute a function when it reaches the end of a line. Therefore, to continue a function call on a new line, you must ensure that the fragment you leave behind is not already a complete function call by itself. Continuing a function call on a new line after a comma is usually a safe bet. As you will see, R functions frequently use commas, making them a convenient stopping point.
The R console will tell you that it is continuing a line when it changes the prompt from “>” to “+.” If you see “+” unexpectedly, you may have simply forgotten to add the final close parenthesis, “).” Submitting only that character will then finish your function call. If you are getting the “+” and cannot figure out why, you can cancel the pending function call with the Escape key on Windows or CTRL-C on Macintosh or Linux/UNIX. For CTRL-C, hold the CTRL key down (Linux/UNIX) or the control key (Macintosh) while pressing the letter C. You may end any R function call with a semicolon. That is not required though, except when entering multiple function calls on a single line."
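
The naming, line-continuation, and semicolon rules described in the excerpt can be tried out in a short R session. The following is only a minimal sketch and is not taken from the book: all object names (my.var_1, myvar, MyVar, my var, scores, x, a, b) are made up for illustration, and backticks are assumed as the quoting form for the name that contains a space, since the excerpt says only "quotes."

```r
# Object names: letters, digits, "_" and "." are allowed; start with a letter.
my.var_1 <- 10

# Case matters: myvar and MyVar are two different objects.
myvar <- 1
MyVar <- 2

# A name containing a space must be quoted every time it is used.
# Backticks are used here (an assumption; the excerpt only says "quotes").
`my var` <- 3
total <- my.var_1 + myvar + MyVar + `my var`   # 10 + 1 + 2 + 3

# Continuing a call on a new line: breaking after a comma is safe because
# "c(1, 2," is not yet a complete call, so R keeps reading.
scores <- c(1, 2,
            3, 4)

# Breaking before an operator is not safe at the top level:
# "x <- 1" is already complete, so the next line is a separate expression.
x <- 1
+ 2          # evaluated on its own; x is still 1

# Semicolons are optional, except to separate several calls on one line.
a <- mean(scores); b <- length(scores)
```

Typing the `scores` assignment interactively shows the prompt changing from ">" to "+" on the second line, which is the continuation signal the excerpt describes.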




