E-book, English, 220 pages
Fossati / Gall / Grabner: Consumer Depth Cameras for Computer Vision
Research Topics and Applications
1st edition, 2012
Series: Advances in Computer Vision and Pattern Recognition
ISBN: 978-1-4471-4640-7
Publisher: Springer
Format: PDF
Copy protection: PDF watermark
The potential of consumer depth cameras extends well beyond entertainment and gaming, to real-world commercial applications. This authoritative text reviews the scope and impact of this rapidly growing field, describing the most promising Kinect-based research activities, discussing significant current challenges, and showcasing exciting applications. Features:
- presents contributions from an international selection of preeminent authorities in their fields, from both academic and corporate research;
- addresses the classic multi-view geometry problem of correlating images from different viewpoints to simultaneously estimate camera poses and world points;
- examines human pose estimation using video-rate depth images for gaming, motion capture, 3D human body scans, and hand pose recognition for sign language parsing;
- provides a review of approaches to various recognition problems, including category and instance learning of objects, and human activity recognition;
- includes a Foreword by Dr. Jamie Shotton.
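As context for the topics above, here is a minimal, hypothetical sketch (in Python, not code from the book) of the primitive that the registration and pose-estimation chapters build on: back-projecting a depth image into metric 3D points through a pinhole camera model. The intrinsics FX, FY, CX, CY are assumed illustrative values, not calibrated Kinect constants; Chapter 1 of the book treats Kinect calibration properly.

# Back-project a depth image to a 3D point cloud with a pinhole model.
# All intrinsics below are assumed, illustrative values; a real Kinect
# must be calibrated (see Chapter 1 of the book).
import numpy as np

FX, FY = 580.0, 580.0   # assumed focal lengths in pixels
CX, CY = 320.0, 240.0   # assumed principal point for a 640x480 image

def depth_to_points(depth_m: np.ndarray) -> np.ndarray:
    """Convert an HxW depth image (in meters) to an Nx3 point array.

    Pixels with depth 0 (no measurement) are dropped.
    """
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    valid = depth_m > 0
    z = depth_m[valid]
    x = (u[valid] - CX) * z / FX   # X = (u - cx) * Z / fx
    y = (v[valid] - CY) * z / FY   # Y = (v - cy) * Z / fy
    return np.column_stack([x, y, z])

# Usage: a synthetic flat wall 2 m in front of the camera.
cloud = depth_to_points(np.full((480, 640), 2.0))
print(cloud.shape)  # (307200, 3)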
Dr. Andrea Fossati and Dr. Helmut Grabner are post-doctoral researchers in the Computer Vision Laboratory at ETH Zurich, Switzerland. Dr. Juergen Gall is a Senior Researcher at the Max Planck Institute for Intelligent Systems, Tübingen, Germany. Dr. Xiaofeng Ren is a Research Scientist at the Intel Science and Technology Center for Pervasive Computing, Intel Labs, and an Affiliate Assistant Professor in the Department of Computer Science and Engineering at the University of Washington, Seattle, WA, USA. Dr. Kurt Konolige is a Senior Researcher at Industrial Perception Inc., Palo Alto, CA, USA.
Authors/Editors
Further information & material
1;Consumer Depth Cameras for Computer Vision;3
1.1;Foreword;5
1.1.1;Working on Human Pose Estimation for Kinect;6
1.1.2;Beyond Entertainment;7
1.1.3;Looking to the Future;8
1.2;Preface;9
1.3;Contents;12
1.4;Acronyms;14
2;Part I: 3D Registration and Reconstruction;16
2.1;Chapter 1: 3D with Kinect;18
2.1.1;1.1 Introduction;18
2.1.2;1.2 Kinect as a 3D Measuring Device;19
2.1.2.1;1.2.1 IR Image;20
2.1.2.2;1.2.2 RGB Image;21
2.1.2.3;1.2.3 Depth Image;21
2.1.2.4;1.2.4 Depth Resolution;21
2.1.3;1.3 Kinect Geometrical Model;23
2.1.3.1;1.3.1 Shift Between IR Image and Depth Image;24
2.1.3.2;1.3.2 Identification of the IR Projector Geometrical Center;25
2.1.3.3;1.3.3 Identification of Effective Depth Resolutions of the IR Camera and Projector Stereo Pair;26
2.1.4;1.4 Kinect Calibration;29
2.1.4.1;1.4.1 Learning Complex Residual Errors;30
2.1.5;1.5 Validation;31
2.1.5.1;1.5.1 Kinect Depth Models Evaluation on a 3D Calibration Object;34
2.1.5.2;1.5.2 Comparison of Kinect, SLR Stereo and 3D TOF;35
2.1.5.3;1.5.3 Combining Kinect and Structure from Motion;36
2.1.6;1.6 Conclusion;39
2.1.7;References;39
2.2;Chapter 2: Real-Time RGB-D Mapping and 3-D Modeling on the GPU Using the Random Ball Cover;41
2.2.1;2.1 Introduction;42
2.2.2;2.2 Related Work;43
2.2.3;2.3 Methods;45
2.2.3.1;2.3.1 Data Preprocessing on the GPU;46
2.2.3.1.1;Nomenclature;46
2.2.3.1.2;Landmark Extraction;47
2.2.3.2;2.3.2 Photogeometric ICP Framework;47
2.2.3.3;2.3.3 6-D Nearest Neighbor Search Using RBC;48
2.2.4;2.4 Implementation Details;50
2.2.4.1;2.4.1 Details Regarding the ICP Framework;50
2.2.4.2;2.4.2 RBC Construction and Queries on the GPU;51
2.2.4.2.1;RBC Construction;51
2.2.4.2.2;RBC Nearest Neighbor Queries;53
2.2.5;2.5 Experiments and Results;53
2.2.5.1;2.5.1 Qualitative Results;53
2.2.5.2;2.5.2 Performance Study;56
2.2.5.2.1;Preprocessing Pipeline;56
2.2.5.2.2;ICP Using RBC;56
2.2.5.3;2.5.3 Approximate RBC;57
2.2.6;2.6 Discussion and Conclusions;59
2.2.7;References;60
2.3;Chapter 3: A Brute Force Approach to Depth Camera Odometry;63
2.3.1;3.1 Introduction;63
2.3.2;3.2 Related Work;64
2.3.3;3.3 Proposed Method;65
2.3.3.1;3.3.1 Algorithm Overview;66
2.3.3.2;3.3.2 Practical Issues;67
2.3.3.2.1;Feature Extraction;67
2.3.3.2.2;Score Evaluation;67
2.3.3.3;3.3.3 Implementation Details;67
2.3.4;3.4 Experimental Results;68
2.3.4.1;3.4.1 Qualitative Evaluation;68
2.3.4.2;3.4.2 Precision Analysis;69
2.3.4.3;3.4.3 Comparison with the ICP Method;72
2.3.5;3.5 Conclusion and Future Work;73
2.3.6;References;73
3;Part II: Human Body Analysis;75
3.1;Chapter 4: Key Developments in Human Pose Estimation for Kinect;77
3.1.1;4.1 Introduction: The Challenge;77
3.1.2;4.2 Body Part Classification-The Natural Markers Approach;78
3.1.2.1;4.2.1 Generating the Training Data;79
3.1.2.2;4.2.2 Randomized Forests for Classification;79
3.1.3;4.3 Random Forest Regression-The Voting Approach;80
3.1.4;4.4 Context-Sensitive Pose Estimation-Conditional Regression Forests;81
3.1.5;4.5 One-Shot Model Fitting: The Vitruvian Manifold;82
3.1.6;4.6 Directions for Future Work;83
3.1.7;References;83
3.2;Chapter 5: A Data-Driven Approach for Real-Time Full Body Pose Reconstruction from a Depth Camera;85
3.2.1;5.1 Introduction;86
3.2.1.1;Contributions;87
3.2.2;5.2 Related Work;88
3.2.2.1;Intensity-Image-Based Tracking;88
3.2.2.2;Depth-Image-Based Tracking;88
3.2.3;5.3 Acquisition and Data Preparation;90
3.2.3.1;5.3.1 Depth Data;90
3.2.3.2;5.3.2 Model of the Actor;91
3.2.3.3;5.3.3 Pose Database;92
3.2.3.4;5.3.4 Normalization;93
3.2.4;5.4 Pose Reconstruction Framework;94
3.2.4.1;5.4.1 Local Optimization;95
3.2.4.2;5.4.2 Feature Computation;95
3.2.4.3;5.4.3 Database Lookup;101
3.2.4.4;5.4.4 Hypothesis Selection;102
3.2.5;5.5 Experiments;103
3.2.5.1;5.5.1 Feature Extraction;103
3.2.5.2;5.5.2 Quantitative Evaluation;103
3.2.5.3;5.5.3 Run Time;105
3.2.5.4;5.5.4 Qualitative Evaluation;105
3.2.5.5;5.5.5 Limitations;108
3.2.6;5.6 Conclusions;109
3.2.7;References;109
3.3;Chapter 6: Home 3D Body Scans from a Single Kinect;113
3.3.1;6.1 Introduction;114
3.3.2;6.2 Related Work;116
3.3.3;6.3 Sensor and Preprocessing;117
3.3.3.1;Intrinsic Calibration;118
3.3.3.2;Stereo Calibration;118
3.3.3.3;Depth Calibration;118
3.3.3.4;Ground Plane;118
3.3.3.5;Segmentation;118
3.3.4;6.4 Body Model and Fitting;119
3.3.4.1;6.4.1 SCAPE Body Model;119
3.3.4.2;6.4.2 Pose Initialization;120
3.3.4.3;6.4.3 Depth Objective;121
3.3.4.4;6.4.4 Silhouette Objective;121
3.3.4.5;6.4.5 Optimization;124
3.3.5;6.5 Results;124
3.3.5.1;From Bodies to Measurements;127
3.3.5.2;Accuracy Relative to Laser Scans;127
3.3.5.3;Linear Measurement Accuracy;129
3.3.6;6.6 Conclusions;129
3.3.7;References;130
3.4;Chapter 7: Real Time Hand Pose Estimation Using Depth Sensors;132
3.4.1;7.1 Introduction;132
3.4.1.1;7.1.1 Related Work;134
3.4.1.1.1;7.1.1.1 Hand Pose Estimation;134
3.4.1.1.2;7.1.1.2 Hand Shape Recognition from Depth;135
3.4.2;7.2 Methodology;135
3.4.2.1;7.2.1 Data;136
3.4.2.2;7.2.2 Decision Trees;137
3.4.2.3;7.2.3 Randomized Decision Forest for Hand Pose Estimation;138
3.4.2.4;7.2.4 Joint Position Estimation;140
3.4.3;7.3 Experiments;141
3.4.3.1;7.3.1 Datasets;141
3.4.3.1.1;7.3.1.1 Synthetic Dataset;141
3.4.3.1.2;7.3.1.2 Real Dataset;141
3.4.3.2;7.3.2 Effect of Model Parameters;141
3.4.3.2.1;7.3.2.1 The Effect of the Forest Size;142
3.4.3.2.2;7.3.2.2 The Effect of the Tree Depth;142
3.4.3.2.3;7.3.2.3 The Effect of the Feature Space;142
3.4.3.2.4;7.3.2.4 The Effect of the Sample Size;143
3.4.3.2.5;7.3.2.5 The Effect of the Mean Shift Parameters;144
3.4.3.3;7.3.3 Hand Pose Estimation Results;145
3.4.3.4;7.3.4 Proof of Concept: American Sign Language Digit Recognizer;146
3.4.3.4.1;7.3.4.1 Hand Shape Classifiers;147
3.4.3.4.2;7.3.4.2 Model Selection on the Synthetic Dataset;147
3.4.3.4.3;7.3.4.3 ASL Digit Classification Results on Real Data;147
3.4.4;7.4 Conclusion;148
3.4.5;References;149
4;Part III: RGB-D Datasets;151
4.1;Chapter 8: A Category-Level 3D Object Dataset: Putting the Kinect to Work;152
4.1.1;8.1 Introduction;153
4.1.2;8.2 Related Work;156
4.1.2.1;8.2.1 3D Datasets for Detection;156
4.1.2.1.1;RGBD-Dataset of [23];156
4.1.2.1.2;UBC Visual Robot Survey [3, 20];156
4.1.2.1.3;3D Table Top Object Dataset [28];156
4.1.2.1.4;Solutions in Perception Challenge [2];156
4.1.2.1.5;Max Planck Institute Kinect Dataset [8];156
4.1.2.1.6;Indoor Scene Segmentation Dataset [27];157
4.1.2.1.7;Other Datasets;158
4.1.2.2;8.2.2 3D and 2D/3D Recognition;158
4.1.3;8.3 The Berkeley 3D Object Dataset;159
4.1.3.1;8.3.1 Data Annotation;159
4.1.3.2;8.3.2 The Kinect Sensor;160
4.1.3.3;8.3.3 Smoothing Depth Images;160
4.1.3.4;8.3.4 Data Statistics;161
4.1.4;8.4 Detection Baselines;163
4.1.4.1;8.4.1 Sliding Window Detector;163
4.1.4.2;8.4.2 Evaluation;164
4.1.4.3;8.4.3 Pruning and Rescoring by Size;166
4.1.5;8.5 A Histogram of Curvature (HOC);167
4.1.5.1;8.5.1 Curvature;168
4.1.5.2;8.5.2 HOC;168
4.1.5.3;8.5.3 Experimental Setup and Baselines;172
4.1.5.4;8.5.4 Results;173
4.1.6;8.6 Discussion;174
4.1.7;References;174
4.2;Chapter 9: RGB-D Object Recognition: Features, Algorithms, and a Large Scale Benchmark;177
4.2.1;9.1 Introduction;178
4.2.2;9.2 RGB-D Object Dataset Collection;178
4.2.3;9.3 Segmentation;179
4.2.4;9.4 Video Scene Annotation;182
4.2.5;9.5 RGB-D Object Recognition;184
4.2.5.1;9.5.1 Experimental Setup;185
4.2.5.2;9.5.2 Distance Learning for RGB-D Object Recognition;185
4.2.5.2.1;9.5.2.1 Instance Distance Learning;186
4.2.5.2.2;9.5.2.2 RGB-D Feature Set;186
4.2.5.2.3;9.5.2.3 Evaluation;187
4.2.5.3;9.5.3 Kernel Descriptors for RGB-D Object Recognition;189
4.2.5.3.1;9.5.3.1 Kernel Descriptors;189
4.2.5.3.2;9.5.3.2 Evaluation;191
4.2.5.4;9.5.4 Joint Object Category, Instance, and Pose Recognition;192
4.2.5.4.1;9.5.4.1 Object-Pose Tree;192
4.2.5.4.2;9.5.4.2 Evaluation;193
4.2.6;9.6 Object Detection in Scenes Using RGB-D Cameras;194
4.2.6.1;9.6.1 RGB-D Object Detection;195
4.2.6.2;9.6.2 Scene Labeling;198
4.2.7;9.7 Discussion;200
4.2.8;References;200
4.3;Chapter 10: RGBD-HuDaAct: A Color-Depth Video Database for Human Daily Activity Recognition;203
4.3.1;10.1 Introduction;203
4.3.2;10.2 Related Works;204
4.3.3;10.3 RGBD-HuDaAct: Color-Depth Human Daily Activity Database;205
4.3.3.1;10.3.1 Related Video Databases;205
4.3.3.2;10.3.2 Database Construction;207
4.3.3.3;10.3.3 Database Statistics;207
4.3.4;10.4 Color-Depth Fusion for Activity Recognition;208
4.3.4.1;10.4.1 Depth-Layered Multi-channel STIPs (DLMC-STIPs);209
4.3.4.2;10.4.2 3-Dimensional Motion History Images (3D-MHIs);211
4.3.5;10.5 Experimental Evaluations;213
4.3.5.1;10.5.1 Evaluation Schemes;213
4.3.5.2;10.5.2 DLMC-STIPs vs. STIPs;214
4.3.5.3;10.5.3 3D-MHIs vs. MHIs;215
4.3.6;10.6 Conclusions;216
4.3.7;References;217
5;Index;219