E-book, English, 174 pages
Series: Cognitive Technologies
Frommberger: Qualitative Spatial Abstraction in Reinforcement Learning
1st edition, 2010
ISBN: 978-3-642-16590-0
Publisher: Springer
Format: PDF
Copy protection: 1 - PDF Watermark
Reinforcement learning has developed into a successful learning approach for domains that are not fully understood and that are too complex to be described in closed form. However, reinforcement learning does not scale well to large and continuous problems. Furthermore, acquired knowledge is specific to the learned task, so transfer of knowledge to new tasks is crucial. In this book, the author investigates whether these deficiencies of reinforcement learning can be overcome by suitable abstraction methods. He discusses various forms of spatial abstraction, in particular qualitative abstraction, a form of knowledge representation that has been thoroughly investigated and successfully applied in spatial cognition research. His approach exploits spatial structures and structural similarity to support the learning process, abstracting from less important features and stressing the essential ones. The author demonstrates his learning approach and the transferability of knowledge by having his system learn in a virtual robot simulation system and then transferring the acquired knowledge to a physical robot. The approach is informed by findings from cognitive science. The book is suitable for researchers working in artificial intelligence, in particular knowledge representation, learning, spatial cognition, and robotics.
Dr. Frommberger is a researcher in the Cognitive Systems Research Group (SFB/TR 8 Spatial Cognition) of Universität Bremen; his special areas of expertise are spatial abstraction techniques, efficient reinforcement learning, cognitive logistics and qualitative representations of space.
Further Information & Material
Foreword (p. 4)
Preface (p. 6)
Contents (p. 9)
Symbols (p. 13)
Acronyms (p. 15)
1 Introduction (p. 16)
  1.1 Learning Machines (p. 16)
    1.1.1 An Agent Control Task (p. 17)
    1.1.2 Structure of a State Space (p. 19)
    1.1.3 Abstraction (p. 19)
    1.1.4 Knowledge Reuse (p. 20)
  1.2 Thesis and Contributions (p. 21)
  1.3 Outline of the Thesis (p. 22)
2 Foundations of Reinforcement Learning (p. 24)
  2.1 Machine Learning (p. 24)
  2.2 The Reinforcement Learning Model (p. 25)
  2.3 Markov Decision Processes (p. 26)
    2.3.1 Definition of a Markov Decision Process (p. 27)
    2.3.2 Solving a Markov Decision Process (p. 28)
    2.3.3 Partially Observable Markov Decision Processes (p. 30)
  2.4 Exploration (p. 31)
    2.4.1 ε-Greedy Action Selection (p. 32)
    2.4.2 Other Exploration Methods (p. 32)
  2.5 Temporal Difference Learning (p. 32)
    2.5.1 TD(0) (p. 33)
    2.5.2 Eligibility Traces / TD(λ) (p. 33)
    2.5.3 Q-Learning (p. 34)
  2.6 Performance Measures (p. 35)
3 Abstraction and Knowledge Transfer in Reinforcement Learning (p. 37)
  3.1 Challenges in Reinforcement Learning (p. 37)
    3.1.1 Reinforcement Learning in Complex State Spaces (p. 38)
    3.1.2 Use and Reuse of Knowledge Gained by Reinforcement Learning (p. 38)
  3.2 Value Function Approximation (p. 40)
    3.2.1 Value Function Approximation Methods (p. 41)
    3.2.2 Function Approximation and Optimality (p. 44)
  3.3 Temporal Abstraction (p. 44)
    3.3.1 Semi-Markov Decision Processes (p. 45)
    3.3.2 Options (p. 45)
    3.3.3 MAXQ (p. 46)
    3.3.4 Skills (p. 46)
    3.3.5 Further Approaches and Limitations (p. 47)
  3.4 Spatial Abstraction (p. 47)
    3.4.1 Adaptive State Space Partitions (p. 48)
    3.4.2 Knowledge Reuse Based on Domain Knowledge (p. 50)
    3.4.3 Combining Spatial and Temporal Abstraction (p. 51)
    3.4.4 Further Task-Specific Abstractions (p. 51)
  3.5 Transfer Learning (p. 51)
    3.5.1 The DARPA Transfer Learning Program (p. 52)
    3.5.2 Intra-domain Transfer Methods (p. 53)
    3.5.3 Cross-domain Transfer Methods (p. 53)
  3.6 Summary and Discussion (p. 55)
4 Qualitative State Space Abstraction (p. 56)
  4.1 Abstraction of the State Space (p. 56)
  4.2 A Formal Framework of Abstraction (p. 57)
    4.2.1 Definition of Abstraction (p. 58)
    4.2.2 Aspectualization (p. 59)
    4.2.3 Coarsening (p. 61)
    4.2.4 Conceptual Classification (p. 62)
    4.2.5 Related Work on Abstraction (p. 63)
  4.3 Abstraction and Representation (p. 64)
  4.4 Abstraction in Agent Control Processes (p. 67)
    4.4.1 An Action-Centered View on Abstraction (p. 67)
    4.4.2 Preserving the Optimal Policy (p. 68)
    4.4.3 Accessibility of the Representation (p. 69)
  4.5 Spatial Abstraction in Reinforcement Learning (p. 70)
    4.5.1 An Architecture for Spatial Abstraction in Reinforcement Learning (p. 70)
    4.5.2 From MDPs to POMDPs (p. 72)
    4.5.3 Temporally Extended Actions (p. 73)
    4.5.4 Criteria for Efficient Abstraction (p. 73)
    4.5.5 The Role of Domain Knowledge (p. 74)
  4.6 A Qualitative Approach to Spatial Abstraction (p. 75)
    4.6.1 Qualitative Spatial Representations (p. 75)
    4.6.2 Qualitative State Space Abstraction in Agent Control Tasks (p. 76)
    4.6.3 Qualitative Representations and Aspectualization (p. 77)
  4.7 Summary (p. 77)
5 Generalization and Transfer Learning with Qualitative Spatial Abstraction (p. 79)
  5.1 Reusing Knowledge in Learning Tasks (p. 79)
    5.1.1 Structural Similarity (p. 80)
    5.1.2 Structural Similarity and Knowledge Transfer (p. 80)
  5.2 Aspectualizable State Spaces (p. 81)
    5.2.1 A Distinction Between Different Aspects of Problems (p. 82)
    5.2.2 Using Goal-Directed and Generally Sensible Behavior for Knowledge Transfer (p. 82)
    5.2.3 Structure Space and Task Space (p. 83)
  5.3 Value-Function-Approximation-Based Task Space Generalization (p. 86)
    5.3.1 Maintaining Structure Space Knowledge (p. 86)
    5.3.2 An Introduction to Tile Coding (p. 87)
    5.3.3 Task Space Tile Coding (p. 90)
    5.3.4 Ad Hoc Transfer of Policies Learned with Task Space Tile Coding (p. 93)
    5.3.5 Discussion of Task Space Tile Coding (p. 94)
  5.4 A Posteriori Structure Space Transfer (p. 94)
    5.4.1 Q-Value Averaging over Task Space (p. 95)
    5.4.2 Avoiding Task Space Bias (p. 95)
    5.4.3 Measuring Confidence of Generalized Policies (p. 97)
  5.5 Discussion of the Transfer Methods (p. 98)
    5.5.1 Comparison of the Transfer Methods (p. 98)
    5.5.2 Outlook: Hierarchical Learning of Task and Structure Space Policies (p. 99)
  5.6 Structure-Induced Task Space Aspectualization (p. 100)
    5.6.1 Decision and Non-decision States (p. 101)
    5.6.2 Identifying Non-decision Structures (p. 101)
    5.6.3 SITSA: Abstraction in Non-decision States (p. 102)
    5.6.4 Discussion of SITSA (p. 102)
  5.7 Summary (p. 103)
6 RLPR -- An Aspectualizable State Space Representation (p. 105)
  6.1 Building a Task-Specific Spatial Representation (p. 105)
    6.1.1 A Goal-Directed Robot Navigation Task (p. 106)
    6.1.2 Identifying Task and Structure Space (p. 107)
    6.1.3 Representation and Frame of Reference (p. 107)
  6.2 Representing Task Space (p. 108)
    6.2.1 Usage of Landmarks (p. 108)
    6.2.2 Landmarks and Ordering Information (p. 109)
    6.2.3 Representing Singular Landmarks (p. 110)
    6.2.4 Views as Landmark Information (p. 115)
    6.2.5 Navigation Based on Landmark Information Only (p. 118)
  6.3 Representing Structure Space (p. 119)
    6.3.1 Relative Line Position Representation (RLPR) (p. 120)
    6.3.2 Building an RLPR Feature Vector (p. 126)
    6.3.3 Variants of RLPR (p. 126)
    6.3.4 Abstraction Effects in RLPR (p. 127)
    6.3.5 RLPR and Collision Avoidance (p. 128)
  6.4 Landmark-Enriched RLPR (p. 129)
    6.4.1 Properties of le-RLPR (p. 129)
  6.5 Robustness of le-RLPR (p. 130)
    6.5.1 Robustness of Task Space Representation (p. 131)
    6.5.2 Robustness of Structure Space Representation (p. 132)
  6.6 Summary (p. 134)
7 Empirical Evaluation (p. 135)
  7.1 Evaluation Setup (p. 135)
    7.1.1 The Testbed (p. 135)
    7.1.2 The Motion Noise Model (p. 136)
    7.1.3 The le-RLPR Representation (p. 137)
    7.1.4 Learning Algorithm, Rewards, and Cross-validation (p. 137)
  7.2 Learning Performance (p. 138)
    7.2.1 Performance of le-RLPR-Based Representations (p. 139)
    7.2.2 le-RLPR Compared to the Original MDP (p. 141)
    7.2.3 Quality of le-RLPR-Based Solutions (p. 142)
    7.2.4 Effect of Task Space Tile Coding (p. 143)
    7.2.5 Task Space Information Only (p. 144)
    7.2.6 Learning Navigation with Point-Based Landmarks (p. 146)
    7.2.7 Evaluation of SITSA (p. 147)
  7.3 Behavior Under Noise (p. 148)
    7.3.1 Robustness Under Motion Noise (p. 149)
    7.3.2 Robustness Under Distorted Perception (p. 150)
  7.4 Generalization and Transfer Learning (p. 153)
    7.4.1 le-RLPR and Modified Environments (p. 154)
    7.4.2 Policy Transfer to New Environments (p. 155)
  7.5 RLPR-Based Navigation in Real-World Environments (p. 158)
    7.5.1 Properties of a Real Office Environment (p. 158)
    7.5.2 Differences of the Real Robot (p. 159)
    7.5.3 Operation on Identical Observations (p. 161)
    7.5.4 Training and Transfer (p. 161)
    7.5.5 Behavior of the Real Robot (p. 162)
  7.6 Summary (p. 163)
8 Summary and Outlook (p. 167)
  8.1 Summary of the Results (p. 167)
  8.2 Future Work (p. 170)
References (p. 172)
Index (p. 182)




