E-Book, Englisch, 456 Seiten
Reihe: Chapman & Hall/CRC Computer & Information Science Series
Kaeli / Yew Speculative Execution in High Performance Computer Architectures
Erscheinungsjahr 2005
ISBN: 978-1-4200-3515-5
Verlag: Taylor & Francis
Format: PDF
Kopierschutz: Adobe DRM (»Systemvoraussetzungen)
E-Book, Englisch, 456 Seiten
Reihe: Chapman & Hall/CRC Computer & Information Science Series
            ISBN: 978-1-4200-3515-5 
            Verlag: Taylor & Francis
            
 Format: PDF
    Kopierschutz: Adobe DRM (»Systemvoraussetzungen)
Until now, there were few textbooks that focused on the dynamic subject of speculative execution, a topic that is crucial to the development of high performance computer architectures. Speculative Execution in High Performance Computer Architectures describes many recent advances in speculative execution techniques. It covers cutting-edge research projects, as well as numerous commercial implementations that demonstrate the value of this latency-hiding technique.
The book begins with a review of control speculation techniques that use instruction cache prefetching, branch prediction and predication, and multi-path execution. It then examines dataflow speculation techniques including data cache prefetching, address value and data value speculation, pre-computation, and coherence speculation. This textbook also explores multithreaded approaches, emphasizing profile-guided speculation, speculative microarchitectures, and compiler techniques.
Zielgruppe
Computer science students, researchers, and professionals; electrical engineers. Professionals in computer architecture, compilers, and application programs.
Autoren/Hrsg.
Weitere Infos & Material
INTRODUCTION 
David R. Kaeli, Northeastern University and Pen C. Yew, University of Minnesota
INSTRUCTION CACHE PREFETCHING 
Glenn Reinman, UCLA Computer Science Department
Direct Mapped Cache 
Set Associative Cache 
Pseudo Associative Cache 
Way Prediction Cache 
Next Line Prefetching 
Target Prefetching 
Stream Buffers 
Nonblocking Instruction Caches and Out-Of-Order Fetch
Fetch Directed Instruction Prefetching 
Integrated Prefetching 
Wrong-Path Prefetching 
Compiler Strategies 
BRANCH PREDICTION
Philip G. Emma, IBM T.J. Watson Research Laboratory
The von Neumann Programming Model vs. ENIAC 
Dataflow and Control Flow 
The Branch Instruction 
The IAS Machine: A Primitive Stored-Program Architecture
Virtuality
Branch Instruction Semantics 
General Instruction-Set Architectures and Extensions 
Memory Consistency and Observable Order
Branches and Performance 
Pipelining 
Pipeline Disruptions and Their Penalties 
Superscalar Processing 
Multithreading
Instruction Prefetching and Autonomy 
The Delayed Branch Instruction 
Branch Flow in a Pipeline: The "When" of Branch Prediction
Predicting Branches at Decode Time
Predicting Branches at Instruction-Prefetch Time 
Static Branch Prediction
Dynamic Branch Prediction 
Branch Prediction With Counters 
Predicting by Profiling Branch Actions 
Group Behaviors vs. Predicting Individual Branches 
The Decode History Table (a.k.a. Branch History Table)
Discriminators 
Using Multiple Discriminators: A Path-Based Approach
Implementation 
A Timing Caveat 
Hybrid Predictors 
Instruction Prefetching 
The Branch History Table (a.k.a. Branch Target Buffer)
Operation of the BTB 
Fetch Width and Branch Mispredictions 
The Subroutine Call and Return Structure 
Predicting Return Addresses by Using a Stack 
Recognizing Subroutine Calls and Returns 
Taking Advantage of the BTB Structure 
Eliminating the Stack
Working Sets and Contexts 
The Size of a BTB Entry 
The BTB and the Instruction Cache: Economies of Size 
More Exotic Prediction for the More Difficult Branches 
Branches and the Operand Space 
Branches and the Operand-Address Space 
Tandem Branch Prediction 
Accuracy and the Updating of Tables 
Predictor Bandwidth and Anomalous Behaviors 
The Importance of Fast Prediction Mechanisms 
Superscalar Processing and the Monolithic Prediction
of Branch Sequences 
Predicting Branches in a Multithreaded Environment 
Limitations 
Simplicity 
Complexity 
Two Saving Graces 
Implementing Real Branch Prediction Mechanisms 
TRACE CACHES 
Eric Rotenberg North Carolina State University
Traces 
Core Fetch Unit Based on Instruction Cache 
Trace Cache Operation 
Path Associativity 
Indexing Strategy 
Partial Matching
Coupling Branch Prediction with the Trace Cache 
Trace Selection Policy 
Multi-Phase Trace Construction 
Managing Overlap between Instruction Cache and Trace
Cache 
Speculative vs. Non-Speculative Trace Cache Updates 
Powerful vs. Weak Core Fetch Unit 
Parallel vs. Serial Instruction Cache Access 
L1 vs. L2 Instruction Cache 
Loop Caches 
BRANCH PREDICATION 
David August, Princeton University
Overcoming Branch Problems with Predication 
If-Conversion 
Predicate Optimization and Analysis 
The Predicated Intermediate Representation 
Hewlett-Packard Laboratories PD 
Cydrome Cydra 5 
ARM 
Texas Instruments C6X 
Systems with Limited Predicated Execution Support
Predication in the Itanium 2 Processor 
MULTIPATH EXECUTION 
Augustus K. Uht, University of Rhode Island
Branch Tree Geometry 
Branch Path/Instruction ID 
Phases of Operation 
Granularity
With Predication 
With Data Speculation 
Compiler-Assisted 
Hardware: Classically-Based 
Hardware: Non Classically-Based 
Multiprocessors 
Functional or Logic Language Machines 
Branch Prediction 
Confidence Estimation 
Pipeline Depth 
Implications of Amdahl's Law - ILP Version 
Memory Bandwidth Requirements 
DATA CACHE PREFETCHING 
Yan Solihin, North Carolina State University, and Donald Yeung, University of Maryland at College Park
Architectural Support 
Array Prefetching 
Pointer Prefetching 
Relationship with Data Locality Optimizations 
Stride and Sequential Prefetching 
Correlation Prefetching 
Content-Based Prefetching 
ADDRESS PREDICTION 
Avi Mendelson, Intel Mobil Micro-Processor Architect
Terminology and Definitions 
Non-Speculative Address Calculation Techniques 
Speculative Address Calculation Techniques 
Chapter Focus 
Characterization of Address Predictability 
Address Predictability vs. Value Predictability 
Combining Address Prediction with Prefetching Mechanism
Basic Characterization 
Load Promotion 
Memory Bypassing 
Compiler Based Speculative Load Promotion 
DATA SPECULATION 
Yiannakis Sazeides, University of Cyprus; Pedro Marcuello, Intel-UPC Barcelona Research Center; James E. Smith, University
of Wisconsin-Madison; and Antonio González, Universitat Polit`ecnica de Catalunya
Basic Value Predictors 
Value Predictor Alternatives 
Confidence Estimation 
Implementation Issues 
Data Dependence Predictors
Verification 
Recovery 
Other Microarchitectural Implications of Data Value
Speculation 
Related Work: Data Value Speculation 
Related Work: Data Dependence Speculation 
INSTRUCTION PRECOMPUTATION: DYNAMICALLY REMOVING REDUNDANT COMPUTATIONS USING PROFILING 
Joshua J. Yi, Freescale Semiconductor Inc.; Resit Sendag, University of Rhode Island; and David J. Lilja, University of Minnesota at Twin Cities
A Comparison of Instruction Precomputation and Value Reuse 
Upper-Bound - Profile A, Run A 
Different Input Sets - Profile B, Run A 
Combination of Input Sets - Profile AB, Run A 
Frequency versus Frequency and Latency Product 
Performance of Instruction Precomputation versus Value Reuse 
PROFILE-BASED SPECULATION 
Youfeng Wu and Jesse Fang, Intel Microprocessor Technology Labs
Control Flow Profile 
Memory Profile 
Value Profile 
Static Analysis 
Instrumentation 
Hardware Performance Monitoring 
Special Hardware 
Software-Hardware Collaborative Profiling 
Compile-time Profiling 
Runtime Profiling
Continuous Profiling 
Trace Scheduling
Hot-Cold Optimizations 
Code Layout 
Data Layout 
Stride Prefetching 
Hot Data Stream Prefetching 
Mississippi Delta Prefetching 
Java Runtime Parallelizing Machine 
Speculative Parallel Threading 
Speculative Computation Reuse 
Software-Based Speculative Precomputation 
Stability across Multiple Workloads 
Update When Program Changes 
Maintenance during optimizations 
Perturbation by Profiling Code 
COMPILATION AND SPECULATION 
Jin Lin, Wei-Chung Hsu and Pen-Chung Yew University of Minnesota, Minneapolis
Alias Profiling 
Data Dependence Profiling 
Speculative Alias and Dataflow Analyses
A Framework for Speculative Alias Analysis and Dataflow Analysis 
Overview 
F-Insertion Step 
Rename Step 
Downsafety Step 
CodeMotion Step
Recovery Code Generation for General Speculative Optimizations
Check Instructions and Recovery Code Representation for Multi-Level 
Speculation
Interaction of the Early Introduced Recovery Code with Later 
Optimizations
MULTITHREADING AND SPECULATION 
Pedro Marcuello, Jesus Sanchez and Antonio Gonzalez Intel-UPC Barcelona Research Center; Intel Labs; Universitat Politecnica de Catalunya; Barcelona (Spain)
Building Helper Threads 
Microarchitectural Support for Helper Threads 
Thread Spawning Schemes 
Microarchitectural Support for Speculative Architectural Threads 
References 
Andreas Moshovos, University of Toronto
EXPLOITING LOAD/STORE PARALLELISM VIA MEMORY DEPENDENCE PREDICTION 
Static Methods 
Hybrid Static/Dynamic Methods 
Dynamic Methods 
Working Example 
Multiple Dependences Per Static Load or Store 
Methodology
Performance Potential of Load/Store Parallelism 
Performance with Naive Memory Dependence Speculation
Using Address-Based Scheduling to Extract Load/Store Parallelism 
Speculation/Synchronization
RESOURCE FLOW MICROARCHITECTURES 
David A. Morano and David R. Kaeli, Northeastern University
The Operand as a First Class Entity 
Dynamic Dependency Ordering 
Handling Multipath Execution 
Names and Renaming 
The Active Station Idea 
Register and Memory Operand Storage 
Operand Forwarding and Snooping 
Result Forwarding Buses and Operand Filtering 
A Small Resource-Flow Microarchitecture 
A Distributed Scalable Resource-Flow Microarchitecture





