E-Book, Englisch, 485 Seiten
Koranne Practical Computing on the Cell Broadband Engine
1. Auflage 2009
ISBN: 978-1-4419-0308-2
Verlag: Springer
Format: PDF
Kopierschutz: 1 - PDF Watermark
E-Book, Englisch, 485 Seiten
ISBN: 978-1-4419-0308-2
Verlag: Springer
Format: PDF
Kopierschutz: 1 - PDF Watermark
Practical Programming in the Cell Broadband Engine offers a unique programming guide for the Cell Broadband Engine, demonstrating a large number of real-life programs to identify and solve problems in engineering, logic design, VLSI CAD, number-theory, graph-theory, computational geometry, image processing, and other subjects. Key features include: Numerous diagrams, mnemonics, tables, charts, code samples for making program development on the CBE as accessible as possible Comprehensive reading list for introductory material to the subject matter A website providing all source codes and sample-data for examples presented in this text.
Autoren/Hrsg.
Weitere Infos & Material
1;Preface;6
2;Accompanying Website for this Book;12
3;Contents;13
4;List of Tables;21
5;List of Figures;23
6;Listings;27
7;Acronyms;31
8;Part I Introducing the Cell Broadband Engine;33
8.1;Chapter 1 Introduction;35
8.1.1;1.1 About this book;35
8.1.2;1.2 Background of the CBE Architecture and Processor;37
8.1.3;1.3 Design of the Cell Architecture;41
8.1.4;1.4 The Cell as a supercomputer;43
8.1.5;1.5 Major Components of the Cell Broadband Engine;45
8.1.6;1.6 Flex I/O, DDR/XDR, Interrupt Controller;46
8.1.7;1.7 Conclusions;46
8.2;Chapter 2 The Power Processing Element (PPE);48
8.2.1;2.1 PPE: the Control Plane of the CBEA;48
8.2.2;2.2 PPE Problem State Registers;50
8.2.3;2.3 Memory arrays in the CBEA;52
8.2.4;2.4 Viewing the PPE as a dual-core processor: EMT specifics;54
8.2.5;2.5 PowerPC Instruction Set;57
8.2.6;2.6 SPE Memory Management and EA Aliasing;57
8.2.7;2.7 Power Processor Element (PPE) Cache Utilization;58
8.2.8;2.8 Understanding the PPE Pipeline;60
8.2.9;2.9 Measuring and Profiling Application Performance;64
8.2.10;2.10 Conclusion;65
8.3;Chapter 3 The Synergistic Processing Element;66
8.3.1;3.1 Introduction to Cell SPE;66
8.3.2;3.2 SPE Local Store;69
8.3.3;3.3 SPU LS Priority;70
8.3.4;3.4 GPRs in the SPE;71
8.3.5;3.5 SPE Intrinsics and Instruction;71
8.3.6;3.6 SPU Instruction Set: Pipelines, Execution Units and Latency;72
8.3.7;3.7 Load/Store Optimization on SPU;75
8.3.8;3.8 Branch Optimization on SPU;78
8.3.9;3.9 Instruction Scheduling;80
8.3.10;3.10 Memory flow;80
8.3.11;3.11 Mailbox Facility;81
8.3.12;3.12 Use SPU Intrinsics with Mailboxes;85
8.3.13;3.13 Synergistic Processor Unit Channels and Memory Flow Controller;87
8.3.14;3.14 SPU Timer and Events;89
8.3.15;3.15 SPE Context on the PPE;89
8.3.16;3.16 Conclusion;89
8.3.17;3.17 What we have not discussed;90
8.4;Chapter 4 Element Interconnect Bus;91
8.4.1;4.1 Introduction to the Element Interconnect Bus;91
8.4.2;4.2 Important things to remember about the EIB;95
8.4.3;4.3 Conclusions;96
8.5;Chapter 5 Direct Memory Access (DMA);97
8.5.1;5.1 Introduction to DMA in the Cell Broadband Engine;97
8.5.2;5.2 Conclusions;101
9;Part II Programming the Cell Broadband Engine;102
9.1;Chapter 6 Foundations for Program Development on CBE;104
9.1.1;6.1 Theory of Parallel Programming;104
9.1.2;6.2 The Class NC of complexity analysis;106
9.1.3;6.3 Parallel programming concepts;107
9.1.4;6.4 Introduction to pthreads;108
9.1.5;6.5 Multi-faceted parallelism in the CBE Architecture;113
9.1.6;6.6 Alignment using compiler attributes;113
9.1.7;6.7 OpenMP;115
9.1.8;6.8 Double buffering example;117
9.1.9;6.9 Software Managed Cache;117
9.1.10;6.10 Physical Organization;118
9.1.11;6.11 Things to watch out for;118
9.1.12;6.12 Strategies and Paradigms for Identifying Parallelism;120
9.1.13;6.13 Conclusion;121
9.2;Chapter 7 The development environment;122
9.2.1;7.1 How to setup an PS3 for Cell development;122
9.2.2;7.2 Alternatives to installing GNU/Linux on the PS3;123
9.2.3;7.3 The CESOF Format;125
9.2.4;7.4 Basic development environment;127
9.2.5;7.5 Debugging code on the SPU;128
9.2.6;7.6 Conclusions;130
9.3;Chapter 8 Hello World;131
9.3.1;8.1 Writing simple programs on the Cell Broadband Engine;131
9.3.2;8.2 SPE Program Development;134
9.3.3;8.3 Using pthread with SPU tasks;136
9.3.4;8.4 Developing applications using libspe2;137
9.3.5;8.5 Measuring Performance using High Resolution Timers;149
9.3.6;8.6 Synchronization;151
9.3.7;8.7 Conclusions;159
9.4;Chapter 9 An Overview of the SDK;160
9.4.1;9.1 Major components of the Cell Broadband Engine SDK 3.0;160
9.4.2;9.2 Application programming libraries in the SDK;161
9.4.3;9.3 SPE Runtime Library;161
9.4.4;9.4 SPU Timer Library;162
9.4.5;9.5 Monte-Carlo Library;162
9.4.6;9.6 BLAS Library;163
9.4.7;9.7 LAPACK Library;164
9.4.8;9.8 SIMD Math API and Library;165
9.4.9;9.9 SPU Software Managed Cache;165
9.4.10;9.10 FFT Library;165
9.4.11;9.11 Game Math Library;165
9.4.12;9.12 Image Library;166
9.4.13;9.13 Large Matrix Library;166
9.4.14;9.14 Matrix Library;166
9.4.15;9.15 Synchronization Library;167
9.4.16;9.16 Vector Library;168
9.4.17;9.17 Multiprecision Math Library;168
9.4.18;9.18 ALF Library and Framework;169
9.4.19;9.19 DACS Library and Framework;170
9.4.20;9.20 Conclusions;170
9.5;Chapter 10 Basic Algorithms;171
9.5.1;10.1 Some number theoretic problems;171
9.5.2;10.2 Merit Factor of Sequences and Auto-correlation;174
9.5.3;10.3 Computing Greatest Common Divisor (GCD);181
9.5.4;10.4 Sorting Algorithms;183
9.5.5;10.5 Table lookup;189
9.5.6;10.6 (Perfect) Hash Function;190
9.5.7;10.7 Minimal Perfect Hashing;192
9.5.8;10.8 Implementing minimal perfect hash in the SPU;194
9.5.9;10.9 Using perfect hash functions with large graph clusters;195
9.5.10;10.10 Heap Data Structure;195
9.5.11;10.11 Disjoint Set Union-Find Algorithms;196
9.5.12;10.12 Computing Zech logarithm using SPE in batch mode;198
9.5.13;10.13 Conclusions;201
9.6;Chapter 11 Graph Theory on the CBEA;202
9.6.1;11.1 Introduction;202
9.6.2;11.2 Definitions;202
9.6.3;11.3 Graph representation;203
9.6.4;11.4 Minimum Spanning Tree using Kruskal’s Method;206
9.6.5;11.5 Revisiting min-cut using MST and Karger’s Method;213
9.6.6;11.6 Dijkstra’s Algorithms for Shortest Paths;214
9.6.7;11.7 Floyd-Warshall All-pairs shortest-path;217
9.6.8;11.8 Formal definition of APSP (all-pairs shortest-path);217
9.6.9;11.9 Maximal Independent Set;219
9.6.10;11.10 Approximation algorithm for maximal matching;219
9.6.11;11.11 Partitioning using Simulated Annealing;222
9.6.12;11.12 Simulated Annealing;225
9.6.13;11.13 Partitioning using Spectral Methods;232
9.6.14;11.14 On-line streaming algorithm for graph properties;237
9.6.15;11.15 Conclusion;246
9.7;Chapter 12 Alternative methods for parallel programming on SPE;247
9.7.1;12.1 Introduction;247
9.7.2;12.2 NESL: Nested Data Parallel Language;247
9.7.3;12.3 CVL and VCODE: Vector Libraries;250
9.7.4;12.4 A Simplified SPU ISA Simulator;252
9.7.5;12.5 VCODE Compiler;259
9.7.6;12.6 PowerList: Data structure for parallel programming;261
9.7.7;12.8 Tale of Dwarfs;265
9.7.8;12.9 Introduction to Cilk++;266
9.7.9;12.10 Introduction to Intel Ct;267
9.7.10;12.11 Introduction to RapidMind;267
9.8;Chapter 13 Computational Mathematics on the CBEA;269
9.8.1;13.1 Introduction;269
9.8.2;13.2 Calculating PI;269
9.8.3;13.3 Line fitting;271
9.8.4;13.4 Representing Monomials and Polynomials;275
9.8.5;13.5 Evaluating polynomials;279
9.8.6;13.6 SIMD Polynomial evaluation with fixed step;280
9.8.7;13.7 Parallel Polynomial Evaluation;280
9.8.8;13.8 Evaluating Boolean Functions;283
9.8.9;13.9 Parallel Matrix Functions;283
9.8.10;13.10 Solving Equations in a Single Complex Variable;283
9.8.11;13.11 Conclusions;309
9.9;Chapter 14 Vector Graphics on SPU;310
9.9.1;14.1 Introduction;310
9.9.2;14.2 Basic Computational Geometry on SPU;310
9.9.3;14.3 Conclusion;322
9.10;Chapter 15 Optimizing SPU Programs;323
9.10.1;15.1 Introduction to Program Optimization for Cell;323
9.10.2;15.2 GNU gprof;323
9.10.3;15.3 Branch probability extraction;324
9.10.4;15.4 SPU Assembly Analysis;325
9.10.5;15.5 FDPRPRO: Optimization Tool;328
9.10.6;15.6 Conclusion;330
10;Part III Case Studies;331
10.1;Chapter 16 Line-of-sight Computation;333
10.1.1;16.1 Introduction to the line-of-sight computation;333
10.1.2;16.2 Basic equation for LOS calculation;336
10.1.3;16.3 Watershed analysis using LOS;338
10.1.4;16.4 LOS formulation using Bresenham’s line-drawing for interpolation;338
10.1.5;16.5 SIMD Implementation of line-of-sight algorithm;341
10.1.6;16.6 Watershed Analysis;342
10.1.7;16.7 SPU Implementation;343
10.1.8;16.8 Conclusion;349
10.2;Chapter 17 Structure Determination using PDF;350
10.2.1;17.1 Problem of nano-structure determination;350
10.2.2;17.2 Implementation of the ab-initio PDF method;351
10.2.3;17.3 Algorithms and data structures required for the problem;353
10.2.4;17.5 Conclusions;368
10.3;Chapter 18 Polytope Enumeration;369
10.3.1;18.1 What is a polytope ?;369
10.3.2;18.2 Examples of polytopes;372
10.3.3;18.4 Experimental Results;383
10.3.4;18.5 Future Work;385
10.3.5;18.6 Current Results;385
10.3.6;18.7 Conclusion;386
10.4;Chapter 19 Synthetic Aperture Radar;387
10.4.1;19.1 Introduction;387
10.4.2;19.2 Fundamental principal behind Synthetic Aperture Radar;388
10.4.3;19.3 Problem Statement for SAR Image Processing;389
10.4.4;19.4 Block Adaptive Quantization;390
10.4.5;19.5 Signal processing functions for SAR;393
10.4.6;19.6 Implementation Details;394
10.4.7;19.7 Experimental Results;397
10.4.8;19.8 Conclusions;399
10.5;Chapter 20 VLSI Design Automation;400
10.5.1;20.1 CAD of Very Large Scale Integrated Circuits;400
10.5.2;20.2 Algorithmic Design and HDL Capture;401
10.5.3;20.3 Equation Processing;408
10.5.4;20.4 Microword Optimization using Local Search on SPU;414
10.5.5;20.5 Conclusion;418
10.6;Chapter 21 Floorplanning: VLSI and other Applications;419
10.6.1;21.1 What is a VLSI floorplan;419
10.6.2;21.2 The sequence-pair data-structure;419
10.6.3;21.3 Use of a floorplan in non-VLSI applications;421
10.6.4;21.4 Implementation of a Sequence-pair based Floorplanner;422
10.6.5;21.5 Conclusions;425
10.7;Chapter 22 VLSI Placement;426
10.7.1;22.1 Introduction to VLSI Standard Cell Placement;426
10.7.2;22.2 Reading gate-level BLIF files and making a graph;429
10.7.3;22.3 SPU basic cell placer, using Simulated Annealing;429
10.7.4;22.4 What is the VLSI global routing problem;451
10.7.5;22.5 Conclusion;453
10.8;Chapter 23 Power Estimation for VLSI;454
10.8.1;23.1 Introduction;454
10.8.2;23.2 Static Probabilities;455
10.8.3;23.3 Spatial Correlations;457
10.8.4;23.4 A distributed algorithm for signal probability estimation for combinational circuits;458
10.8.5;23.5 Bit-vector implementation of probability polynomials;461
10.8.6;23.6 Applications of PRBC;464
10.8.7;23.7 A Nested Data-Parallel Implementation of the Parker-McCluskey Method;465
10.8.8;23.8 Analysis of our method;467
10.8.9;23.9 PRBC Implementation;468
10.8.10;23.10 Probability Polynomial Evaluation on SPU;475
10.8.11;23.11 Experiments;476
10.8.12;23.12 Conclusion;477
10.9;Chapter 24 Concluding Remarks;478
10.9.1;24.1 Progress in small steps;478
10.9.2;24.2 Use of Common Lisp;478
10.9.3;24.3 Future Work;479
10.9.4;24.4 Conclusions;481
10.9.5;24.5 Website Information;481
11;Appendix A Introduction to Common Lisp;482
11.1;A.1 Introduction to Common Lisp;482
11.2;A.2 Functions in Common Lisp;483
11.3;A.3 File I/O;484
11.4;A.4 Conclusion;484
12;Index;485
13;References;496




