E-Book, English, 296 pages
Gogniat / Milojevic / Morawiec: Algorithm-Architecture Matching for Signal and Image Processing
1st edition, 2010
ISBN: 978-90-481-9965-5
Publisher: Springer-Verlag
Format: PDF
Copy protection: Adobe DRM
Best papers from Design and Architectures for Signal and Image Processing 2007 & 2008 & 2009
Advances in signal and image processing, together with increasing computing power, are bringing mobile technology closer to applications in a variety of domains such as automotive, health, telecommunication, multimedia, entertainment and many others. The development of these leading applications, which involve a large diversity of algorithms (e.g. signal, image, video, 3D, communication, cryptography), is classically divided into three consecutive steps: a theoretical study of the algorithms, a study of the target architecture, and finally the implementation. Such a linear design flow is reaching its limits due to intense pressure on the design cycle and strict performance constraints. The approach called Algorithm-Architecture Matching aims to improve design flows through a simultaneous study of both algorithmic and architectural issues, taking into account multiple design constraints as well as algorithm and architecture optimizations that could not be achieved if the two were considered separately.
Introducing new design methodologies is mandatory when facing emerging applications such as advanced mobile communication or graphics that use sub-micron manufacturing technologies or 3D-Integrated Circuits. This diversity forms a driving force for the future evolution of embedded system design methodologies. From the system designer's point of view, the main expectations relate to methods, tools and architectures that support application complexity and reduce the design cycle. Advanced optimizations are essential to meet design constraints and to enable wide acceptance of these new technologies.
Algorithm-Architecture Matching for Signal and Image Processing presents a collection of selected contributions from both industry and academia, addressing different aspects of the Algorithm-Architecture Matching approach, ranging from sensors to architecture design. The scope of this book reflects the diversity of potential algorithms, including signal, communication, image, video and 3D graphics, implemented on various architectures from FPGAs to multiprocessor systems. Several synthesis and resource management techniques leveraging design optimizations are also described and applied to numerous algorithms.
Algorithm-Architecture Matching for Signal and Image Processing belongs on the shelf of every designer and EDA tool developer, as well as anyone with an interest in digital system design optimizations involving advanced algorithms.
Further Information & Material
1;Preface;5
2;Contents;7
3;Contributors;9
4;Architectures for Embedded Applications;12
4.1;Lossless Multi-Mode Interband Image Compression and Its Hardware Architecture;13
4.1.1;Introduction;13
4.1.2;An Overview of LMMIC;15
4.1.3;Multi-Mode Strategy;17
4.1.3.1;Preprocessing;17
4.1.3.2;Run-Mode;17
4.1.3.3;Ternary-Mode;18
4.1.3.4;Regular-Mode;19
4.1.4;Band Shifting and Gradient-Based Switching;19
4.1.4.1;Band Shifting for Inter-Band Prediction;20
4.1.4.2;Gradient-Based Switching;21
4.1.4.3;Adaptation in Run-Mode and Ternary-Mode;22
4.1.5;Context Modelling;23
4.1.6;Performance Comparison;23
4.1.7;Hardware Architecture;26
4.1.7.1;Lossless Image Modelling;26
4.1.7.2;Probability Estimator and Arithmetic Coding;28
4.1.7.2.1;Arithmetic Coding;29
4.1.7.2.2;Probability Estimation;29
4.1.7.2.2.1;Overview;29
4.1.7.2.2.2;Working Mechanism of the Context Trees;29
4.1.7.2.2.3;Context Tree Initialization;31
4.1.7.2.2.4;Choice of Context Tree Node Size;32
4.1.7.2.2.5;Output of Probability Estimation;32
4.1.7.2.2.6;Architecture of Probability Estimator;33
4.1.8;Conclusions;35
4.1.9;References;35
4.2;Efficient Memory Management for Uniform and Recursive Grid Traversal;37
4.2.1;Introduction;37
4.2.2;State of the Art;39
4.2.2.1;Dataset Traversal;39
4.2.2.2;Memory Management;40
4.2.3;System Architecture;42
4.2.4;The nD-AP Cache;42
4.2.5;Uniform Grids;43
4.2.5.1;Uniform Grid Traversal;43
4.2.5.2;Uniform Grid Caching;45
4.2.6;Recursive Grids;46
4.2.6.1;Caching the RG Data Structure;46
4.2.6.1.1;Recursive Grids;46
4.2.6.1.2;RG Cache;47
4.2.6.1.3;Improving Reference Locality;47
4.2.6.2;Recursive Grid Traversal;49
4.2.6.2.1;Neighbour Finding Unit;49
4.2.6.2.2;Phase-Locked Ray Beam Propagation;50
4.2.7;Results;52
4.2.7.1;Hardware Complexity;53
4.2.7.1.1;Uniform Grid Traversal;53
4.2.7.1.2;Hierarchical Grid Traversal;54
4.2.7.2;Cache Efficiency;54
4.2.7.2.1;Cache Efficiency of the Uniform Grid Traversal;55
4.2.7.2.1.1;Visualization;55
4.2.7.2.1.2;Sinogram Computing;56
4.2.7.3;Cache Efficiency of the Recursive Grid Traversal;57
4.2.7.4;Discussion;58
4.2.7.5;Improvements;59
4.2.8;Conclusion;59
4.2.9;References;60
4.3;Mapping a Telecommunication Application on a Multiprocessor System-on-Chip;62
4.3.1;Introduction;62
4.3.2;Related Work;63
4.3.3;Application Specification;64
4.3.4;The Target Hardware Architecture;66
4.3.4.1;The Telecommunication Platform;67
4.3.5;The Classification Application;69
4.3.5.1;The Application Task Graph;69
4.3.6;DSX Design Space Explorer;71
4.3.6.1;DSX Architecture Description;71
4.3.6.2;DSX Application Description;72
4.3.6.3;DSX I/O Coprocessor Description;72
4.3.6.4;Classification and Scheduling Tasks;73
4.3.6.5;Bootstrap Task;74
4.3.6.6;DSX Mapping Description;75
4.3.7;Eliminating the Bottlenecks;77
4.3.7.1;Accesses to the InputChannels;78
4.3.7.2;Simultaneous Accesses to Memory Banks;79
4.3.7.3;Burst Size;80
4.3.8;Performance Results;81
4.3.9;Conclusion and Perspectives;83
4.3.10;References;84
5;Data Acquisition and Embedded Systems;87
5.1;A Standard 3.5T CMOS Imager Including a Light Adaptive System for Integration Time Optimization;88
5.1.1;Introduction;88
5.1.2;Automatic Control of the Integration Time Value;91
5.1.3;Architecture of the Sensor;93
5.1.4;Overview and Measures of Our Circuit;95
5.1.5;Discussion;97
5.1.6;Conclusions and Perspectives;97
5.1.7;References;99
5.2;Approximate Multiplication and Division for Arithmetic Data Value Speculation in a RISC Processor;101
5.2.1;Introduction;102
5.2.1.1;Contributions;102
5.2.1.2;Overview;103
5.2.2;Background;103
5.2.2.1;Approximate Arithmetic;103
5.2.2.2;Arithmetic Data Value Speculation;103
5.2.3;Simulation and Synthesis Tools;104
5.2.3.1;SimpleScalar;104
5.2.3.2;MediaBench;105
5.2.3.3;Operand Caches;105
5.2.3.4;Logic Synthesis;106
5.2.4;Approximate Multiplication;106
5.2.4.1;Counters;106
5.2.4.2;Multiplier Topology;106
5.2.4.3;Multiplier Results;107
5.2.5;Approximate Unsigned Division;111
5.2.5.1;Division Algorithm;112
5.2.5.2;Divider Implementation;112
5.2.5.3;Divider Results;114
5.2.6;Simulation of a RISC Processor with ADVS;116
5.2.6.1;Operand Cache Simulation;116
5.2.6.2;SimpleScalar Simulation;117
5.2.7;Conclusions;120
5.2.8;References;121
5.3;RANN: A Reconfigurable Artificial Neural Network Model for Task Scheduling on Reconfigurable System-on-Chip;123
5.3.1;Introduction;123
5.3.2;Problem Definition;124
5.3.3;Related Works;126
5.3.3.1;Temporal and Spatial Task Scheduling;126
5.3.3.2;ANNs Models for Task Scheduling;127
5.3.3.3;Implementation of ANNs;130
5.3.4;Scheduling for Reconfigurable Hardware using ANN;131
5.3.4.1;Management of an Unfixed Number of Tasks Within the Reconfigurable Unit;131
5.3.4.2;Management of Task Dependencies;133
5.3.4.3;Example of an RANN Structure;135
5.3.5;Discussion;136
5.3.6;Convergence Case Study;138
5.3.7;Implementation Results of the RANN;142
5.3.8;Execution Performance Comparisons;146
5.3.9;Conclusion;146
5.3.10;References;148
5.4;A New Three-Level Strategy for Off-Line Placement of Hardware Tasks on Partially and Dynamically Reconfigurable Hardware;151
5.4.1;Introduction;151
5.4.2;Related Work;152
5.4.3;Level 1: Off-Line Flow of Hardware Task Classification;154
5.4.3.1;Flow Terminology;154
5.4.3.2;Application Level;154
5.4.3.3;Physical Level;155
5.4.3.4;Flow Steps;157
5.4.3.4.1;Step 1: RZ Types Search or Hardware Task Classes Search;157
5.4.3.4.2;Step 2: Hardware Task Classification;159
5.4.3.4.3;Step 3: Decision of Increasing the Number of RZs;160
5.4.4;Level 2: RPBs Partitioning on the Target Device;160
5.4.5;Level 3: Two-Level Fitting;162
5.4.6;Modeling of Placement Problem;162
5.4.7;Exhaustive Complete Resolution of Placement Problem;166
5.4.8;Non-Exhaustive Complete Resolution of Placement Problem;167
5.4.9;Application and Results;170
5.4.10;Conclusion;174
5.4.11;References;175
5.5;End-to-End Bitstreams Repository Hierarchy for FPGA Partially Reconfigurable Systems;176
5.5.1;Introduction;176
5.5.2;Hierarchy Level L1;179
5.5.2.1;Cache Architecture;181
5.5.2.2;Hardware Architecture;181
5.5.2.3;Results;183
5.5.3;Hierarchy Level L2;183
5.5.3.1;Data Link over Ethernet 100 Mb/s;184
5.5.3.2;Error Rates;185
5.5.3.3;Hardware Architecture;186
5.5.3.4;Software Architecture;187
5.5.3.5;Results;189
5.5.4;Hierarchy Level L3;190
5.5.4.1;Commonly Used Transport Protocols;190
5.5.4.2;TCP/IP Architecture Model;191
5.5.4.3;Software Architecture;192
5.5.4.3.1;lwIP as a TCP/IP Networking Stack;192
5.5.4.3.2;Software DPR Protocol;193
5.5.4.4;Hardware Architecture;194
5.5.4.5;Results;195
5.5.5;Conclusion and Perspectives;197
5.5.6;References;198
6;Embedded Systems Design;200
6.1;SystemC Multiprocessor RTOS Model for Services Distribution on MPSoC Platforms;201
6.1.1;Introduction;201
6.1.2;Related Work;202
6.1.3;RTOS Modeling;203
6.1.4;MPSoC Modeling and RTOS Distribution;206
6.1.4.1;Distant Communications and Services Requests;206
6.1.4.2;CAS Model;207
6.1.5;A Tool for Specific OS Definition;209
6.1.5.1;Goal of the Tool;209
6.1.5.2;Presentation of the DOGME Tool;209
6.1.6;Experiments and Results;212
6.1.6.1;A Robotic Vision System;212
6.1.6.2;Deployment Exploration;213
6.1.6.3;Results;215
6.1.7;Conclusion;217
6.1.8;References;218
6.2;A List Scheduling Heuristic with New Node Priorities and Critical Child Technique for Task Scheduling with Communication Contention;220
6.2.1;Introduction;220
6.2.2;Models and Definitions;222
6.2.2.1;DAG Model;222
6.2.2.2;Topology Graph Model;223
6.2.2.3;Task Scheduling with Communication Contention;224
6.2.3;Node Levels with Communication Contention;226
6.2.3.1;Existing Node Levels;226
6.2.3.2;New Node Levels;227
6.2.4;List Scheduling Heuristic;229
6.2.4.1;Sorting Nodes with Five Groups of Node Priorities;229
6.2.4.2;Processor Selection;229
6.2.4.3;Node and Edge Scheduling;231
6.2.5;Analysis of Time Complexity;232
6.2.6;Experimental Results;233
6.2.6.1;Comparison with an Example;233
6.2.6.2;Comparison with Random DAGs;234
6.2.6.3;Time Complexity;236
6.2.7;Conclusions and Prospects;237
6.2.8;References;238
6.3;Multiprocessor Scheduling of Dataflow Programs within the Reconfigurable Video Coding Framework;240
6.3.1;Introduction;240
6.3.2;Concepts of the Reconfigurable Video Coding Framework;241
6.3.2.1;The CAL Language;242
6.3.3;The Scheduling Approach;244
6.3.4;Case Study: MPEG-4 SP Decoder;245
6.3.4.1;Design Space Exploration;247
6.3.4.2;The Results;251
6.3.5;Conclusion;253
6.3.6;References;254
6.4;A High Level Synthesis Flow Using Model Driven Engineering;255
6.4.1;Introduction;255
6.4.1.1;Design Challenges;256
6.4.1.1.1;HLS Tool User;256
6.4.1.1.2;HLS Tool Designer;257
6.4.1.2;Proposed HLS Flow;257
6.4.2;Related Works;258
6.4.3;Model Driven Engineering;260
6.4.3.1;Model and Metamodel;261
6.4.3.2;Model Transformations;261
6.4.4;High Level Specification Models;263
6.4.4.1;UML Model;263
6.4.4.2;ISP Model and UML2ISP;265
6.4.5;Implementation at a Low Level;266
6.4.5.1;RTL Model;266
6.4.5.2;ISP2RTL Transformation;267
6.4.5.3;RTL2VHDL Transformation;270
6.4.6;Case Study;270
6.4.6.1;UML Model;271
6.4.6.2;Generated Hardware Accelerator;272
6.4.7;Conclusion;274
6.4.8;References;274
6.5;Generation of Hardware/Software Systems Based on CAL Dataflow Description;277
6.5.1;Introduction;277
6.5.2;Objectives and Principles;279
6.5.2.1;CAL Actor Language;280
6.5.2.2;Objectives: Unified Specification Formalism;281
6.5.2.3;The Global Interfaces Methodology;282
6.5.3;Effectiveness of CAL2C and CAL2HDL;283
6.5.3.1;First Design Case: MPEG-4 SP Decoder;283
6.5.3.2;Second Design Case: the Code Bar Decoder;285
6.5.4;Interfaces Driver Generation for Implementation;286
6.5.4.1;Driver Architecture Overview;286
6.5.4.2;Serialization and Deserialization Process;288
6.5.4.2.1;Comparison of the Efficiency;288
6.5.4.2.2;Comparison of the Hardware Implementation;289
6.5.4.2.3;Algorithm Synthesis;290
6.5.5;Design Cases with Interfaces Driver Generation;290
6.5.5.1;Ethernet Link;291
6.5.5.1.1;Ethernet on Cyclone II;291
6.5.5.1.2;Ethernet Link on Virtex 5;292
6.5.5.2;PCI Link;292
6.5.6;Conclusion;292
6.5.7;References;293
7;Index;295




