E-book, English, Volume 92, 288 pages
Series: Advances in Computers
1st edition, 2014
ISBN: 978-0-12-799933-3
Publisher: Elsevier Science & Techn.
Format: EPUB
Copy protection: ePub watermark
Since its first volume in 1960, Advances in Computers has presented detailed coverage of innovations in computer hardware, software, theory, design, and applications. It has also provided contributors with a medium in which they can explore their subjects in greater depth and breadth than journal articles usually allow. As a result, many articles have become standard references that continue to be of significant, lasting value in this rapidly expanding field.
- In-depth surveys and tutorials on new computer technology
- Well-known authors and researchers in the field
- Extensive bibliographies with most chapters
- Many of the volumes are devoted to single themes or subfields of computer science
Authors/Editors
Further information & material
Survey on System I/O Hardware Transactions and Impact on Latency, Throughput, and Other Factors
Steen Larsen*,† and Ben Lee*, *School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, USA, †Intel Corporation, Hillsboro, Oregon, USA
Abstract
Computer system input/output (I/O) has evolved with processor and memory technologies in terms of reducing latency, increasing bandwidth, and other factors. As requirements increase for I/O, such as networking, storage, and video, descriptor-based direct memory access (DMA) transactions have become more important in high-performance systems to move data between I/O adapters and system memory buffers. DMA transactions are done with hardware engines below the software protocol abstraction layers in all systems other than rudimentary embedded controllers. Central processing units (CPUs) can switch to other tasks by offloading hardware DMA transfers to the I/O adapters. Each I/O interface has one or more separately instantiated descriptor-based DMA engines optimized for a given I/O port. I/O transactions are optimized by accelerator functions to reduce latency, improve throughput, and reduce CPU overhead. This chapter surveys the current state of high-performance I/O architecture advances and explores benefits and limitations. With the proliferation of CPU multicores within a system, multi-GB/s ports, and on-die integration of system functions, changes beyond the techniques surveyed may be needed for optimal I/O architecture performance.
Keywords
Input/output; Processors; Controllers; Memory; DMA; Latency; Throughput; Power
Abbreviations
ARM Acorn RISC Machine
BIOS basic input/output system—allows access by the operating system to low-level hardware
BW bandwidth supported by an interface, usually synonymous with throughput capability
CNI coherent network interface
CPU central processing unit—consisting of potentially multiple cores, each with one or more hardware threads of execution
CRC cyclic redundancy check
CQE completion queue entry—used in RDMA to track transaction completions
DCA direct cache access
DDR double data rate—allows a slower clock to transmit twice the data per cycle. Usually based on both the rising and falling edge of a clock signal
DDR3 3rd generation DDR memory interface
DLP data link layer protocol in PCIe, which is similar to the link layer in a networking stack
DMA direct memory access—allows read or write transactions with system memory
DSP digital signal processing
FPGA field-programmable gate array
FSB front-side bus—a processor interface protocol that was replaced by Intel QPI and AMD HyperTransport
GbE gigabit Ethernet
GBps gigabytes per second
Gbps gigabits per second (1 GBps = 8 Gbps)
GHz gigahertz
GOQ global observation queue
GPU graphic processing unit
HPC high-performance computing—usually implies a high-speed interconnection of high-performance systems
HW hardware
ICH Intel I/O controller hub—interfaces to the IOH to support slower system protocols, such as USB and BIOS memory
I/O input/output
IOH Intel I/O hub—interfaces between QPI and PCIe interfaces
iWARP Internet wide area RDMA protocol—an RDMA protocol that supports lower level Ethernet protocol transactions
kB kilobyte, 1024 bytes. Sometimes reduced to “K” based on context
L1 cache level 1 cache
L2 cache level 2 cache
LCD liquid crystal display
LLC last-level cache—level 3 cache
LLI low latency interrupt
LLP link layer protocol—used in PCIe
LRO large receive offloading
LSO large segment offload
MB megabytes
MESI(F) modified, exclusive, shared, invalid, and optionally forward—protocol to maintain memory coherency between different CPUs in a system
MFC memory flow controller—used to manage SPU DMA transactions
MMIO memory-mapped I/O
MPI message passing interface—a protocol to pass messages between systems, often used in HPC
MSI message signaled interrupt—used in PCIe to interrupt a core
MTU maximum transmission unit
NIC network interface controller
NUMA nonuniform memory access—allows multiple pools of memory to be shared between CPUs with a coherency protocol
PCIe Peripheral Component Interconnect express—defined at www.pcisig.com. Multiple lanes (1–16) of serial I/O traffic reaching 16 Gbps per lane. Multiple generations of PCIe exist, represented by Gen1, Gen2, Gen3, and Gen4. PCIe protocol levels have similarities with the networking ISO stack
PHY PHYsical interface defining the cable (fiber/copper) interfacing protocol
PIO programmed I/O—often synonymous with MMIO
QDR quad data rate—allows four times the data rate based on a slower clock frequency
QoS quality of service—a metric to define guaranteed minimums of service quality
QP queue pair—transmit queue and receive queue structure in RDMA to allow interfacing between two or more systems
QPI QuickPath Interconnect—Intel's proprietary CPU interface supporting MESI(F) memory coherence protocol
RAID redundant array of independent disks
RDMA remote direct memory access—used to access memory between two or more systems
RSS receive side scaling
RTOS real-time operating system
RX reception from a network to a system
SAS serial attached SCSI
SCC single-chip cloud computer
SCSI small computer system interface
SMT simultaneous multithreading
SPE synergistic processing element in the Cell processor
SPU synergistic processing unit in the Cell SPE
SSD solid-state disk
SW software
TCP/IP transmission control protocol and Internet protocol networking stack
TLP transaction layer protocol of PCIe stack
TOE TCP/IP offload engine
TX transmission from a system to a network
USB universal serial bus
WQE work queue entry—used in RDMA to track transaction parameters
1 Introduction
Input/output (I/O) is becoming a peer to the processor core (or simply core) and memory in terms of latency, bandwidth, and power requirements. Historically, when a core was simpler and more directly I/O focused, it was acceptable to "bit-bang" I/O port operations using port I/O or memory-mapped I/O (MMIO) models [1]. However, with complex user interfaces and programs using multiple processes, the benefit of offloading data movement to an I/O adapter became more apparent. Since I/O devices are much slower than the core–memory interface, it makes sense to move data at a pace governed by the external device.
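As a rough illustration of the MMIO model mentioned above, the following C sketch shows a core "bit-banging" a buffer out through a device register one word at a time. The register addresses and the busy bit are hypothetical, invented for illustration rather than taken from any real device.

#include <stddef.h>
#include <stdint.h>

/* Hypothetical MMIO register block; on a real system the base address
   would come from BIOS/PCIe enumeration, not a hard-coded constant. */
#define DEV_BASE   ((volatile uint32_t *)0xFEDC0000u)
#define DEV_DATA   (DEV_BASE[0])   /* data register (assumed)   */
#define DEV_STATUS (DEV_BASE[1])   /* status register (assumed) */
#define DEV_BUSY   0x1u            /* assumed busy bit          */

/* "Bit-bang" a buffer out one word at a time: the core performs
   every transfer itself and stalls whenever the device is busy. */
static void mmio_write_buf(const uint32_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        while (DEV_STATUS & DEV_BUSY)
            ;                    /* core spins instead of doing other work */
        DEV_DATA = buf[i];       /* one uncached store per word */
    }
}

The point is that the core itself issues every store and spins while the device is busy, which is exactly the overhead that offloading data movement to an adapter removes.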
Typically, I/O data transfer is initiated using a descriptor containing the physical address and size of the data to be moved. This descriptor is then posted (i.e., sent) to the I/O adapter, which then processes the direct memory access (DMA) read/write operations as fast as the core–memory bandwidth allows. The descriptor-based DMA approach makes sense when the I/O bandwidth requirements are much lower than the...
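To make the descriptor mechanism concrete, here is a minimal C sketch of the kind of transmit descriptor ring a driver might post to an adapter. The structure fields, ring size, and doorbell register are assumptions for illustration, not the layout of any particular device, and details such as virtual-to-physical address translation and completion handling are omitted.

#include <stdint.h>

/* Hypothetical DMA descriptor: physical address and size of one buffer. */
struct dma_desc {
    uint64_t buf_phys;   /* physical address of the data buffer    */
    uint32_t len;        /* number of bytes to transfer            */
    uint32_t flags;      /* e.g., end-of-packet, interrupt-on-done */
};

#define RING_SIZE 256
static struct dma_desc tx_ring[RING_SIZE];   /* shared with the adapter   */
static uint32_t tx_tail;                     /* next free descriptor slot */

/* Assumed MMIO "doorbell" register the adapter watches for new work. */
#define TX_DOORBELL ((volatile uint32_t *)0xFEDC1000u)

/* Post one buffer: fill in a descriptor, then notify the adapter with a
   single MMIO write. The adapter's DMA engine reads the descriptor and
   moves the data while the core continues with other tasks. */
static void post_tx(uint64_t buf_phys, uint32_t len)
{
    struct dma_desc *d = &tx_ring[tx_tail];
    d->buf_phys = buf_phys;
    d->len      = len;
    d->flags    = 0;
    tx_tail = (tx_tail + 1) % RING_SIZE;
    *TX_DOORBELL = tx_tail;   /* doorbell: adapter fetches up to tx_tail */
}

After the doorbell write, the adapter's DMA engine fetches the descriptor and moves the payload at its own pace, leaving the core free for other work.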