Robert / Herault | Fault-Tolerance Techniques for High-Performance Computing | Book | 978-3-319-35560-3 | www.sack.de

Book, English, 320 pages, previously published in hardcover, format (W × H): 155 mm × 235 mm, weight: 5037 g

Series: Computer Communications and Networks

Robert / Herault

Fault-Tolerance Techniques for High-Performance Computing


Softcover reprint of the original 1st edition 2015
ISBN: 978-3-319-35560-3
Publisher: Springer International Publishing


This timely text/reference presents a comprehensive overview of fault-tolerance techniques for high-performance computing (HPC).

The text opens with a detailed introduction to checkpoint protocols and scheduling algorithms, prediction, replication, and silent error detection and correction, together with some application-specific techniques such as algorithm-based fault tolerance. Emphasis is placed on analytical performance models. This is followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models.

Topics and features: includes self-contained contributions from an international selection of preeminent experts; provides a survey of resilience methods and performance models; examines the various sources of errors and faults in large-scale systems, detailing their characteristics, with a focus on modeling, detection and prediction; reviews the spectrum of techniques that can be applied to design a fault-tolerant message-passing interface; investigates different approaches to replication, comparing these to the traditional checkpoint-recovery approach; discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems, proposing a methodology to estimate such energy consumption.

This authoritative volume is essential reading for all researchers and graduate students involved in high-performance computing.

Target audience


Research

Further information & material


Part I: General Overview

Fault-Tolerance Techniques for High-Performance Computing
Jack Dongarra, Thomas Herault and Yves Robert

Part II: Technical Contributions

Errors and Faults
Ana Gainaru and Franck Cappello

Fault-Tolerant MPI
Aurelien Bouteiller

Using Replication for Resilience on Exascale Systems
Henri Casanova, Frédéric Vivien and Dounia Zaidouni

Energy-Aware Checkpointing Strategies
Guillaume Aupy, Anne Benoit, Mohammed El Mehdi Diouri, Olivier Glück and Laurent Lefèvre


