Ljubuncic Problem-solving in High Performance Computing

A Situational Awareness Approach with Linux

E-Book, English, 320 pages

ISBN: 978-0-12-801064-8
Publisher: Elsevier Reference Monographs
Format: PDF
Copy protection: Adobe DRM



Problem-Solving in High Performance Computing: A Situational Awareness Approach with Linux focuses on understanding giant computing grids as cohesive systems. Unlike other titles on general problem-solving or system administration, this book takes a cohesive approach to complex, layered environments: it highlights the difference between standalone system troubleshooting and problem-solving in large, mission-critical environments, addresses the pitfalls of information overload and of micro and macro symptoms, and presents methods for managing problems in large computing ecosystems. The author offers perspective gained from years of developing Intel-based systems that lead the industry in the number of hosts, software tools, and licenses used in chip design. The book offers unique, real-life examples that emphasize the magnitude and operational complexity of high-performance computer systems.
- Provides insider perspectives on challenges in high-performance environments with thousands of servers, millions of cores, distributed data centers, and petabytes of shared data
- Covers analysis, troubleshooting, and system optimization, from initial diagnostics to deep dives into kernel crash dumps
- Presents macro principles that appeal to a wide range of users and various real-life, complex problems
- Includes examples from 24/7 mission-critical environments with specific HPC operational constraints

Igor Ljubuncic is a Linux Lead Engineer with Rackspace, the #1 managed cloud company. Previously, Igor worked as an OS architect within Intel's IT Engineering Computing business group, exploring and developing solutions for a large, global high-performance Linux environment that supports Intel's chip design. Igor has eleven years of experience in the hi-tech industry, first as a physicist and later in various engineering roles, with a strong focus on data-driven methodologies. To date, Igor has had fourteen patents accepted for filing with the US PTO, with an emphasis on data center technologies, scheduling, and the Internet of Things. He has authored several open-source projects and technical books, as well as numerous articles accepted for publication in leading technical journals and magazines, and has presented at prestigious international conferences such as LinuxCon. In his free time, Igor writes car reviews and fantasy books and manages his Linux-oriented blog, dedoimedo.com, which garners close to a million views from loyal readers every month.


Introduction: data center and high-end computing
Data center at a glance
If you are looking for a pitch, a one-liner for how to define data centers, then you might as well call them the modern power plants. They are the equivalent of the old, sooty coal factories that used to give the young, entrepreneurial industrialist of the mid-1800s the advantage he needed over the local tradesmen in villages. The plants and their laborers were the unsung heroes of their age, doing their hard labor in the background, unseen, unheard, and yet the backbone of the revolution that swept the world in the nineteenth century. Fast-forward 150 years, and a similar revolution is happening. The world is transforming from an analog one into a digital one, with all the associated difficulties, buzz, and real technological challenges. In the middle of it, there is the data center, the powerhouse of the Internet, the heart of the search, the big in the big data.
Modern data center layout
Realistically, if we were to go into specifics of the data center design and all the underlying pieces, we would need half a dozen books to write it all down. Furthermore, since this is only an introduction, an appetizer, we will only briefly touch this world. In essence, it comes down to three major components: network, compute, and storage. There are miles and miles of wires, thousands of hard disks, angry CPUs running at full speed, serving the requests of billions every second. But on their own, these three pillars do not make a data center. There is more. If you want an analogy, think of an aircraft carrier. The first thing that comes to mind is Tom Cruise taking off in his F-14, with Kenny Loggins’ Danger Zone playing in the background. It is almost too easy to ignore the fact that there are thousands of aviation crew mechanics, technicians, electricians, and other specialists supporting the operation. It is almost too easy to forget the floor upon floor of infrastructure and workshops, and in the very heart of it, an IT center, carefully orchestrating the entire piece. Data centers are somewhat similar to the 100,000-ton marvels patrolling the oceans. They have their components, but they all need to communicate and work together. This is why, when you talk about data centers, concepts such as cooling and power density are just as critical as the type of processor and disk one might use. Remote management, facility security, disaster recovery, backup – all of these are hardly on the list, but the higher you scale, the more important they become.
Welcome to the borg, resistance is futile
In the last several years, we have seen a trend moving from any old setup that includes computing components toward something approaching standards. Like any technology, the data center has reached a point at which it can no longer sustain itself on its own, and the world cannot tolerate a hundred different versions of it. Similar to the convergence of other technologies, such as network protocols, browser standards, and to some extent, media standards, the data center as a whole is also becoming a standard. For instance, the Open Data Center Alliance (ODCA) (Open Data Center Alliance, n.d.) is a consortium established in 2010, driving adoption of interoperable solutions and services – standards – across the industry. In this reality, hanging on to your custom workshop is like swimming against the current. Sooner or later, either you or the river will have to give up. Having a data center is no longer enough. And this is part of the reason for this book – solving problems and creating solutions in a large, unique high-performance setup that is the inevitable future of data centers.
Powers that be
Before we dig into any tactical problem, we need to discuss strategy. Working with a single computer at home is nothing like doing the same kind of work in a data center. And while the technology is pretty much identical, all the considerations you have used before – and your instincts – are completely wrong. High-performance computing starts and ends with scale, the ability to grow at a steady rate in a sustainable manner without increasing your costs exponentially. This has always been a challenging task, and quite often, companies have to sacrifice growth once their business explodes beyond control. It is often the small, neglected things that force the slowdown – power, physical space, the considerations that are not immediately obvious or visible.
Enterprise versus Linux
Another challenge that we are facing is the transition from the traditional world of the classic enterprise into the quick, rapid-paced, ever-changing cloud. Again, it is not about technology. It is about people who have been in the IT business for many years and who are experiencing this sudden change right before their eyes.
The classic office
Enabling the office worker to use their software, communicate with colleagues and partners, send email, and chat has been a critical piece of the Internet since its early days. But the office is a stagnant, almost boring environment. The needs for change and growth are modest.
Linux computing environment
The next evolutionary step in the data center business was the creation of the Linux operating system. In one fell swoop, it delivered a whole range of possibilities that were not available beforehand. It offered affordable cost compared to expensive mainframe setups. It offered reduced licensing costs, and the largely open-source nature of the product allowed people from the wider community to participate and modify the software. Most importantly, it also offered scale, from minimal setups to immense supercomputers, accommodating both ends of the spectrum with almost nonchalant ease. And while there was chaos in the world of Linux distributions, offering a variety of flavors and types that could never really catch on, the kernel remained largely standard and allowed businesses to rely on it for their growth. Alongside the opportunity came a great shift in perception across the industry, and in the speed of change, testing the industry’s experts to their limit.
Linux cloud
Nowadays, we are seeing the third iteration in the evolution of the data center. It is shifting from being the enabler for products into a product itself. The pervasiveness of data, embodied in the concept called the Internet of Things, as well as the fact that a large portion of the modern (and online) economy is driven through data search, has transformed the data center into an integral piece of business logic. The word cloud is used to describe this transformation, but it is more than just having free compute resources available somewhere in the world and accessible through a Web portal. Infrastructure has become a service (IaaS), platforms have become a service (PaaS), and applications running on top of a very complex, modular cloud stack are virtually indistinguishable from the underlying building blocks. At the heart of this new world, there is Linux, and with it, a whole new generation of challenges and problems of a scale and nature that system administrators never had to deal with in the past. Some of the issues may be similar, but the time factor has changed dramatically. If you could once afford to run your local system investigation at your own pace, you can no longer afford to do so with cloud systems. Concepts such as uptime, availability, and price dictate a different regime of thinking and require different tools. To make things worse, the speed and technical capabilities of the hardware are being pushed to the limit, as science and big data mercilessly drive the high-performance compute market. Your old skills as a troubleshooter are being put to a test.
10,000 × 1 does not equal 10,000
The main reason why a situational-awareness approach to problem solving is so important is that linear growth brings about exponential complexity. Tools that work well on individual hosts are not built for mass deployments or do not have the capability for cross-system use. Methodologies that are perfectly suited for slow-paced, local setups are utterly outclassed in the high-performance race of the modern world.
Nonlinear scaling of issues
On one hand, larger environments become more complex simply because they have a much greater number of components in them. For instance, take a typical hard disk. An average device may have a mean time between failures (MTBF) of about 900 years. That sounds like a pretty safe bet, and you are more likely to decommission a disk after several years of use than see it malfunction. But if you have a thousand disks, and they are all part of a larger ecosystem, the effective MTBF of the fleet shrinks to about 1 year, and suddenly, problems you never had to deal with explicitly become items on the daily agenda. On the other hand, large environments also require additional considerations when it comes to power, cooling, the physical layout and design of data center aisles and racks, the network interconnectivity, and the number of edge devices. Suddenly, there are new dependencies that never existed on a smaller scale, and those that did are magnified or made significant when looking at the system as a whole. The considerations you may have for problem solving change.
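To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch (not taken from the book) that assumes independent, exponentially distributed failures; under that assumption per-device failure rates simply add up, so the aggregate MTBF of a fleet is the per-device MTBF divided by the device count:

# Back-of-the-envelope fleet reliability estimate (illustrative sketch only).
# Assumes independent, exponentially distributed failures, so per-device
# failure rates add up across the fleet.

def fleet_mtbf_years(device_mtbf_years, device_count):
    # Aggregate MTBF: rates add, so the MTBF divides by the number of devices.
    return device_mtbf_years / device_count

def expected_failures_per_year(device_mtbf_years, device_count):
    # Expected number of device failures across the whole fleet per year.
    return device_count / device_mtbf_years

if __name__ == "__main__":
    mtbf = 900.0   # single-disk MTBF in years, the figure used in the text
    disks = 1000   # fleet size
    print("Fleet MTBF: %.2f years" % fleet_mtbf_years(mtbf, disks))
    print("Expected failures per year: %.1f" % expected_failures_per_year(mtbf, disks))

With a thousand such disks, the sketch yields a fleet MTBF of roughly 0.9 years, or about one disk failure per year, which is where the "about 1 year" figure in the text comes from.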
The law of large numbers
It is almost too easy to overlook how much effect small, seemingly imperceptible changes in great quantity can have on the larger system. If you were to optimize the kernel on a single...

