The dirty secret of high-performance computing

Decades after Seymour Cray developed what is widely regarded as the world’s first supercomputer, the CDC 6600, an arms race has been under way in the high-performance computing (HPC) community. The goal: to increase performance, by any means and at any cost.

Thanks to advances in computing, storage, networking, and software, the performance of leading systems has increased a trillionfold since the CDC 6600 debuted in 1964, from millions of floating-point operations per second (megaFLOPS) to quintillions (exaFLOPS).

The current holder of the crown is Frontier, a colossal US supercomputer capable of reaching 1.102 exaFLOPS in the High Performance Linpack (HPL) benchmark. But even more powerful machines are suspected to be operating elsewhere, behind closed doors.

The advent of so-called exascale supercomputers is expected to benefit virtually every sector – from science to cybersecurity, healthcare and finance – and pave the way for powerful new artificial intelligence models that would otherwise take years to develop.

The CDC 6600, widely considered the world’s first supercomputer. (Image: Computer History Museum)

However, this increase in speed comes at a price: power consumption. At full throttle, Frontier draws up to 40 MW of power, roughly as much as 40 million desktop PCs.

Supercomputers have always been about pushing the boundaries of what is possible. But as the need to curb emissions becomes more pressing and energy prices continue to rise, the HPC industry will have to reassess whether its original guiding principle of performance at any cost is still worth following.

Performance vs. efficiency

One organization working at the forefront of this challenge is the University of Cambridge, which has partnered with Dell Technologies to develop several supercomputers with energy efficiency at the forefront of design.

Wilkes3, for example, ranks only 100th in the overall performance charts, but sits in third place in the Green500, a ranking of HPC systems based on performance per watt of energy consumed.

In a conversation with TechRadar Pro, Dr. Paul Calleja, Director of Research Computing Services at the University of Cambridge, explained that the institution is far more interested in building highly efficient machines than in building extremely powerful ones.

“We’re not really interested in huge systems, because they’re very specific point solutions. But the technologies deployed inside them are much more widely applicable and allow systems an order of magnitude slower to operate far more economically and energy-efficiently,” says Dr. Calleja.

“So you’re democratizing access to computing for a lot more people. We are interested in using the technologies developed for these large, epoch-making systems to build much more sustainable supercomputers for a wider audience.”

The Wilkes3 supercomputer may not be the fastest in the world, but it is one of the most energy efficient. (Image: University of Cambridge)

In the coming years, Dr. Calleja also predicts an increasingly aggressive push for energy efficiency in the HPC sector and the broader data center community, where we’re told energy consumption accounts for more than 90% of costs.

Recent fluctuations in energy prices related to the war in Ukraine have also made supercomputers more expensive to operate, especially in the context of exascale computing, further highlighting the importance of performance per watt.

In the case of Wilkes3, the university identified a number of optimizations that improved efficiency. For example, by lowering the clock frequency of some components depending on the workload, the team was able to cut energy consumption by roughly 20-30%.

“Within a particular architecture family, clock frequency is linearly related to performance but quadratically related to power consumption. It’s a killer,” Dr. Calleja explained.

“Reducing the clock speed cuts power consumption much faster than it cuts performance, but it also lengthens the time it takes to complete a task. So rather than looking at power draw during a run, you have to look at the energy actually consumed to complete the work. There’s a sweet spot.”
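To make that sweet spot concrete, here is a toy model rather than anything measured on Wilkes3: the static-power term and both coefficients are illustrative assumptions. It follows the scaling described above, with runtime shrinking linearly as the clock rises and dynamic power growing quadratically, so the energy consumed to finish the job bottoms out at an intermediate frequency rather than at either end of the range.

```cpp
// Toy energy-to-solution model; all constants are made up for illustration.
#include <cstdio>

int main() {
    const double work      = 1e12;  // abstract amount of work in the job (operations)
    const double p_static  = 50.0;  // assumed static/idle power draw, in watts
    const double k_dynamic = 40.0;  // assumed dynamic-power coefficient, in W per GHz^2

    double best_f = 0.0, best_energy = 1e300;

    // Sweep clock frequencies from 0.8 GHz to 3.0 GHz.
    for (double f = 0.8; f <= 3.0 + 1e-9; f += 0.1) {
        double time_s   = work / (f * 1e9);              // performance scales linearly with f
        double power_w  = p_static + k_dynamic * f * f;  // power scales quadratically with f
        double energy_j = power_w * time_s;              // energy-to-solution = power x time
        std::printf("f = %.1f GHz  time = %7.1f s  power = %6.1f W  energy = %9.1f J\n",
                    f, time_s, power_w, energy_j);
        if (energy_j < best_energy) { best_energy = energy_j; best_f = f; }
    }
    std::printf("Minimum energy-to-solution at roughly %.1f GHz\n", best_f);
    return 0;
}
```

With these made-up numbers the minimum lands near 1.1 GHz, not at the lowest frequency in the sweep; on real systems the sweet spot depends on the workload and, as at Cambridge, is found empirically.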

Software is king

Beyond fine-tuning hardware configurations for specific workloads, there are also a number of optimizations to be made elsewhere, in the context of storage and networking, and in related disciplines such as cooling and rack design.

However, when asked where specifically he would like to see resources devoted to improving energy efficiency, Dr. Calleja explained that the primary focus should be software.

“The problem is not the hardware, but the efficiency of the applications. This will be the main bottleneck going forward,” he said. “Today’s exascale systems are based on GPU architectures, and the number of applications that can run efficiently at scale on GPU systems is small.”

“To really take advantage of today’s technology, we need to focus heavily on application development. The development life cycle spans decades; the software in use today was developed 20 to 30 years ago, and it’s difficult when you have such long-lived code that needs to be rebuilt.”

The problem, however, is that the HPC industry has never developed the habit of thinking software first. Historically, far more attention has gone to the hardware because, according to Dr. Calleja, “it’s easy; you just buy a faster chip. You don’t need to think smart.”

“Even though we had Moore’s Law, with processor performance doubling every eighteen months, you didn’t have to do anything [on a software level] to improve performance. But those days are over. Now, if we want progress, we have to go back and redo the software.”

As Moore’s Law begins to waver, advances in CPU architecture can no longer be the source of performance gains. (Image: Alexander_Safonov / Shutterstock)

In this regard, Dr. Calleja praised Intel. As the server hardware space becomes more diverse from a vendor perspective (a positive development in most respects), application compatibility can become a problem, but Intel is working on a solution.

“What sets Intel apart is that it is investing heavily [both funds and time] in oneAPI, an ecosystem for developing code portability across different types of silicon. These are the toolchains we need so that tomorrow’s applications can take advantage of emerging silicon,” he notes.
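As a rough illustration of the portability oneAPI is aiming for, here is a minimal SYCL 2020 sketch, not code from the article or from Cambridge; the array size and the simple update kernel are arbitrary choices for the example. Compiled with a oneAPI-style toolchain such as the DPC++ compiler, the same source can be built for CPUs and for GPUs from different vendors, with the runtime picking whichever device is available.

```cpp
// Minimal SYCL 2020 sketch: one vendor-neutral kernel, device chosen at runtime.
#include <sycl/sycl.hpp>
#include <vector>
#include <iostream>

int main() {
    constexpr size_t n = 1 << 20;
    std::vector<float> x(n, 1.0f), y(n, 2.0f);

    // The default selector picks whatever accelerator (or CPU fallback) is present;
    // nothing here is specific to any one vendor's silicon.
    sycl::queue q{sycl::default_selector_v};
    std::cout << "Running on: "
              << q.get_device().get_info<sycl::info::device::name>() << "\n";

    {
        // Buffers hand the host data to the SYCL runtime, which manages
        // transfers to and from device memory.
        sycl::buffer<float> bx{x}, by{y};

        q.submit([&](sycl::handler& h) {
            sycl::accessor ax{bx, h, sycl::read_only};
            sycl::accessor ay{by, h, sycl::read_write};
            // y[i] += 2 * x[i] -- a stand-in for the inner loop of a legacy code.
            h.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> i) {
                ay[i] += 2.0f * ax[i];
            });
        });
    } // Buffer destruction waits for the kernel and copies results back to y.

    std::cout << "y[0] = " << y[0] << "\n";  // expected: 4
    return 0;
}
```

The point is that nothing in the kernel names a particular GPU, which is exactly the property the toolchains Dr. Calleja describes are meant to preserve as new silicon arrives.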

Separately, Dr. Calleja called for a tighter focus on “scientific need.” All too often, things “get lost in translation,” creating a mismatch between hardware and software architectures and the actual needs of the end user.

A more vigorous approach to cross-industry collaboration, he says, would create a “virtuous circle” of users, service providers and vendors, delivering benefits from both a performance and an efficiency perspective.

A zettascale future

With the symbolic exascale milestone now passed, attention will inevitably turn to the next one: zettascale.

“Zettascale is just the next flag in the ground,” said Dr. Calleja, “a totem pole that highlights the technologies needed to reach the next milestone in computing, one that is unobtainable today.”

“The world’s fastest systems are extremely expensive in terms of scientific output. But they are important because they demonstrate the art of the possible and move the industry forward.”

Pembroke College, University of Cambridge, Open Zettascale Lab HQ. (Image: University of Cambridge)

Whether systems capable of one zettaFLOPS, a thousand times more powerful than the current crop, can be developed in line with sustainability goals will depend on the industry’s capacity for invention.

Performance and energy efficiency are not an either/or proposition, but delivering the necessary performance gains within an appropriate power envelope will require a healthy dose of ingenuity in every sub-discipline.

In theory, there is a golden ratio between performance and energy consumption, whereby the societal benefits of HPC can be said to justify the carbon costs.

The exact figure will, of course, remain elusive in practice, but pursuing the idea at all is a step in the right direction.

https://www.techradar.com/news/the-dirty-secret-of-high-performance-computing/
