Archive Date
[ 16-12-2005 ]
Category
[ Information Technologies ]
Sub-Category
[ Computers ]

http://www.extremetech.com/article2/0,1697,1901438,00.asp

What Would You Do With A Supercomputer?

December 14, 2005

Searching for a massively powerful digital system to help researchers grapple with complex tasks like forecasting weather and simulating particle interactions? Look to an enormous, well-funded organization. Or look on a desktop.

Supercomputing, to most people, means astoundingly powerful calculating machines, each requiring a roomful of hardware - and money. That's correct, but not complete. What counts as a supercomputer is a matter of some disagreement, as you'll see, but you'll be surprised by the incredible power you can get in a desktop system that costs little more than a high-end gaming PC.

Whatever disagreements there may be, few knowledgeable observers would dispute that the late Seymour Cray was the driving force in the nascent field. And fewer still would dispute that the original Cray - 1, which was installed in 1976 at Los Alamos National Laboratory (LANL), qualified as a supercomputer.

But ask what machine should be honored as the first, and consensus disappears. Some people point to earlier computers that Cray built while he was still at Control Data Corporation. Others go further back, citing some of the earliest programmable machines. And if you try to pin down which systems were precursors, you'll get a similar lack of agreement.

History, or Who's Your Daddy?

The group working on development tools at Orion Multisystems, which makes high-performance desktop workstations, is led by John Casu. In his view, the first machine to approach supercomputing status was the ILLIAC IV (Illinois automatic computer IV), which was laboriously developed in the 1960s and became fully operational in the mid-1970s.

But Wu Feng of LANL, the team leader for research and development in advanced network technology and an Institute Fellow at the lab, would go back still further. He points to the MANIAC (mathematical analyzer, numerical integrator and computer), built by his desert employer in the early 1950s, as the true progenitor.

Ed Turkel, manager of high - performance computing product marketing at Hewlett - Packard, argues that the 5,200 - vacuum - tube UNIVAC (universal automatic computer), also built in the early 1950s, actually was a supercomputer. (And you can easily find references to the ILLIAC IV and MANIAC as supercomputers, also.)

All these candidates (as well as others) for precursor or first supercomputer fame share at least one trait: They were among the most powerful computers of their time, if not the most powerful. And other than the UNIVAC, which was the first commercial computer made in the US, all were designed for scientific and technical computing rather than business use.

Architecture School

Most processors are scalar (or essentially so) - that is, they work on one or two pieces of data at a time. To add two groups of 10 numbers for example, a scalar processor must get the first number from each group, do the addition, send the result off, and then move to the next pair.

The Cray - 1 - and other supercomputers through about the mid - 1980s - used one or more vector processors. (This approach is becoming less common, though calling it rare would be an overstatement.) A vector processor works on arrays - data vectors - which means that in our example, it can retrieve all 10 numbers from both groups in one step and add all 10 pairs simultaneously.
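
To make the contrast concrete, here is a minimal Python sketch. It is a conceptual model only, not a picture of how real hardware works: it simply counts how many steps each approach needs to add the two groups of 10 numbers. The function names and the `vector_length` of 64 are assumptions chosen for illustration.

```python
def scalar_add(a, b):
    """Add element pairs one at a time, counting one step per pair."""
    result = []
    steps = 0
    for x, y in zip(a, b):
        result.append(x + y)  # fetch one pair, add, store
        steps += 1
    return result, steps


def vector_add(a, b, vector_length=64):
    """Add in chunks of up to `vector_length` pairs, counting one step per chunk.

    `vector_length` is a hypothetical register width used for illustration.
    """
    result = []
    steps = 0
    for start in range(0, len(a), vector_length):
        chunk_a = a[start:start + vector_length]
        chunk_b = b[start:start + vector_length]
        # In this model the whole chunk is handled by a single operation.
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        steps += 1
    return result, steps


group_a = list(range(10))
group_b = list(range(100, 110))
scalar_result, scalar_steps = scalar_add(group_a, group_b)
vector_result, vector_steps = vector_add(group_a, group_b)
print(scalar_steps, vector_steps)
```

Both functions produce the same sums; only the step count differs, which is the whole point of the vector approach.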

This works especially well for contiguous sets of data, so it's tailor made for the kind of number crunching needed to predict the weather, simulate nuclear explosions, or look at the aerodynamics of a new wing design. It also happens to be perfect for streaming video, which is why the Sony PlayStation 2 supports vector processing, and why the most powerful chip in most gaming systems is not the CPU but the GPU.

You can still find supercomputers based on vector processors, including Japan's Earth Simulator, currently the fourth most powerful in the world (see "Cream of the Crop" sidebar). But the trend has been away from special - purpose - not to mention expensive - vector processors.

System architects increasingly use scalar CPUs - the same ones you find in desktop systems - known in the supercomputing community as COTS, for commodity off the shelf (or commercial off the shelf, depending on who you talk to). The trick is to use a lot of them and wire them together to work in parallel.

The SMP Architectures

Starting around 1983, the number of companies building supercomputers exploded. Some developed their own vector processors; others found ways to tie lots of COTS CPUs together. Most of those ventures, and many of the architectures they explored, disappeared in the nineties, but two primary computer architectures survive: symmetric multiprocessing (SMP) and massively parallel processing (MPP).

The definitions of these architectures vary depending on who you talk to. For example, some say SMP systems have vector processors, others say they don't. The passage of time partly accounts for the conflicts: As technologies change, old definitions become less useful, and people seek better ways to categorize. In some quarters you'll find the term supercomputer replaced by high - performance computer. Here are reasonably common definitions for the two main architectures, but consider them subject to argument.

In SMP designs, a single copy of the operating system directs multiple processors, all of which share the same memory and data bus. This allows the system to easily shift tasks among processors to balance the workload, but the path to memory is limited - a potential bottleneck.

MPP designs, like that of the IBM Blue Gene/L - currently the world's most powerful computer - give each CPU its own memory and operating system and then interconnect the processors so they can exchange information. This requires fast and extremely reliable connections to ensure that each processor gets the information it needs when it needs it. For certain kinds of computations, communication between processors is essential.

Predicting weather, for example, can be broken down into lots of independent calculations, each determining what's happening in a small volume of the atmosphere. But the ultimate result for an individual volume is highly dependent on what's happening in the neighboring volumes. At every step in the calculation, each processor needs to know what the results are in the neighboring volumes before calculating the next step.
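
The neighbor dependency described above can be sketched in a few lines of Python. This is a toy, one-dimensional stand-in for an atmosphere, with made-up values and a simple averaging rule (all assumptions, not a real weather model). The cells are split between two simulated processors, and at every step each one has to fetch the edge cell owned by its neighbor before it can compute.

```python
def step_serial(cells):
    """One time step on a single machine: each interior cell becomes
    the mean of itself and its two neighbors; the end cells stay fixed."""
    new = cells[:]
    for i in range(1, len(cells) - 1):
        new[i] = (cells[i - 1] + cells[i] + cells[i + 1]) / 3.0
    return new


def step_partitioned(chunks):
    """The same step with the cells split into chunks, one per simulated processor."""
    new_chunks = []
    for p, chunk in enumerate(chunks):
        # "Communication": read the edge cells held by the neighboring processors.
        left_halo = chunks[p - 1][-1] if p > 0 else None
        right_halo = chunks[p + 1][0] if p < len(chunks) - 1 else None
        new = chunk[:]
        for i in range(len(chunk)):
            left = chunk[i - 1] if i > 0 else left_halo
            right = chunk[i + 1] if i < len(chunk) - 1 else right_halo
            if left is None or right is None:
                continue  # global boundary cell: held fixed
            new[i] = (left + chunk[i] + right) / 3.0
        new_chunks.append(new)
    return new_chunks


cells = [0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0]
chunks = [cells[0:4], cells[4:8]]  # two simulated processors
serial = cells
for _ in range(3):
    serial = step_serial(serial)
    chunks = step_partitioned(chunks)
flattened = [value for chunk in chunks for value in chunk]
print(flattened == serial)
```

Without the two halo reads, the processors would drift away from the single-machine answer after the first step, which is why the interconnect matters so much for this kind of problem.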

The New Dominant Architectures: Clusters and Constellations

One additional architecture emerged in the nineties and has become increasingly important: the cluster. Essentially, it's two or more computers tied together through software and network connections so they behave like a single entity from a user's point of view.

In late 1993, Donald Becker and Thomas Sterling started outlining the concept of the Beowulf cluster. This clever development was an inexpensive way to get supercomputing power out of standard computers simply by tying many together, creating a whole greater than the sum of its off-the-shelf parts. The prototype used 16 Intel DX4 processors that were linked by channel-bonded Ethernet.

The www.beowulf.org Web site says clusters can consist of just two computers sharing a file system or 1,024 separate nodes tied together by a high-speed network, with the infrastructure provided by open-source software - specifically, Linux.

Some enthusiasts broaden the definition to allow a Windows infrastructure. Either way, Beowulf clusters provide a supercomputing approach that's proven attractive to researchers who don't have the budget to buy a single much more expensive system.

You don't have to put a system together yourself to get a cluster, though - there are companies that make integrated packages. Some shoehorn multiple computers into a single box - often on the same board, and sometimes on the same chip.

The Orion Multisystems DT-12 workstation, for example, is a 12-node (read: 12-computer) cluster in a 4- by 24- by 18-inch desktop box. Each node uses a 1.2-GHz Intel-compatible Pentium 4-class processor with your choice of 512MB or 1GB of RAM.

The nodes are tied together by 1-Gbps Ethernet connections, and the system can perform at up to 36 gigaflops. Basically, the DT-12 is a personal supercomputer - and it starts at under $10,000.

In certain configurations, a cluster is known as a constellation. The deciding factor concerns the relationship between the number of nodes and the number of processors in each node, according to Jack Dongarra, one of the term's originators, and a University Distinguished Professor in the computer science department at the University of Tennessee.

Each node in a cluster is an independent computer with one or more processors. If there are fewer processors per node than there are nodes in the cluster - as with a Beowulf configuration that consists of standard, single-processor systems - you have a cluster configuration. But if any node has more processors than there are nodes in the cluster, the configuration is called a constellation instead.

Why such a fine line? Because, as Dongarra and several coauthors pointed out in a 2003 paper, the distinction affects the approach to programming a machine. The authors cited the SGI Altix system at the NASA Ames Research Center in Mountain View, California as a good example of a constellation - each of the system's 20 nodes has 512 processors.
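
The rule of thumb is simple enough to write down as a short Python sketch. The function name is hypothetical, and the tie case (a node with exactly as many processors as there are nodes) is not covered by the description above, so treating it as a cluster is an assumption.

```python
def classify(processors_per_node):
    """Label a configuration as a cluster or a constellation.

    `processors_per_node` holds one processor count per node.
    Assumption: a tie (largest node equals the node count) counts as a cluster.
    """
    if not processors_per_node:
        raise ValueError("need at least one node")
    node_count = len(processors_per_node)
    if max(processors_per_node) > node_count:
        return "constellation"
    return "cluster"


altix = [512] * 20    # the SGI Altix example: 20 nodes of 512 processors
beowulf = [1] * 16    # a hypothetical 16-node Beowulf of single-processor PCs
print(classify(altix), sum(altix))
print(classify(beowulf), sum(beowulf))
```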

Power - Who's The Baddest?

You couldn't resist doing the math, right? So you discovered that 20 times 512 works out to 10,240 processors. And that naturally leads to the question, how much computing muscle does a supercomputer really deliver?

We wouldn't say that the supercomputing community is obsessed with power, but let's face it: That's what high - performance computers are all about. By definition. So anyone who builds them, sells them, designs them, buys them, or is just interested in them can tell you things like who's building the most powerful, just how powerful they are, what architectures they're using, and even where they're installed.

At least one source for these details is the TOP500 project, which has been gathering data about individual systems since 1993. The project's stated intent is to provide "a reliable basis for tracking and detecting trends in high-performance computing." The project's central tool for doing that is the TOP500 list - probably more popular in the supercomputing community than Letterman's Top 10.

The list, which is updated twice a year, is a table that gives the names, test results, and much other information for the world's 500 most powerful computers as of the publication date. The site bases its rankings on results from the widely used LINPACK benchmark test introduced by Dongarra, who is also one of the authors of the TOP500 list.

As the TOP500 site points out, the LINPACK benchmark test does not attempt to determine the overall performance of a computer. Rather, it measures "the performance of a dedicated system for solving a dense system of linear equations." (Finding such solutions is a common task in the world of engineering.) But although the test may not tell you everything about the way a computer will perform, it reveals enough to be useful.

In any case, the test measures how fast a computer can perform floating - point operations. The results are in flops, an acronym for floating - point operations per second, although the number is more often given in megaflops, gigaflops, or even teraflops (trillions of flops). The most powerful current computers operate in the teraflops range. Benchmark - test results reach into the tens or even the low hundreds of teraflops.
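
For a feel of what such a measurement involves, here is a rough Python sketch in the spirit of the benchmark. It is not the LINPACK code itself: it solves a small random dense system with plain Gaussian elimination, times it, and divides a nominal operation count by the elapsed time. The count of 2n^3/3 + 2n^2 is the figure commonly quoted for this kind of solve and is used here as an assumption; the function names and problem size are made up for illustration.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (works on copies)."""
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Partial pivoting: move the largest remaining entry in column k to the diagonal.
        pivot = max(range(k, n), key=lambda r: abs(a[r][k]))
        if a[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[pivot] = a[pivot], a[k]
        b[k], b[pivot] = b[pivot], b[k]
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        total = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - total) / a[i][i]
    return x


def measure_flops(n=60, seed=0):
    """Return (estimated flops, largest residual) for one random n-by-n solve."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start
    nominal_ops = 2.0 * n**3 / 3.0 + 2.0 * n**2  # assumed operation count
    residual = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
    return nominal_ops / elapsed, residual


rate, residual = measure_flops()
print(f"{rate / 1e6:.2f} megaflops, residual {residual:.2e}")
```

Interpreted Python will land far below the figures quoted in this article; the sketch shows the shape of the measurement, not a competitive score.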

From the TOP500 site, you can glean a wealth of information about the current state of supercomputing. But because the data goes back to 1993, you can also see what trends have developed over the last dozen years. For example, the first list, which was published in June 1993, showed 249 SMP systems and 97 single - processor machines. Search for those categories today, and you won't find them.

On the other hand, clusters and constellations, the two architectures that overwhelmingly dominate the TOP500 these days, didn't even show up on the chart in 1993. By December 1995, though, 16 constellations had appeared. At this point, there are 79.

When clusters first appeared in June 1997 they managed just a single entry. Today, machines based on that architecture consume 304 of the 500 slots. Lump clusters and constellations together and they account for over 76 percent of the list. The remaining 117 systems all use MPP architecture.

As you probably suspect, the level of computing power needed to make the list has shot up dramatically over the last dozen years. In June 1993, the most powerful calculating machine on the planet burned through benchmark tests at 59.7 gigaflops. In June 2005, the system that grabbed the number one spot was almost 2,300 times as powerful, crunching numbers at an astounding 136,800 gigaflops - 136.8 teraflops.

The computing power needed to make the lowest spot on the list also grew dramatically. In 1993, the 500th most powerful computer on the planet scored a lowly 0.42 gigaflops. Twelve years later, the bottom spot went to a system that processed at 1,166 gigaflops, or 1.166 teraflops. In fact, June 2005 marked the first time that every listed system exceeded a trillion floating - point operations per second.

Here's an observation that will help you appreciate just how good a computer you have. Today's garden - variety Pentium 4 desktop can manage 1 or more gigaflops on the LINPACK benchmark test. And we've seen claims of speeds as high as 4.5 gigaflops for a 3 - GHz Pentium 4 system. That score would have easily put today's typical PC on the TOP500 list in 1993. In other words, your desktop system is basically a slightly obsolete supercomputer.
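
If you'd like to check the arithmetic behind these comparisons, the following Python sketch reruns it using only the figures quoted above. The doubling-time estimate assumes steady exponential growth between June 1993 and June 2005, which is a simplification, and the variable names are ours.

```python
import math

# Figures quoted in the article, all in gigaflops.
TOP_1993, TOP_2005 = 59.7, 136_800.0
BOTTOM_1993, BOTTOM_2005 = 0.42, 1_166.0
PENTIUM4_CLAIM = 4.5
MONTHS = 12 * 12  # June 1993 to June 2005


def growth(start, end, months=MONTHS):
    """Return (growth factor, implied months per doubling).

    Assumes steady exponential growth over the whole period.
    """
    factor = end / start
    return factor, months / math.log2(factor)


top_factor, top_doubling = growth(TOP_1993, TOP_2005)
bottom_factor, bottom_doubling = growth(BOTTOM_1993, BOTTOM_2005)
print(f"top spot:    x{top_factor:.0f}, doubling every {top_doubling:.1f} months")
print(f"bottom spot: x{bottom_factor:.0f}, doubling every {bottom_doubling:.1f} months")

# Where would the claimed Pentium 4 score have landed on the 1993 list?
beats_1993_bottom = PENTIUM4_CLAIM > BOTTOM_1993
beats_1993_top = PENTIUM4_CLAIM > TOP_1993
print(beats_1993_bottom, beats_1993_top)
```

Both ends of the list work out to a doubling time of roughly 13 months, and the claimed Pentium 4 score clears the 1993 entry bar by a wide margin while falling well short of that year's number one.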

Now It Gets Personal

Aside from the cost and availability problems, a few other issues prevent you from upgrading your gaming rig to a more contemporary TOP500 system: Most supercomputers are big and noisy, generate a lot of heat, and can't plug into standard power outlets. They wind up in special rooms (or even buildings) designed for their needs.

They also break down. A lot. The world's most powerful system - the Blue Gene/L at Lawrence Livermore National Laboratory - has a mean time between failures of six-and-a-half days, according to the lab's Web site. And supercomputers are usually shared among multiple users, so if one person's program manages to crash the entire system, everyone suffers.

That's where personal supercomputers like those from Orion Multisystems come in. Orion's machines are based in part on technology developed by Wu Feng at LANL. In building his Green Destiny supercomputer, Feng focused on high reliability, which leads to an interesting twist.

Because Green Destiny rarely breaks down, it can actually perform more computations in a year than much faster supercomputers that frequently fail. Feng does point out, though, that some problems absolutely require more powerful computers that have more memory or build in other additional resources. So Green Destiny can't handle every supercomputing job.
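
The reasoning is easy to see with a little arithmetic. The Python sketch below uses a deliberately simple model, availability = MTBF / (MTBF + repair time), where `repair_hours` lumps together fixing the machine and rerunning lost work. The two machines and all of their numbers are hypothetical; none of them come from Feng, Orion, or any real system.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def availability(mtbf_hours, repair_hours):
    """Fraction of time a machine is doing useful work: MTBF / (MTBF + repair time)."""
    return mtbf_hours / (mtbf_hours + repair_hours)


def yearly_operations(rate_gigaflops, mtbf_hours, repair_hours):
    """Floating-point operations completed in a year, after discounting downtime."""
    uptime = availability(mtbf_hours, repair_hours)
    return rate_gigaflops * 1e9 * SECONDS_PER_YEAR * uptime


# Hypothetical machines, for illustration only.
fast_but_fragile = yearly_operations(rate_gigaflops=100.0, mtbf_hours=24.0, repair_hours=36.0)
slow_but_steady = yearly_operations(rate_gigaflops=50.0, mtbf_hours=2000.0, repair_hours=4.0)
print(slow_but_steady > fast_but_fragile)
```

With these made-up figures, the machine with half the speed finishes more work in a year because it is almost never down.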

The personal supercomputers from Orion fit on your desktop - or at least in your office. They also plug into standard wall sockets, and they don't have special cooling needs.

But most importantly, researchers get supercomputers to themselves and can easily use them interactively. Some kinds of problems benefit from the ability to look at the results of one step before deciding which directions look promising for further calculations and which will just waste time. You simply can't get that level of interactivity when you need to log on to a system remotely or submit computing jobs to batch queues.

The situation is analogous in many ways to what happened in the early days of the PC. In that case, one of the driving forces was people discovering they could often do more - and do it more quickly - with an Apple II and VisiCalc than by running their numbers on faster mainframes and waiting weeks for the IT department to submit their jobs. The interactive nature of personal computers made the difference.

Although personal supercomputers may not make the TOP500 list, they offer significantly more power than standard PCs. And Orion's desktop model starts at just under $10,000 ($9,990 to be precise), so as with the Apple II in a world of mainframes, not only can a supercomputer be personal, it can also be relatively affordable. For those who need the power and interactivity, that can make all the difference.

What's a Super-Processor?

We couldn't find any reason to believe that there's actually something called a superprocessor, much less a good definition for one. But that doesn't stop people from using the word. To end this conundrum, we asked the supercomputing community to define it. The one thing they all agreed on is that, technically speaking, there really is no such thing. But most ventured a definition anyway. Here's an assortment of the more interesting answers, and some valuable insights into the possible future of supercomputing.

"A superprocessor provides capability well beyond what you'll get from the commodity processors you'll find in a Dell or Gateway computer. It therefore follows that a superprocessor has no future in terms of market share. On the other hand, the research and development that's poured into the development of superprocessors will eventually trickle down to commodity processors." -

Wu Feng, Team Leader of High Performance Networking and Institute Fellow, Los Alamos National Laboratory

"The first time I ever saw that term was in the message about doing this interview. I did a Google search and got lots of interesting hits. The most relevant tagged it to what are essentially massively parallel processors on a chip. It's the concept of putting lots of lightweight processing cores all on a single die and using them for applications that have a high level of computational intensity - not that much data but lots of operations to do on that data." -

Dave Parry, Senior VP and General Manager for Server and Platform Group, SGI

"It's a processor that's at least an order of magnitude greater in performance than a commodity processor. Typically in the past, they've been vector processors. I'm aware of (but not free to talk about) a couple of startups looking to build very interesting processors around a hybrid scalar-vector processor. The other thing to keep in mind is that the industry goes through pendulum swings. We started with vector processors as superprocessors, went to commodity processors, and now we are swinging toward superprocessors as multiple low-power commodity processor cores on a single chip. One thing's for sure: The world is headed toward parallel processing as the way to achieve higher performance." -

Chris Hipp, VP of Applications, Orion Multisystems

"A superprocessor implies a huge jump in performance - not just an incremental improvement as per Moore's law, but a jump like quantum computers would provide. There's also the potential in some optical computer technology. And it could also be a dedicated co - processor." -

Tom Jones, Sales Engineer, Orion Multisystems

The Cream of the Crop

Where can you find the biggest, baddest computers on the planet? Here are the top four systems on the most recent TOP500 list, and a quick summary of what they're being used for. And if these systems aren't powerful enough for you, just wait. The TOP500 list is updated twice a year because things change that quickly. In the wings are still faster systems, as well as systems that might not be fighting for first, but will still push slower systems off the bottom. In the most recent update, 201 systems fell off.

The BlueGene/L at Lawrence Livermore National Laboratory, already number one on the list, is about to double in capacity, possibly by the time you read this. The finished system will have 131,072 processors and a theoretical peak system performance of 360 teraflops. Beyond that, the race is on for the petaflop mark: one quadrillion - that's 1,000 trillion - floating point operations per second. And the Japanese government has set a goal of building a system capable of several petaflops by 2011. One suggested use for a machine that powerful is to let researchers simulate things like how a medicine is absorbed in a human body and how it affects a specific organ - effectively simulating a human being.
BlueGene/L is a joint development of IBM and the Department of Energy's National Nuclear Security Administration. Installed at Lawrence Livermore National Laboratory in California and upgraded in 2005, BlueGene/L uses 32,768 dual - processor chips, for a total of 65,536 PowerPC 440 700 - MHz processors. It boasts the current top performance for any computer anywhere, at 136.80 teraflops.
BlueGene/L is used in the Advanced Simulation and Computing program at Livermore, which relies on computer simulations to ensure that the U.S. nuclear stockpile is safe, reliable, and operational. (Those have to be really good simulations. No actual thermonuclear explosions needed, thank you.)
The Watson Blue Gene system, nicknamed BGW, is at IBM's Thomas J. Watson Research Center, in Yorktown, NY. BGW was also built by IBM and announced in 2005. It uses 40,960 PowerPC 440 processors, and hits a substantial 91.29 teraflops. IBM plans to use BGW in part to explore how this much computing power can best be used for both business applications and in technical fields like life sciences, hydrodynamics, materials sciences, quantum chemistry, molecular dynamics, and fluid dynamics.
This list obviously includes a wide range of applications, from simulations of protein dynamics useful for developing new drugs to tracking and analyzing world financial markets for global risk management. Other potential uses that IBM dubs previously unsolvable (emphasis on previously, of course) include combining weather forecasting software with predictive software for applications like disaster response, forecasting utility supply and demand, and scheduling maintenance and planning transportation in the agricultural industry.
Columbia, named in honor of the crew of the Space Shuttle Columbia, is an SGI Altix installed at the NASA Ames Research Center in Mountain View, CA, in 2004. Columbia uses 10,240 Intel Itanium 2 processors (each running at 1,500 MHz) to deliver 51.87 teraflops of computing power. NASA uses Columbia for studies ranging from turbine heat transfer and the behavior of potential new fabrics for spacesuits to simulating the surface of the sun and the evolution of galaxies. A small sampling of other projects that run on Columbia includes weather prediction, climate studies, modeling the Earth's magnetosphere, and the aerodynamics of an ascending space shuttle.
The Earth Simulator, an NEC SX - 6 funded by the Japanese government, is located at The Earth Simulator Center in Yokohama. It was number one on the TOP500 list when it was finished in 2002, and it stayed there for two years, dropping to number 3 in late 2004, and to number 4 in mid - 2005. It's built around 5,120 NEC 1,000 - MHz processors and delivers 35.86 teraflops.
The Earth Simulator was designed to provide "a holistic simulation of the entire earth system" - atmosphere, ocean, and solid earth (well duh, look at the name). Not surprisingly, much of the work done on the Earth Simulator studies things like climate and ocean variability and convection in the Earth's mantle. But it's not limited to those areas. Other projects listed on the Earth Simulator Center's Web site include studies of rocket engine internal flows, the properties of the carbon - nanotube, electrode reaction in fuel cells, and evaluating the plasma environment around a spacecraft that uses electric propulsion.

Ziff Davis Publishing Holdings Inc.
