SysManNews
Maximum Power, Minimum Space
By Alex Handy

April 1, 2008 — When it comes to large math problems, even the most complex equation is no match for the powerful desktops currently available to one and all. The days of setting a room-filling computer or grid on a single problem are quickly vanishing as those problems become scarcer and scarcer. Thus, when the Texas Advanced Computing Center (TACC) at the University of Texas at Austin first lit up its new Ranger super cluster, they filled its memory banks with something that’s a bit more than just a math equation.

Ranger is part of the TeraGrid, an America-spanning network of large computing clusters, which can be set upon any task a participating scientist sees fit. As most of these clusters offer at least a teraflop of processing power, those tasks tend to be of the simulation variety. But all that computing juice isn’t being used to look into our future: It’s being used to simulate the past. Ranger’s biggest jobs to date have mostly dealt with simulating the beginnings of our universe.

Suns Sparkle
As with any universal undertaking, Ranger began with a Sun: Sun Microsystems and AMD were the primary hardware providers for the system’s 3,000 blades. But even before that, Ranger began when the TACC team, together with Sun Microsystems, won a grant from the National Science Foundation in September 2006. Ranger is no simple cluster hacked together from the cheapest parts. The initial budget offered by the NSF for Ranger’s construction was US$30 million, plus the estimated maintenance costs expected in the first four years.

That budget ended up at a total of US$59 million. With an operating cost of US$7 million per year, Ranger will cost almost as much to run for four years as its initial hardware did.
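As a rough check on that comparison, the quoted figures can be worked through in a few lines of Python. This is only a sketch built on the article's rounded numbers; the constant and function names are illustrative and do not come from TACC or the NSF.

```python
# Rounded figures quoted in the article, in millions of US dollars.
HARDWARE_BUDGET_M = 30          # initial NSF construction budget
OPERATING_COST_PER_YEAR_M = 7   # yearly operating cost
YEARS = 4                       # period covered by the grant
STATED_TOTAL_M = 59             # total budget as reported


def operating_cost(per_year_m: float, years: int) -> float:
    """Total operating cost over the given number of years."""
    return per_year_m * years


four_year_ops = operating_cost(OPERATING_COST_PER_YEAR_M, YEARS)
ops_to_hardware = four_year_ops / HARDWARE_BUDGET_M
rounded_sum = HARDWARE_BUDGET_M + four_year_ops

print(f"Four-year operating cost: US${four_year_ops:.0f} million")
print(f"Operating cost as a share of hardware budget: {ops_to_hardware:.0%}")
print(f"Hardware plus operations: US${rounded_sum:.0f} million "
      f"(reported total: US${STATED_TOTAL_M} million)")
```

On these numbers, four years of operation comes to US$28 million against US$30 million for construction. The rounded figures sum to US$58 million, close to the reported US$59 million; the article does not itemize the difference.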

But that’s not unusual, said Tommy Minyard, assistant director for Advanced Computing Systems at TACC. The money was coming from grants, said Minyard, so it was the least of his team’s worries. What did give his team pause, however, was the daunting task of bringing over 62,000 CPUs into a space of no more than 4,000 square feet.

“Really, the biggest challenge was the power and cooling for the system we designed,” said Minyard. “We have 82 computer racks. For a system with 504 teraflops that’s not many. Each rack is rated at just under 30 kilowatts per rack. For most racks it’s only 12, maybe 15 kilowatts a rack. It did entail a bit of additional design work with a data center design firm in town.”

All that power, 2.4 megawatts of it, meant that the TACC team needed a heavy-duty cooling system as well. Minyard said that the decision was made early on to set up in-row cooling with hot aisles. That equipment came from American Power Conversion Corp. (APC), and allowed the team to cool the entire cluster on only one megawatt of power.
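The rack and power figures quoted above hang together, as a quick back-of-the-envelope calculation shows. The sketch below uses only the numbers in the article; because each rack is rated at "just under" 30 kilowatts, the computed total is an upper bound, and all names are illustrative.

```python
# Figures quoted in the article.
RACKS = 82
MAX_KW_PER_RACK = 30        # "just under 30 kilowatts", so an upper bound
TYPICAL_KW_PER_RACK = (12, 15)  # what Minyard cites for most racks
PEAK_TFLOPS = 504
STATED_LOAD_MW = 2.4        # power drawn by the cluster
COOLING_MW = 1.0            # power used by the in-row cooling


def total_megawatts(racks: int, kw_per_rack: float) -> float:
    """Total power in megawatts for a number of identical racks."""
    return racks * kw_per_rack / 1000


upper_bound_mw = total_megawatts(RACKS, MAX_KW_PER_RACK)
typical_mw = tuple(total_megawatts(RACKS, kw) for kw in TYPICAL_KW_PER_RACK)
tflops_per_rack = PEAK_TFLOPS / RACKS
cooling_share = COOLING_MW / STATED_LOAD_MW

print(f"Upper bound at full rating: {upper_bound_mw:.2f} MW")
print(f"Same rack count at typical density: "
      f"{typical_mw[0]:.2f} to {typical_mw[1]:.2f} MW")
print(f"Peak teraflops per rack: {tflops_per_rack:.2f}")
print(f"Cooling power relative to cluster load: {cooling_share:.0%}")
```

At the full rating, 82 racks would draw about 2.46 megawatts, consistent with the 2.4 megawatts cited. The same number of racks at the more typical 12 to 15 kilowatts would draw roughly half that, which illustrates why the density required extra design work.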

Orderly Entry
The Ranger cluster was slated to live in a room at TACC that isn’t more than 4,000 square feet in size. And only half of that was actually usable space for the cluster. Thus, Minyard and the TACC team discovered that they would need to be very specific with their installation plans.

“The logistics and construction of the system were really a huge ordeal. We’ve got 15.6 kilometers of cables under the floor in there. They’d already put in the in-row coolers, so if you look under the floor in there right now there’s a huge jumble of equipment. There’s only 30 inches under the floor,” said Minyard.

All this taught Minyard a valuable lesson about data center management and design. The key to success, he said, is flexibility. “Obviously, project plans are going to change, and you definitely need to be flexible,” said Minyard. “Try to schedule overlapping events to accommodate this. If we weren’t able to be really flexible with cable installation, versus software operation, versus acceptance and testing of the system, we’d still be working on integration of the system.”

On the Fringe
That flexibility was especially helpful when the initial software installation began. Minyard said that all this cutting-edge hardware, most of which has been integrated into Sun’s Constellation servers, wasn’t the easiest to deal with from a software perspective. Ranger runs on CentOS, a variant of the Red Hat Linux distribution. Of course, with brand-new servers, network cards and processors, the TACC team and the folks at Sun and AMD had to pave their own roads as they went.

Fortunately, the TACC team had previously discovered the Rocks cluster management tools, and had already begun tweaking the software to meet their needs during a previous installation. Minyard said that this software was critical for their success.

As with any large system, Ranger wasn’t supposed to leap into the TeraGrid’s river of jobs until some time after the installation was complete. But after the TACC team plugged it into the TeraGrid in February, the scientists using the grid couldn’t keep their batches off it.

Bjorn Andersson, director for High Performance Computing and Integrated Systems at Sun, understands why that is.

“The people behind Ranger were expecting to have several months before they’d fill the cluster with jobs, but they are getting more requests every day. It’s basically, by itself, more than doubling the computing power on the grid. This really came into the TeraGrid and got up and running really fast. It will be running at the highest level of utilization by next quarter,” said Andersson.

