Overview
The trend in high-performance computing is toward the use of Linux
clusters. Concurrently, there has been a growing interest in the use
of Linux clusters for scientific research at Berkeley Lab. For many,
a cluster assembled from inexpensive commodity off-the-shelf hardware
and open source software promises to be a cost-effective
way to obtain a high-performance system.
Though many of the concepts
are simple, it remains difficult for scientists to navigate a myriad of
technologies in order to arrive at a cluster configuration that will meet
their needs. Similarly, it is harder to efficiently manage a multi-node
compute cluster than it is to do the same for a desktop workstation.
Consequently, early adopters of this technology
have had to invest large amounts of effort to realize the full potential
of their systems. Findings from the Berkeley Lab Midrange Computing Workshop
(March 2002) and subsequent discussions with
scientists identified a need for affordable, centralized support.
The Scientific Cluster Support (SCS) program was developed to address the difficulties
of obtaining and running a Linux cluster system. Its ultimate goals
are to increase the use of scientific computing in Lab research projects,
to introduce parallel computing to Berkeley Lab researchers, and to
develop efficient, cost-effective methods for managing production clusters.
Program Description
Ten research projects from seven of the Lab's scientific Divisions were
selected to participate in the four-year, Laboratory-funded program after a
Lab-wide application process that was completed in September 2002. These
projects are eligible to receive the following services:
- Pre-purchase consulting - understanding the customer's
application; determining the cluster
hardware architecture and interconnect; identifying the required software.
- Procurement assistance - help with
developing a budget and preparing an RFP.
- Setup and configuration - installation and setup
of the cluster hardware and networking, and installation and configuration
of the cluster software, scheduler, and application software.
- Ongoing systems administration and cyber security - operating system and cluster software maintenance and upgrades; security updates; monitoring of cluster nodes; user account management.
- Computer room space with networking and cooling -
Clusters will be hosted in the Computing Sciences computer room in Building 50B to ensure access to sufficient electrical, cooling, and networking infrastructure.
The SCS Steering Committee, a small group composed of
selected CSAC members, end users, and technical experts, will assist with
decision-making and priority-setting for the program.
Requirements
Systems in the SCS Program must meet the following requirements to be
eligible for support:
- IA32 or AMD64 architecture
- Participating cluster must have a minimum of 8 compute nodes
- Dedicated cluster architecture; no interactive logins on compute nodes
- Red Hat Linux operating system
- Warewulf cluster implementation toolkit
- Sun Grid Engine scheduler
- All slave nodes reachable only from the master node
Clusters that will be located in the 50B-1275 computer room must meet the
following additional requirements:
- Rack-mounted hardware required; desktop form factor hardware is not allowed
- Equipment must be installed in APC NetShelter VX computer racks. Prospective cluster owners
should include the cost of these racks in their budgets
Cluster owners should consult the
SCS Service Level Agreement
for a full description of the program's provisions and requirements.
Schedule
New systems will be phased into the SCS program over the course of the first
year. Existing clusters will be added to the program in year 2.
The entire start-to-finish process of specifying, ordering, installing, and
configuring a new cluster takes about two months. Cluster owners should
anticipate this lead time and plan accordingly.
Research projects that planned to purchase their
clusters in fy03 were scheduled to enter the program
first so that their purchase funds could be used in a timely manner.
Research projects that have existing clusters will be phased into the
program for support in fy04.
The following clusters were placed in the program in fy03:
- Arup Chakraborty Research Group - Jan 2003 (Completed)
- Ashok Gadgil and Patricia Brown - March 2003 (Completed)
- Mike G. Hoversten and Ernest L. Majer - May 2003 (Completed)
- William H. Miller - May 2003 (Completed)
- William A. Lester - Aug 2003 (Completed)
- Michael B. Eisen - Aug 2003 (Completed)
The following clusters were phased into the program in fy04:
- Steven Brenner, Paul D. Adams, Sung-Hou Kim, Stephen Holbrook - Dec 2003 (Completed)
- Priscilla Cooper and John Tainer - Dec 2003 (Completed)
- Martin Head-Gordon - Jun 2004 (Completed)
- William A. Lester - Cluster Upgrade - Jul 2004 (Completed)
In fy05, these clusters were added to the program:
- Steven G. Louie, Marvin L. Cohen - Nov 2004 (Completed)
- Gretina Project - March 2005 (Scheduled)