Condominium Cluster Computing
(Available for LBNL researchers)

Overview

In recognition of the increasing importance of research computing across many scientific disciplines, LBNL has made a significant investment in the LBNL Condo Cluster program as a way to grow and sustain midrange computing for Berkeley Lab. The program provides Berkeley Lab researchers with state-of-the-art, professionally administered computing systems and ancillary infrastructure, improving competitiveness on grants and achieving economies of scale through centralized computing systems and data center facilities.

The model for sustaining the Condo program is premised on faculty and principal investigators using equipment purchase funds from their grants, or other available funds, to purchase compute nodes (individual servers) that are then added to the Lab's Lawrencium compute cluster. PI-owned nodes thereby take advantage of the high-speed InfiniBand interconnect and high-performance Lustre parallel filesystem storage associated with Lawrencium. Operating costs for managing and housing PI-owned compute nodes are waived in exchange for letting other users make use of any idle compute cycles on those nodes. PI owners have priority access to computing resources equivalent to those purchased with their funds, but can access more nodes when their research requires it. This gives the PI far greater flexibility than owning a standalone cluster.

This program is intended for PIs who would otherwise purchase a small (4-node) to medium-scale (72-node) standalone Linux cluster. Projects with larger compute needs, or with many users or groups, should consider setting up a dedicated cluster so that they can better prioritize shared access among their users.

Program Details

Compute node equipment is purchased and maintained on a four-year lifecycle, at which point the owning PI will be notified that the nodes must be upgraded during year five. If the hardware is not upgraded by the end of five years, the PI may donate the equipment to the Condo program or take possession of it (removal of the equipment from LC3 and transfer to another location is at the PI's expense). Nodes left in the cluster after five years may be removed and disposed of at the discretion of the HPCS program manager.

All Lawrencium and Condo users receive a 10GB home directory on the Lab's shared HPC infrastructure and are charged $25/month for account maintenance, which includes backups of the home directory. Users or projects needing more space for persistent data can purchase storage shelves hosted on the Lab's HPC infrastructure. Storage shelves are purchased and maintained on a five-year lifecycle, after which the PI must renew the storage purchase at the then-prevailing price or remove the data within three months.

Once a PI has decided to participate, the PI or their designee works with the HPC Services manager and operations team to procure the desired number of compute nodes and storage shelves. The process generally takes about three months from start to finish. In the interim, a test condo queue with a small allocation will be set up for the PI's users in anticipation of the new equipment. Users may also submit jobs to the general Lawrencium queues on the cluster, but such use incurs the CPU usage fee of $0.01 per service unit; the sketch below gives a rough sense of how that fee scales. Jobs are subject to general queue limitations, and guaranteed access to contributed cores is not provided until the purchased nodes are provisioned.
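As a rough illustration of the interim charging model, the following Python sketch estimates what a group might pay in a month before its nodes are provisioned. It assumes, for illustration only, that one service unit (SU) corresponds to one core-hour; the actual SU definition and any queue-specific multipliers should be confirmed with HPC Services. The job sizes and group size are hypothetical.

    # Rough cost estimate for interim use of the general Lawrencium queues.
    # Assumption (not from the program description): 1 SU = 1 core-hour.
    # Rates below are from the program description: $0.01/SU usage fee,
    # $25/month per user account.

    SU_RATE = 0.01          # dollars per service unit
    ACCOUNT_FEE = 25.00     # dollars per user account per month

    def job_cost(cores: int, hours: float) -> float:
        """Return the usage charge for one job, assuming 1 SU = 1 core-hour."""
        return cores * hours * SU_RATE

    # Hypothetical interim workload: 20 jobs per month, each using one
    # full 28-core node for 24 hours, for a group with 3 user accounts.
    monthly_usage = 20 * job_cost(cores=28, hours=24)
    monthly_accounts = 3 * ACCOUNT_FEE
    print(f"Usage charges:   ${monthly_usage:,.2f}")    # $134.40
    print(f"Account fees:    ${monthly_accounts:,.2f}") # $75.00
    print(f"Total per month: ${monthly_usage + monthly_accounts:,.2f}")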

Recommended Equipment

Compute nodes with the following specifications:

General Computing Node
Processors: Dual-socket, 14-core, 2.4GHz Intel Broadwell Xeon E5-2680v4 processors (28 cores/node)
Memory: 64GB (8 x 8GB) 2400MHz DDR4 RDIMMs
Interconnect: 56Gb/s Mellanox ConnectX-3 FDR-14 InfiniBand interconnect
Hard Drive: 500GB 7.2K RPM SATA HDD (local swap and log files)
Warranty: 5 years

GPU Computing Node
Processors: Dual-socket, 4-core, 3.0GHz Intel Haswell Xeon E5-2623v3 processors (8 cores/node)
Memory: 64GB (8 x 8GB) 1866MHz DDR4 RDIMMs
Interconnect: 56Gb/s Mellanox ConnectX-3 FDR-14 InfiniBand interconnect
GPU: 2x NVIDIA Tesla K80 accelerator boards
Hard Drive: 500GB 7.2K RPM SATA HDD (local swap and log files)
Warranty: 5 years

GPU Computing Node for Machine Learning and Image Processing
Processors: Dual-socket, 4-core, 3.0GHz Intel Haswell Xeon E5-2623v3 processors (8 cores/node)
Memory: 64GB (8 x 8GB) 1866MHz DDR4 RDIMMs
Interconnect: 56Gb/s Mellanox ConnectX-3 FDR-14 InfiniBand interconnect
GPU: 4x NVIDIA GeForce GTX 1080 Ti accelerator boards
Hard Drive: 512GB SSD (local swap and log files)
Warranty: 5 years
Price: $8,300


Storage: Hitachi HDS HUS 110 storage shelf configured with 12 x 4TB Nearline SAS 7200RPM disks.
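For planning purposes, the Python sketch below converts the shelf's raw capacity into an approximate usable figure. The RAID-6 layout, hot-spare count, and filesystem overhead used here are illustrative assumptions, not the program's published configuration.

    # Approximate usable capacity of one storage shelf (12 x 4TB disks).
    # Assumptions (illustrative only, not from the program description):
    # RAID-6 (two parity disks), one hot spare, ~5% filesystem overhead.

    DISKS = 12
    DISK_TB = 4.0
    PARITY = 2          # RAID-6 parity disks (assumed)
    HOT_SPARES = 1      # assumed
    FS_OVERHEAD = 0.05  # assumed filesystem/metadata overhead

    raw_tb = DISKS * DISK_TB
    data_disks = DISKS - PARITY - HOT_SPARES
    usable_tb = data_disks * DISK_TB * (1 - FS_OVERHEAD)
    print(f"Raw capacity:    {raw_tb:.0f} TB")    # 48 TB
    print(f"Usable (approx): {usable_tb:.1f} TB") # ~34.2 TB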

Prospective condo owners should contact HPC Services Manager Gary Jung prior to purchasing any equipment to ensure compatibility.

Lawrencium Condo Owners
David Prendergast,  Molecular Foundry
Jeff Neaton, Molecular Foundry
Daniel Haxton, Chemical Sciences Division 
William Lester, Chemical Sciences Division
Quanlin Zhou, Earth and Environmental Sciences Area
Jens Birkholzer, Energy Geosciences Division
Curtis Oldenburg, Earth and Environmental Sciences Area
Barry Freifeld, Earth and Environmental Sciences Area
Peter Lau, Earth and Environmental Sciences Area
David Romps, Earth and Environmental Sciences Area
Kristin Persson, Energy Technologies Area
Anubhav Jain, Energy Technologies Area