Computational Methods Development

The CCDG also develops methods to improve gene identification efforts, and provides secondary statistical genetic data analysis to collaborators around the world.  

CCDG Mathematical Modeling and Computation (MMC) Lab Linux Cluster

Indy, the MMC lab’s 25-node high-speed computing cluster, is designed to support the development, testing, and application of statistical genetic methods to large-scale, high-throughput genetic data, which have the characteristics of present-day "big data". The specifications for this cluster were developed by Dr. Manika Govil, Director of the MMC lab, in consultation with collaborators at the Battelle Center for Mathematical Medicine, Nationwide Children's Hospital, Columbus, Ohio. Indy is housed in a controlled-environment facility at the University of Pittsburgh’s Network Operations Center (NOC). It complies with the security and privacy requirements of the University of Pittsburgh, as well as those of our grant funding agencies. The NOC also provides essential system monitoring and backup support for the cluster. Day-to-day system administration tasks are handled by CCDG personnel.

Indy currently comprises 19 compute nodes, 1 head node, 1 web server node, and 3 data storage nodes, connected through a high-speed 7Gbps InfiniBand network as well as 2Gbps Ethernet. Indy runs the CentOS v6.x operating system (a Red Hat-derived, enterprise-class Linux distribution) and uses Sun Grid Engine software for concurrent process allocation. Cluster performance monitoring packages, including Ganglia and Nagios, are available. Both the GCC and Intel suites of software development tools are available, as well as an array of the latest statistical genetic analysis software. Indy is also capable of compiling and running parallel programs using the OpenMP, MPI, and MPI-2 libraries.
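As a rough illustration of how work might be handed to Grid Engine for allocation, the sketch below assembles a `qsub` command line in Python without running it. The flags `-N`, `-cwd`, `-pe`, and `-l` are standard `qsub` options, but the parallel environment name `mpi`, the `h_vmem` memory request, and the script and job names are assumptions made for this example; the values actually configured on Indy are not stated here.

```python
import shlex


def build_qsub_command(script, job_name, slots=1, pe_name="mpi", mem_per_slot=None):
    """Return the argument list for a Grid Engine `qsub` submission.

    pe_name and mem_per_slot are site-specific; the defaults are placeholders.
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    # -N names the job; -cwd runs it from the current working directory.
    args = ["qsub", "-N", job_name, "-cwd"]
    if slots > 1:
        # -pe requests a parallel environment and a number of slots.
        args += ["-pe", pe_name, str(slots)]
    if mem_per_slot is not None:
        # -l requests a resource; h_vmem is assumed to be requestable here.
        args += ["-l", f"h_vmem={mem_per_slot}"]
    args.append(script)
    return args


# Hypothetical job: 12 slots, matching one of the 2 x 6-core compute nodes.
cmd = build_qsub_command("run_analysis.sh", "analysis_chr1", slots=12, mem_per_slot="2G")
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps each argument separate, so it can be passed to a process launcher without shell quoting problems.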

All 19 compute nodes on Indy are equipped with an 80GB SATA solid-state drive to reduce I/O time during computation. Of these 19 nodes, 4 compute nodes each have 2 8-core Intel Xeon E5-2670 2.6GHz processors with 20MB cache and 256GB of memory. The remaining 15 compute nodes each have 2 6-core Intel Xeon X5670 2.93GHz processors with 12MB cache. Memory across the 19 compute nodes ranges from 24GB to 256GB, with the following distribution: 10 nodes with 24GB, 2 nodes with 48GB, 2 nodes with 96GB, 1 node with 144GB, and 4 nodes with 256GB.
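The compute-node figures above can be cross-checked with a short tally. This is a minimal sketch that uses only the counts quoted in the paragraph; it counts physical cores and ignores any hyper-threading, which the description does not mention.

```python
# Processor type -> (number of nodes, processors per node, cores per processor)
CPU_GROUPS = {
    "Xeon E5-2670": (4, 2, 8),
    "Xeon X5670": (15, 2, 6),
}

# Memory per node in GB -> number of nodes with that much memory
MEMORY_GB = {24: 10, 48: 2, 96: 2, 144: 1, 256: 4}


def total_cores(cpu_groups):
    """Sum physical cores over all compute nodes."""
    return sum(nodes * cpus * cores for nodes, cpus, cores in cpu_groups.values())


def total_memory_gb(memory):
    """Sum installed memory over all compute nodes."""
    return sum(gb * nodes for gb, nodes in memory.items())


nodes_by_cpu = sum(nodes for nodes, _, _ in CPU_GROUPS.values())
nodes_by_memory = sum(MEMORY_GB.values())

print(nodes_by_cpu, nodes_by_memory)   # 19 19
print(total_cores(CPU_GROUPS))         # 244
print(total_memory_gb(MEMORY_GB))      # 1696
```

Both ways of counting give 19 compute nodes, with 244 physical cores and 1696GB of memory in aggregate.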

The head node is responsible for user access, task management, and monitoring cluster performance. It is configured with 2 8-core Intel Xeon E5-2650 2.0GHz processors with 20MB cache, a 120GB SATA solid-state drive for the boot partition and operating system, and 8 3TB SAS hard drives with 64MB cache, effectively providing approximately 12TB of storage (with RAID 5). The head node is further equipped with a Fibre Channel HBA, which connects to the NOC's storage fabric, including its enterprise tape backup facilities.

The web server node is configured with 2 quad-core Intel Xeon E5620 2.40GHz processors with 12MB cache, and 4 2TB SATA II enterprise hard drives with 64MB cache, effectively providing approximately 5TB of storage (with RAID 5). Like the head node, the web server node is equipped with a Fibre Channel HBA to connect to the NOC's storage and backup facilities.
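The web server's storage figure can be sanity-checked with the usual RAID 5 rule of thumb, in which one drive's worth of capacity is given over to parity. The sketch below is an estimate under that assumption only; the function names are illustrative, and filesystem or controller overhead, which the description does not detail, would reduce the result further.

```python
def raid_usable_tb(drives, drive_tb, parity_drives):
    """Nominal usable capacity in TB for a parity-based RAID array.

    parity_drives is 1 for RAID 5 and 2 for RAID 6.
    """
    if drives <= parity_drives:
        raise ValueError("array needs more drives than parity drives")
    return (drives - parity_drives) * drive_tb


def tb_to_tib(tb):
    """Convert decimal terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40


# Web server node: 4 drives of 2TB each in RAID 5.
web_tb = raid_usable_tb(drives=4, drive_tb=2.0, parity_drives=1)
print(f"{web_tb:.1f} TB nominal, {tb_to_tib(web_tb):.2f} TiB")  # 6.0 TB nominal, 5.46 TiB
```

The nominal 6TB, or about 5.46TiB as an operating system would typically report it, is in line with the quoted figure of approximately 5TB.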

Three data storage nodes are dedicated to the permanent housing of high-throughput whole genome sequencing data. Each storage node provides a total of 216TB of storage space and is configured as a RAID 6 array of 32 SATA solid-state drives with hot-swap capability and 1 dedicated hot spare drive.

Indy is available as a high-speed computing resource to the faculty and computing staff at the Center for Craniofacial and Dental Genetics (CCDG), and to external collaborators who are actively involved in CCDG research projects.