SIE SUCCESSFULLY COMPLETES THE INSTALLATION OF THE GPU CLUSTER OF THE INSTITUTE OF COMPUTATIONAL CHEMISTRY AND CATALYSIS OF THE UNIVERSITY OF GIRONA, NAMED GALATEA
Sistemas Informáticos Europeos (SIE) has successfully completed the installation of the GPU cluster for the Institute of Computational Chemistry and Catalysis (IQCC) of the University of Girona.
This cluster was a great challenge for our company, since we had to guarantee redundancy for the management, monitoring and queue servers to avoid any single point of failure. We therefore chose Intel R1304WT2GSR platforms, which we trust for the reliability of their components, their redundant hot-swap power supplies and the speed of their SSDs, to deliver a reliable solution.
We spoke with David Ramírez, HPC Systems & Integrator Manager at SIE:
What were the customer's needs?
They needed three clearly separated and properly dimensioned networks for the high-performance computing (HPC) environment: the control (management) network, the storage network and the BMC network (IPMI 2.0 remote control), the first two running at Gigabit speed and the IPMI network at 100 Mb/s. Since this is an HPC cluster with two redundant servers, it was also essential to choose reliable equipment. For the main servers we relied on Intel's own platforms, which provide high availability (HA) through active-passive services running on Citrix XenServer virtualization.
What alternatives were raised?
In the tender, Gigabit, 10G and InfiniBand networks were proposed. However, the top priority for the UdG's Institute of Computational Chemistry was an economical network, so that most of the budget could go to GPU computing power. Thanks to this TP-Link solution, we were able to configure an HPC environment with a GPU computing capacity of 1,584 Tflops in total (72 Tflops per node), reaching a total of 2 Pflops when the Intel Xeon Broadwell E5-2620 v4 processors it incorporates are included.
SIE has also set up a shared BeeGFS storage repository of more than 200 TB, where all calculations are stored. This solution will allow the Institute of Computational Chemistry of the University of Girona to take on new research.
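As a quick consistency check on these figures, assuming that both the 1,584 Tflops total and the 72 Tflops per node refer to GPU peak performance (the article does not state the node count explicitly), the implied number of GPU nodes is:

$$N_{\text{nodes}} = \frac{1584\ \text{Tflops}}{72\ \text{Tflops/node}} = 22$$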
Thanks to these calculations, the IQCC will be able to improve and shorten the process times that are essential in the creation of new drugs and of more resistant, less polluting materials. This is key to the development of the industrial and pharmaceutical sectors, and it places institutes like this one at the international forefront of science.
What are the advantages of the project's technology now that it is finished?
The project is finished and already in production. Thanks to their manageability, the Gigabit switches allow link bonding between cluster nodes. The 100 Mb/s rack-mount switch offers a very economical solution for KVM-over-LAN connectivity and remote management, which has no further requirements but is essential for equipment maintenance (it is integrated into a Nagios console). Finally, the router provides high availability across WAN links at Gigabit bandwidth, connecting both the external LAN and the IPMI network through a single University port and thus reducing the load on the redundant servers.
What does the Ladon OS solution provide compared to solutions like Rocks?
Using Ladón OS 7.2 v8, our own development, gives us a stable ecosystem of open-source tools that lets us manage, monitor and supervise the entire cluster jointly and centrally.
The difference from Rocks is that Ladon OS is a puzzle rather than a monolithic system. It is an ecosystem in which the tools coexist, but you can remove or add components depending on parameters such as the size of the cluster or the complexity of the system. With Rocks, you end up loading tools you don't need, which makes it much heavier and harder to upgrade.
The collaboration of several universities and institutes, which contribute tools such as CLUES (CLUster Energy Saving), allows the system to evolve very quickly and stay up to date. This can make a difference of between 20% and 30% in the performance of a cluster.
Ladón OS integrates tools such as Check_MK (on Nagios), Ganglia and Ansible. In addition, we helped the client install applications such as Amber 16 and GROMACS on CUDA 8, deploying them automatically across the entire cluster with EasyBuild.
The Institute of Computational Chemistry of the University of Girona carries out research in fields such as improving and shortening the process times that are essential in the creation of new drugs and of more resistant, less polluting materials. This places the Institute "at the international forefront of science".
This GPU cluster, based on Intel Xeon processors, is one of the most powerful of its kind installed in Spain to date and one of the 10 largest in Europe using this technology, delivering a total of 2 Pflops thanks to NVIDIA's new Pascal architecture with CUDA 8.