If 2021 has brought us anything, it has been the definitive adoption, by Intel and AMD as well as by ARM, of the PCI-E 4.0 bus and of SSDs (solid-state drives), which will gradually displace conventional magnetic disks.
The change has been so fast that we have barely had time to announce the new products launched hand in hand with Gigabyte, for which we are wholesalers and integrators for Europe, Latin America (from Chile) and South Asia (from Nepal).
The PCI-E 3.0 bus had fallen short, and devices keep demanding more and more bandwidth: GPUs, SSDs, RAID controllers, etc.
In an x16 slot, we have gone from roughly 32 GB/s to 64 GB/s of aggregate (bidirectional) bandwidth.
But the bus is not the only thing that has grown; so has memory access: from 4 channels at 2666 MT/s, through 6 channels at 2933 MT/s, to 8 channels at 3200 MT/s. Put per processor and in GT/s (gigatransfers per second), we have gone from a limit of 10.4 GT/s to 25.6 GT/s.
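To put those figures in context, here is a small illustrative Python snippet (my own back-of-the-envelope arithmetic, not taken from any vendor datasheet) that derives the theoretical PCI-E and DDR4 bandwidths behind the numbers above:

```python
# Back-of-the-envelope bandwidth estimates (illustrative only).

def pcie_bandwidth_gbs(lanes: int, gt_per_s: float, encoding: float) -> float:
    """Per-direction PCIe bandwidth in GB/s (lanes x GT/s x encoding efficiency / 8 bits)."""
    return lanes * gt_per_s * encoding / 8

def ddr4_bandwidth_gbs(channels: int, mt_per_s: int, bus_bytes: int = 8) -> float:
    """Peak DDR4 bandwidth in GB/s (channels x MT/s x 8 bytes per transfer)."""
    return channels * mt_per_s * bus_bytes / 1000

# PCIe 3.0 signals at 8 GT/s with 128b/130b encoding; PCIe 4.0 doubles the rate to 16 GT/s.
gen3_x16 = pcie_bandwidth_gbs(16, 8, 128 / 130)    # ~15.8 GB/s each way (~32 GB/s both ways)
gen4_x16 = pcie_bandwidth_gbs(16, 16, 128 / 130)   # ~31.5 GB/s each way (~64 GB/s both ways)

# Memory: 4 channels of DDR4-2666 vs 8 channels of DDR4-3200.
mem_old = ddr4_bandwidth_gbs(4, 2666)   # ~85 GB/s peak
mem_new = ddr4_bandwidth_gbs(8, 3200)   # ~205 GB/s peak

print(f"PCIe 3.0 x16: {gen3_x16:.1f} GB/s per direction")
print(f"PCIe 4.0 x16: {gen4_x16:.1f} GB/s per direction")
print(f"Memory, 4ch DDR4-2666: {mem_old:.0f} GB/s peak")
print(f"Memory, 8ch DDR4-3200: {mem_new:.0f} GB/s peak")
```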
Logically, this had to go hand in hand with disk speed, and SSDs have grown as well. No longer limited by mechanical parts, they can run at the speed of their persistent memory chips (with no data loss). We have gone from the SATA bus, which could only deliver 6 Gb/s and therefore drives of at most around 100,000 IOPS (input/output operations per second), to 32 Gb/s over PCI-E, which allows drives of up to one million IOPS in Gen 3 and up to two million IOPS in Gen 4.
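As a hedged sanity check of those IOPS figures, the following sketch estimates the ceiling each interface imposes, assuming 4 KiB transfers and approximate usable bandwidth after encoding overhead:

```python
# Rough IOPS ceilings implied by the interface bandwidth (illustrative assumptions:
# 4 KiB I/O size and approximate usable bandwidth after encoding/protocol overhead).

IO_SIZE = 4 * 1024  # bytes per I/O (4 KiB random reads, a common benchmark size)

links = {
    "SATA 3 (6 Gb/s)":        600e6,    # ~600 MB/s usable after 8b/10b encoding
    "PCIe 3.0 x4 (~32 Gb/s)": 3.9e9,    # ~3.9 GB/s usable
    "PCIe 4.0 x4 (~64 Gb/s)": 7.9e9,    # ~7.9 GB/s usable
}

for name, bandwidth in links.items():
    print(f"{name}: ~{bandwidth / IO_SIZE / 1e6:.2f} million IOPS ceiling")
# The link ceilings (~0.15M, ~0.95M, ~1.9M) are consistent with the ~100k, 1M and 2M
# drive-level figures quoted above.
```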
Logically, these speeds can also be boosted through RAID technologies, which group multiple disks together to obtain more performance and/or additional redundancy in RAID 0, 1, 10, 5 and 6 (see the small sketch a little further below). In this regard, there have been several approaches:
1st: From Intel, with VROC, which lets the processor manage disk RAID in a hybrid hardware/software scheme, configured from Gigabyte's own BIOS and requiring an additional license. In principle it handles only 2 NVMe SSDs, or up to 24 with an expander, with the consequent load on the processor and the sacrifice of PCIe lanes. This was already available in the 2nd generation of scalable processors, known as Cascade Lake, and has been improved in the 3rd generation, Ice Lake.
2nd: From Broadcom, owner of the classic LSI RAID controller technology, which uses a specialised RISC processor that offloads the main CPU. Until now, these controllers could only handle SATA3 and SAS3 disks. The new controllers being launched now can also handle NVMe disks, up to a maximum of 4 drives attached directly, doing RAID 0, 1 and 5. Logically, the new RISC processor is more powerful and can still handle SAS/SATA disks and/or NVMe disks, with certain limitations.
3rd: From NVIDIA, which has already launched this technology as a new concept; the platforms will see the light in Q1 (Feb-Apr 2022, since this company's fiscal quarters run out of phase with the calendar ones). The recent T1000 GPU has been redesigned and renamed the SupremeRAID™ SR-1000 for use as a RAID controller, under a newly coined concept called GRAID SupremeRAID™. This innovative solution unlocks 100% of the performance of NVMe SSDs without sacrificing data security.
While traditional RAID technologies created bottlenecks on SSDs, the new GRAID solution uses a hardware/software approach that overcomes this limitation on U.2 (2.5″) NVMe SSDs and PCI-E 4.0 drives.
GRAID SupremeRAID works by installing a virtual NVMe controller in the operating system (available for Windows Server and the major Linux distributions) together with a high-performance PCIe 4.0 device, which the processor sees as a single device, without sacrificing the performance that is so important in HPC systems.
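Before moving on to the platforms, here is a minimal, generic sketch of how the RAID levels mentioned above trade usable capacity for redundancy; it is deliberately vendor-neutral (it does not model VROC, Broadcom or GRAID specifics) and the drive size is hypothetical:

```python
# Generic RAID-level arithmetic (illustrative only; real controllers add their own limits).

def raid_usable_tb(level: str, drives: int, size_tb: float) -> float:
    """Usable capacity in TB for the RAID levels mentioned above."""
    if level == "0":   # striping: full capacity, no redundancy
        return drives * size_tb
    if level == "1":   # two-drive mirror: half the raw capacity
        return drives * size_tb / 2
    if level == "10":  # striped mirrors: half the raw capacity
        return drives * size_tb / 2
    if level == "5":   # single parity: loses one drive's worth, survives 1 failure
        return (drives - 1) * size_tb
    if level == "6":   # double parity: loses two drives' worth, survives 2 failures
        return (drives - 2) * size_tb
    raise ValueError(f"unsupported RAID level: {level}")

# Example: 8 hypothetical 7.68 TB NVMe drives.
for level in ("0", "10", "5", "6"):
    print(f"RAID {level}: {raid_usable_tb(level, 8, 7.68):.1f} TB usable")
```

RAID 0 and 10 also scale throughput roughly with the number of striped drives, which is why array-level figures like the ones quoted below far exceed those of a single disk.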
This technology is already present in the SIE Ladon All-Flash 4.0, which offers up to 285 TB net, with performance of up to 6 million IOPS and 100 GB/s of throughput. It is based on the Gigabyte R282-Z9G platform, which can accommodate up to 20 NVMe U.2 PCI-E 4.0 disks and up to 128 cores with two AMD Milan processors, plus a network output based on the Mellanox MCX653105A-EFAT card, which over the PCI-E 4.0 bus can deliver 100 Gb/s, either as InfiniBand HDR 100 Gb/s or as Ethernet at the same speed.
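For context, a rough and purely illustrative tally (the per-drive figure is my assumption, not an SIE specification) of where that 6-million-IOPS array figure sits relative to the raw drives:

```python
# Illustrative only: assumed per-drive figures, not measured values for this system.

DRIVES = 20                   # U.2 bays in the R282-Z9G platform
IOPS_PER_DRIVE = 1_000_000    # assumed 4K random-read IOPS for a typical Gen4 U.2 SSD
ARRAY_LIMIT = 6_000_000       # array-level figure quoted above

raw_total = DRIVES * IOPS_PER_DRIVE
delivered = min(raw_total, ARRAY_LIMIT)
print(f"Raw drive aggregate: {raw_total / 1e6:.0f}M IOPS")
print(f"Array-level ceiling: {delivered / 1e6:.0f}M IOPS (the RAID engine, not the drives, sets the limit)")
```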
At this point, with systems carrying up to 8 A100 cards with 80 GB of memory each on an NVLink bus at 600 GB/s, or the new systems with up to 10 A40 cards with 48 GB, both aimed at Deep Learning (training) and Artificial Intelligence, many of you will be wondering how this fits together, since SIE Ladon systems are in many cases not isolated machines but are integrated within a cluster.
Indeed, this much bandwidth inside each machine forces us to consider how to interconnect them.
First of all, Sistemas Informáticos Europeos is committed to Mellanox HDR InfiniBand technology at 100 and 200 Gb/s, which is also Ethernet-compatible, so that these solutions can be integrated into classic networks.
InfiniBand uses communication packets, headers and queues that are much lighter than those of the Ethernet protocol, which makes it well suited to parallelising processes across machines. It also offers latency of the same order of magnitude as that between two processors on the same board.
However, in order to run workloads without bottlenecks, for example with GPUs, the internal NVLink bus of the devices has had to be extended between machines via NVSwitch. This is possible through the Broadcom PCIe switch, as shown in the following diagram.
Here you can see how one of the PCIe 4.0 x16 slots can be sacrificed to fit a 200 Gb/s Mellanox HDR card instead of a GPU, connecting the GPUs of different machines over the InfiniBand network with much less load on the main AMD processor.
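As an illustration of how software typically exploits this kind of fabric (a generic example of mine, not part of SIE's or NVIDIA's stack), a PyTorch job using the NCCL backend will use NVLink/NVSwitch inside a node and, where the hardware allows it, GPUDirect RDMA over InfiniBand between nodes:

```python
# Minimal multi-GPU all-reduce sketch using PyTorch + NCCL (assumed environment:
# PyTorch with CUDA, launched with torchrun so RANK/WORLD_SIZE/LOCAL_RANK are set).
import os
import torch
import torch.distributed as dist

def main():
    # NCCL picks the fastest path it finds: NVLink/NVSwitch inside a node,
    # GPUDirect RDMA over InfiniBand between nodes when the fabric supports it.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each rank contributes a tensor; all_reduce sums them across every GPU.
    x = torch.ones(1024 * 1024, device="cuda") * dist.get_rank()
    dist.all_reduce(x, op=dist.ReduceOp.SUM)

    if dist.get_rank() == 0:
        print(f"world size {dist.get_world_size()}, reduced value {x[0].item():.0f}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with torchrun on each node (for example, 2 nodes with 8 GPUs each), the same script runs unchanged whether the GPUs sit in one machine or across the InfiniBand network.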
All of this can be seen more clearly in the following video from NVIDIA, where the new NVSwitch technology is explained very well.
However, we still face the same stumbling block: how to connect all this new speed to our classic Ethernet networks.
Well, first of all, we have a 32-port switch running Cumulus Linux. On this technology there is a very interesting page by Eduardo Collado, where this open networking approach is explained very well.
As I was saying, we can have Mellanox trunks on Ethernet up to 100 and 200 Gb/s, for equipment that needs high-speed access to high-performance All-Flash 4.0 solutions.
In addition, SIE offers Mellanox HDR 100 Gb/s dual-port cards, where one port can negotiate the InfiniBand protocol and the other the Ethernet protocol, with the consequent saving of slots and investment in a single PCI-E 4.0 x16 card; a machine can fit two cards of this type, connecting simultaneously to its two processors and avoiding bottlenecks.
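A quick illustrative check (my arithmetic, assuming both ports at line rate) that a single PCI-E 4.0 x16 slot has headroom for the dual-port card:

```python
# Does a dual-port HDR100 card fit in one PCIe 4.0 x16 slot? (illustrative arithmetic)

ports = 2
port_gbps = 100                             # 100 Gb/s per port
nic_need_gbs = ports * port_gbps / 8        # -> 25 GB/s of traffic per direction, worst case

pcie4_x16_gbs = 16 * 16 * (128 / 130) / 8   # ~31.5 GB/s per direction
print(f"NIC worst case: {nic_need_gbs:.1f} GB/s, slot provides: {pcie4_x16_gbs:.1f} GB/s per direction")
# Headroom remains, so the slot itself is not the bottleneck (a Gen3 x16 slot would be).
```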
To distribute less expensive connectivity to the rest of the network, we have the DXS-3610-54T-SI model from D-Link, which has 6 Ethernet ports at 100 Gb/s that our company can interconnect without any problem, via QSFP28, with the Mellanox HDR 100 cards mentioned above.
This innovative switch, priced at around 7,000 € (much cheaper than competing solutions and with a lifetime warranty), allows those 6 ports to be used for two purposes:
1st: It allows up to 16 identical switches to be interconnected in a redundant ring of up to 200 Gb/s, without bottlenecks, stable and manageable through a single IP. For this we use only 2 of the 6 ports available on each switch.
2nd: The remaining 4 can be used to connect equipment fitted with the Mellanox HDR dual-port card, using both ports in Ethernet mode for up to 200 Gb/s. These ports are also compatible with the 40 Gb/s Ethernet standard, which allows much lower-cost cards with QSFP+ connectivity to be used, such as the Gigabyte CLN4752, equipped with the Intel® XL710-BM2 chip and dual ports at this speed.
Logically, each of these switches also has 48 10GBASE-T (copper) ports, which allows the network to be distributed using the cabling already in place. With category 6 cabling, 10GBASE-T typically reaches around 55 metres, and with category 6a the full 100 metres, and the copper connectors remain very easy to handle and terminate.
In this way, we can have up to 48 free 100/40 Gb/s Ethernet ports and 576 10GBASE-T ports, managed as if they were a single device and with a lifetime warranty; a competitive advantage over costly rack-mount switch solutions. In addition, D-Link's D-View 7 software, and the future version 8 (free for up to 25 devices), lets you view and manage this switch stack, as well as other 10G and gigabit distribution switches on the network.
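As a hedged illustration of how those totals add up (the stack size and the two ports reserved for the ring are assumptions based on the figures above):

```python
# Port totals for a ring-stacked DXS-3610-54T deployment (illustrative arithmetic).

UPLINK_PORTS = 6        # 100 Gb/s ports per switch
RING_PORTS = 2          # ports reserved per switch for the redundant ring
COPPER_PORTS = 48       # 10GBASE-T ports per switch

def stack_totals(switches: int) -> tuple[int, int]:
    free_uplinks = switches * (UPLINK_PORTS - RING_PORTS)
    copper = switches * COPPER_PORTS
    return free_uplinks, copper

# A 12-switch stack reproduces the 48 x 100/40G + 576 x 10GBASE-T figures quoted above.
for n in (2, 12, 16):
    uplinks, copper = stack_totals(n)
    print(f"{n:>2} switches: {uplinks} free 100/40G ports, {copper} 10GBASE-T ports")
```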
Currently, all of the D-Link switches that SIE markets for cluster and professional solutions are manageable and support this software, as do many of the WiFi 6 access points, which already deliver gigabit-class bandwidth.