This document delves into the question of scientific computing on GPUs (we recommend reading "Why migrate to GPU" beforehand).
Before taking this step, we should analyze the following chart.
I find this graph very revealing when deciding whether to move to GPUs. I like it a lot, because the times on it can refer both to the time spent computing and to the time spent developing the application.
1. If my code is serial and I have not yet started parallelizing it, I should not even consider a GPU solution. GPUs always offer more cores than CPUs, but always with fewer features per core (although there are so many of them that the performance is spectacular) and less memory. That means I still have plenty of headroom on the CPUs and a lot of performance to extract from them, and that, generation after generation, they will keep giving me better performance, because their cores keep getting more numerous and more effective in Gflops.
We are in the case of the upper-left quadrant: if my process is not yet parallel, or is only parallel on a small scale, it is better to focus on optimizing it. Likewise, if it takes from hours to a few days to run, buying faster hardware will probably reduce the execution time and save me the human cost of migrating to another platform.
2. If my code takes anywhere from a few days to weeks to run, I have probably already tried to parallelize it, at least within one node (OpenMP), to take advantage of the maximum number of cores available per server. In this case, we have two options:
3. Lower half of the image. If our calculation takes weeks to months, it really demands a major effort. The solution comes hand in hand with technologies such as OpenACC or CUDA, with tools such as CAPS HMPP Workbench; that is, mixed CPU-GPU or multi-GPU programming, where we extract the maximum power from the system. In this scenario, the move to GPU becomes essential.
Well, let's imagine that we have already decided to migrate to GPU. Basically, we are clear that our x86 code is not going to run on any GPU (at least not without modification), because only the host processor runs x86 instructions directly.
Before getting into the matter, I will say that at SIE, whenever we think of scientific computing, we are talking about the Linux operating system. Although some applications are also available on Windows, Linux is the highest-performing and most stable environment. If we are moving to GPUs for performance, it seems logical to use the most optimal operating system as well.
Before going into the technologies themselves, let's talk about the tools available for GPU development, starting from the most difficult and moving to the easiest:
Promoted by companies such as CAPS, Cray, Nvidia and The Portland Group, an open standard has emerged (OpenACC) that defines a set of common directives. These allow the same code to work on any of these platforms and, more importantly looking to the future, enable mixed CPU-GPU programming, which is expected to be the new paradigm in the HPC (high-performance computing) world. Its only drawback is its recent launch: the platform is not yet mature.
Once this is assumed, there are three possibilities or technologies on offer (I list them from the least adopted for computation to the most adopted):
This company was absorbed by the second-largest PC processor manufacturer (AMD), and it builds boards that are perhaps the most capable in raw terms. What drawbacks does it pose? The big one is that ATI has not made a great effort to release development tools, which means you must program in OpenCL, and that is complicated. The series intended for computation is the ATI Radeon HD.
It is true that since AMD released its APUs (CPU-GPU hybrids that integrate directly onto the board), hybrid programming has been possible, and once the tools are powerful enough, it may become an important option to consider. For the moment, however, there are only uniprocessor systems, which limits the number of cores and the memory.
Previously called MIC (Many Integrated Core), which I deliberately avoid calling Xeon Phi so as not to confuse it with the Xeon processors.
It sits alongside an Intel Xeon host, whose power per core is much higher than that of the 60 cores offered by the Phi. The beauty of this technology is that it is not necessary to rewrite the application: being x86-compatible, according to Intel, applications can be migrated automatically with the Parallel Studio and Cluster Studio tools. The problems are three:
We will start by clarifying that NVIDIA is a commercial company and that CUDA is its proprietary, de facto market standard. It is currently the most mature technology: it is now on version CUDA 6, and since CUDA 1 it has kept improving while maintaining compatibility.
It has three main advantages:
Therefore, to summarize: if your application has been migrated to CUDA, or the tool you use to develop it has been, it is a perfect option, and with the latest models you will be able to obtain a performance improvement of between 5x and 50x, depending on your application.
Finally, we want to draw a conclusion from this entire report. If your research group is not very large, it is better to stay in the CPU environment for now. CPUs will keep delivering more and more performance for a few years, so you will be able to keep taking advantage of those improvements; but bear in mind that sooner or later GPUs will be integrated on the chip and you will have to get used to this programming environment, so use the time to prepare for it.
If your application has already been ported to GPUs, take advantage of it. Some applications manage to reduce their computation time from 28 days to a few hours. With a single machine you can do what used to require an entire cluster, and with a cluster of GPUs what a supercomputing center did a few years ago.
If you already use high-level (human-friendly) tools that support GPU programming, take advantage of this new technology: with tools such as MATLAB by MathWorks (more than 200 functions already migrated) or Wolfram Mathematica, it is relatively simple.
To provide specific figures, researchers who already work with applications on GPUs report the following:
Bear in mind that GPUs are not only a time saver, but also a saver of energy and space, which is becoming a very important factor.