Set Up a Multi-GPU Server from Scratch
This blog is the outcome of hands-on work with a multi-GPU server by Bhabani Mohapatra and team. A good learning environment helps you learn not only how to write AI code but also how the hardware beneath it works. I am lucky to be part of this team.
This blog walks through how to set up a fully working multi-GPU server from scratch. Fasten your seat belt and have patience, because getting there takes some time and effort.
We can roughly categorize the effort into the following steps:
1. Get a multi-GPU server
2. Select the GPU cards
3. Install Ubuntu on the server
4. Install packages to run AI/ML code
Get a multi-GPU server
This is the most important step, since you need to think carefully about which server to settle on. Many vendors supply high-capacity servers with multi-GPU support; you can go through the links below and buy one.
A) Workstation specialist
B) Supermicro
C) Gigabyte
D) Titancomputers
E) Cocolink
When you get one, the unboxing will look something like this…
Which GPU card to select
The AI community prefers NVIDIA GPU cards; the latest one at the time of writing is the GeForce RTX 3090. NVIDIA supports AI libraries through CUDA and cuDNN, so most of the time it is the default choice.
Place these GPUs in the server's GPU slots (as shown in the videos above) with the proper power cables connected.
Ubuntu server installation
For a multi-GPU server, Ubuntu is usually the first choice because of its large community support, and being open source makes it economical.
Here are a few links that guide you through installing Ubuntu on the server.
Set up SSDs
These servers work best with an SSD setup; you can hot-plug the SSDs and then configure the disks in Ubuntu.
1. To configure in ext4 format
2. To configure in RAID 5
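As a rough illustration of the two options above, the sketch below builds (but does not run) the shell commands typically used for each. It assumes software RAID managed with mdadm, which this blog does not specify, and the device names (/dev/sdb and so on), the array name /dev/md0, and the mount point /data are hypothetical placeholders. These commands erase the target disks, so the sketch only returns them for review.

```python
from typing import List


def raid5_usable_capacity_tb(num_disks: int, disk_size_tb: float) -> float:
    """Usable space of a RAID 5 array: one disk's worth of space goes to parity."""
    if num_disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (num_disks - 1) * disk_size_tb


def ext4_commands(device: str, mount_point: str = "/data") -> List[List[str]]:
    """Commands to format a single disk as ext4 and mount it (not executed here)."""
    return [
        ["mkfs.ext4", device],
        ["mkdir", "-p", mount_point],
        ["mount", device, mount_point],
    ]


def raid5_commands(devices: List[str], array: str = "/dev/md0",
                   mount_point: str = "/data") -> List[List[str]]:
    """Commands to create an mdadm RAID 5 array, then format and mount it as ext4."""
    if len(devices) < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    create = ["mdadm", "--create", array, "--level=5",
              f"--raid-devices={len(devices)}", *devices]
    # Once created, the array is formatted and mounted like a single disk.
    return [create] + ext4_commands(array, mount_point)
```

For example, raid5_commands(["/dev/sdb", "/dev/sdc", "/dev/sdd"]) returns the mdadm, mkfs.ext4, mkdir and mount commands in order, and raid5_usable_capacity_tb(4, 2.0) shows that four 2 TB disks leave 6 TB of usable space. Check the device names with lsblk before running anything as root.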
Install packages to run AI/ML code
Verify that you have a CUDA-capable GPU, then install the appropriate CUDA and cuDNN versions. This link will guide you through the entire process.
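One common way to do the first check is to look for NVIDIA devices on the PCI bus before any driver is installed. The sketch below wraps the lspci command in Python; the function names are made up for illustration, and it assumes lspci (from the pciutils package) is available on the server.

```python
import subprocess
from typing import List


def find_nvidia_devices(lspci_output: str) -> List[str]:
    """Return the lspci lines that mention NVIDIA (case-insensitive)."""
    return [line.strip() for line in lspci_output.splitlines()
            if "nvidia" in line.lower()]


def detect_nvidia_devices() -> List[str]:
    """Run lspci and return NVIDIA device lines, or [] if lspci is unavailable."""
    try:
        result = subprocess.run(["lspci"], capture_output=True,
                                text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return find_nvidia_devices(result.stdout)
```

If detect_nvidia_devices() returns fewer entries than the number of cards you installed, recheck the seating and power cables before going further. Note that a single card can appear more than once in lspci, for example as a separate audio device.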
Once nvidia-smi detects all your GPUs and the appropriate CUDA and cuDNN versions are installed, your GPU server is ready to accelerate AI simulations.
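The GPU check can be scripted as well. The sketch below parses the CSV output of nvidia-smi and compares the number of detected GPUs with the number you expect; the helper names and the expected count are assumptions for illustration. Keep in mind that nvidia-smi reports the driver and the GPUs, not the installed cuDNN version, so cuDNN has to be checked separately.

```python
import subprocess
from typing import Dict, List

# Query a few per-GPU fields as headerless CSV, one line per GPU.
QUERY = ["nvidia-smi",
         "--query-gpu=index,name,driver_version,memory.total",
         "--format=csv,noheader"]
FIELDS = ("index", "name", "driver_version", "memory_total")


def parse_gpu_rows(csv_text: str) -> List[Dict[str, str]]:
    """Turn nvidia-smi CSV output into one dict per GPU, skipping malformed lines."""
    gpus = []
    for line in csv_text.strip().splitlines():
        values = [value.strip() for value in line.split(",")]
        if len(values) == len(FIELDS):
            gpus.append(dict(zip(FIELDS, values)))
    return gpus


def query_gpus() -> List[Dict[str, str]]:
    """Run nvidia-smi and return the detected GPUs, or [] if the call fails."""
    try:
        result = subprocess.run(QUERY, capture_output=True,
                                text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_gpu_rows(result.stdout)


def all_gpus_detected(gpus: List[Dict[str, str]], expected_count: int) -> bool:
    """True when the number of detected GPUs matches the number installed."""
    return len(gpus) == expected_count
```

On a server with four cards, all_gpus_detected(query_gpus(), 4) should return True once the driver is installed correctly.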
This blog is a brief compilation of the resources available for building a multi-GPU setup, and it should help readers get a server ready without many glitches. If something goes wrong along the way, do not hesitate to start over from the beginning.