With all of these open-weight AI models available, what kind of GPU do you need to run them on your own infrastructure? In this blog post I will explain how to calculate the GPU requirements for an AI model.

So, how do we calculate the size? The easiest way, as a rule of thumb, is the model's parameter count in billions times 2: a 2B (2 billion parameter) model requires a minimum of 4 GB of GPU memory. A more precise calculation takes a few more factors into account, which I will try to explain here:

GPU Memory (GB) = (Parameters in billions * 4 bytes) / (32 / Precision) * 1.2

Parameters is the number of parameters the model has, in billions. Most models publish this as part of the model name, or you can find it in the model's documentation; for example, Mistral-7B has 7 billion parameters. Precision is the number of bits used per parameter. More bits means a more precise model, but also more GPU memory. The final 1.2 multiplier accounts for roughly 20% of overhead. If you try to load a model that requires more GPU memory than you have, it will simply crash on load. The most common precision is FP16 (FP = floating point).
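Here is a minimal sketch of the formula in Python, just to make it concrete (the function name and defaults are my own, not from any library):

```python
def gpu_memory_gb(params_billion: float, precision_bits: int = 16, overhead: float = 1.2) -> float:
    """Estimate the GPU memory (in GB) needed to load a model.

    params_billion: parameter count in billions (e.g. 7 for Mistral-7B)
    precision_bits: bits per parameter (32, 16, 8, ...)
    overhead: multiplier for runtime overhead (~20%)
    """
    # (Parameters * 4 bytes) / (32 / Precision) * 1.2
    return (params_billion * 4) / (32 / precision_bits) * overhead
```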

Using the formula for Mistral-7B with FP16 precision: (7*4)/(32/16)*1.2 = 16.8 GB of GPU memory. You can see that the rule of thumb works well here, but with the overhead included it ends up a little above 16 GB. If you try to run this on an NVIDIA GeForce RTX 4080, for example, which has 16 GB of memory, it may not work. You can instead try running the model at 8-bit precision, which gives (7*4)/(32/8)*1.2 = 8.4 GB and may work on this GPU. You can also add multiple GPUs to a PC or workstation (up to 4) with NVLink and use their combined memory to run larger models. But for the really big models this is still not enough. This is where data-center/cloud GPUs come into the picture: using enterprise AI infrastructure software and fast networking, you can run a model across multiple nodes.
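Plugging Mistral-7B into the gpu_memory_gb sketch from above confirms these numbers (this is only the formula, so the real runtime footprint can still vary):

```python
# Mistral-7B at different precisions
print(gpu_memory_gb(7, 16))  # 16.8 GB -> too tight for a 16 GB RTX 4080
print(gpu_memory_gb(7, 8))   # 8.4 GB  -> may fit on that same card
```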

So what happens if you try to run a large model like LLaMA 3.1 405B, how many GPUs do you need? (405*4)/(32/16)*1.2 = 972 GB. That requires 13 H100 data-center GPUs (80 GB of memory each).
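To turn the memory estimate into a GPU count, you can divide by the per-card memory and round up. A small helper, reusing the gpu_memory_gb sketch from above (again, the names are mine, purely for illustration):

```python
import math

def gpus_needed(params_billion: float, precision_bits: int = 16, gpu_gb_per_card: float = 80) -> int:
    """How many GPUs of a given memory size are needed to hold the model."""
    total_gb = gpu_memory_gb(params_billion, precision_bits)
    return math.ceil(total_gb / gpu_gb_per_card)

print(gpus_needed(405, 16, 80))  # 13 x H100 80 GB for LLaMA 3.1 405B at FP16
```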

Another example: DeepSeek R1. It comes in several sizes; the full model has 671 billion parameters: (671*4)/(32/16)*1.2 = 1610 GB of GPU memory! That works out to 21 H100 GPUs. You can see how this quickly becomes expensive to run, and it also has a big impact on energy consumption and environmental footprint: more GPUs = more energy = a larger environmental footprint.
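The same helpers applied to DeepSeek R1 (illustrative only; a real deployment also needs headroom for context and batching):

```python
print(gpu_memory_gb(671, 16))    # ~1610 GB for DeepSeek R1 at FP16
print(gpus_needed(671, 16, 80))  # 21 x H100 80 GB
```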

So I think it is important to choose the smallest model and lowest precision that can get the job done. That is good for both your budget and the environment, and it is why sizing matters.

To make it easy for you, I built a GPU sizing calculator where you can choose a model and precision (or enter your own model), and it will list how many GPUs you need. Please try it out here: GPUSizer
