It comes as no surprise to the average techie consumer, or to anyone working in IT, that the stated capacity of storage devices like hard drives (HDDs) is not the actual usable capacity you get. If you are one of them, or a storage expert/geek, don’t bother reading. 😉
In the past (the 80s - 90s) it was simple to calculate the storage capacity required for IT systems; only basic arithmetic was needed. With the introduction of technologies (not that new anymore) such as deduplication, compression, and thin provisioning into the storage array world some years ago, this calculation became more complicated and trickier to understand. This is because not all storage vendors use the same concepts, wording, and metrics, and not all of them play fair with this sensitive information.
Deduplication is the technology that changed all the rules when sizing for storage capacity, but caution must be taken: not all applications benefit from it (databases, for example), and it can cause high CPU consumption depending on the method used (which I won’t go into here). However, it is very beneficial when planning for VDI full clones, for example.
Deduplication, when coupled with compression, can increase the effective storage capacity dramatically. These reduction techniques are reported by storage vendors as a data reduction ratio in their solutions’ data sheets, and they play an important part in project and solution decisions when choosing drives and storage in general. Usually, this data reduction ratio ranges from 5:1 to 10:1; it could be more, depending on the applications involved.
Let’s look at the major concepts concerning storage capacity. For the sake of simplicity, let’s use the following example.
Example: You have 4 HDDs of 2TB each. Step by step, let’s see what maximum capacity you could end up seeing in your operating system (OS).
I like to divide the RAW capacity into the two common measurement systems: decimal (base-10) and binary (base-2). Take into account that persistent storage is usually measured in decimal, while non-persistent storage, such as RAM, is always measured in binary.
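The decimal/binary split can be sketched in a few lines. This is a minimal illustration, assuming the common convention that vendors label drives in decimal terabytes (10^12 bytes) while the OS reports binary tebibytes (2^40 bytes); the helper names are my own:

```python
# Vendors count a "TB" as 10^12 bytes; the OS counts in 2^40-byte tebibytes,
# so the same drive appears smaller on screen.
def decimal_tb_to_bytes(tb):
    return tb * 10**12

def bytes_to_tib(num_bytes):
    return num_bytes / 2**40

print(round(bytes_to_tib(decimal_tb_to_bytes(2)), 2))  # ~1.82
```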
Theoretical capacity: the sum of the labeled capacities of the storage devices (HDDs), for almost all storage vendors.
From our example: 2TB per HDD, total capacity = 8TB
The actual usable capacity per storage unit as seen by the end system/device. Many people consider this capacity part of the “usable capacity” (below) and count it as system overhead. I prefer to keep them separate.
More often than not, the capacity you actually see, as a fraction of the tagged/labeled (decimal) capacity, is, for the most common storage prefixes:
- Mega: 95% of the labeled capacity.
- Giga: 93% of the labeled capacity.
- Tera: 91% of the labeled capacity.
From our example: a 2TB HDD turns into 1.8TB, and the total capacity of our 4 HDDs is now 7.2TB
Note: This is the actual capacity that you get if you plug your HDD(s) into a PC. The system will take an extra portion of your capacity after formatting the hard drives.
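Those 95%/93%/91% rules of thumb fall straight out of the decimal-to-binary conversion; each larger SI prefix diverges a little further from its binary counterpart. A quick sketch of where they come from, using the rounding the example uses:

```python
# Ratio of each decimal prefix to its binary counterpart.
factors = {
    "Mega": 10**6 / 2**20,   # ~0.954 -> ~95%
    "Giga": 10**9 / 2**30,   # ~0.931 -> ~93%
    "Tera": 10**12 / 2**40,  # ~0.909 -> ~91%
}

per_drive_tib = round(2 * factors["Tera"], 1)  # a labeled 2TB drive
total_tib = 4 * per_drive_tib                  # the four drives in the example
print(per_drive_tib, total_tib)                # 1.8 7.2
```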
This is the capacity left after system overhead. This overhead is used for internal operations, data protection, and other things depending on the system/OS. In the case of storage arrays, the overhead comes from RAID configurations, system OS/FW installation, metadata, garbage collection, and so on.
From our example: add a RAID 10 configuration and a system overhead of 5%; total capacity = 3.42TB
Note: RAID 10 can roughly double read performance while cutting the usable space in half.
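The usable-capacity step can be written as one small formula. This is a minimal sketch under the example’s assumptions: RAID 10 mirroring costs 50% of the capacity, and a further 5% goes to system overhead (OS/FW, metadata, garbage collection); the function name is my own:

```python
# Usable capacity = total * RAID factor * (1 - system overhead).
# RAID 10 mirrors everything, so its capacity factor is 0.5.
def usable_capacity(total_tb, raid_factor=0.5, overhead=0.05):
    return total_tb * raid_factor * (1 - overhead)

print(round(usable_capacity(7.2), 2))  # 3.42 - the example's figure
```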
This is the minimum capacity that you’ll need to look at when making initial sizing calculations and designs. At this point, you have seen how your precious capacity is reduced by the usual operations; don’t worry, things will get better.
The capacity that you can actually use after data reduction ratios are applied, depending on the system’s capabilities. This includes deduplication, compression, and other proprietary techniques from the storage vendors. Take into account that these reduction ratios usually vary per application, so not all kinds of data will be reduced equally. To get the total reduction ratio, just multiply the compression and deduplication ratios.
From our example: apply a total data reduction ratio of 5:1, and our total capacity is now 17.1TB
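The multiply-the-ratios rule looks like this in practice. A small sketch assuming the example’s 5:1 total ratio is the product of a hypothetical 2.5:1 deduplication ratio and 2:1 compression ratio (the article only gives the 5:1 total):

```python
# Total reduction ratio is the product of the individual ratios.
dedup_ratio = 2.5        # assumed split; only the 5:1 total is given
compression_ratio = 2.0
total_ratio = dedup_ratio * compression_ratio  # 5.0

usable_tb = 3.42
effective_tb = usable_tb * total_ratio
print(round(effective_tb, 2))  # 17.1
```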
Finally, the configured capacity (or provisioned capacity) is the capacity your system will let you configure even if you don’t physically have it, thanks to thin provisioning and over-provisioning. I like to call this “honestly lying to the customer”, and it is basically what all cloud storage providers do: they offer and sell you storage that they probably don’t even have yet. In storage arrays it is the same: you configure a storage capacity that you don’t have and present it as real capacity to the front-end application, such as a Windows OS or a hypervisor.
From our example: this is up to you and the storage array’s capabilities, but from the real usable capacity of 3.42TB that you have, and the 17.1TB effective capacity that you could easily reach, a lot more can be configured to the front end! Let’s keep it simple and say that your system lets you configure 30TB. Boom! Magic. Now you have 30TB out of the 3.42TB of physical capacity.
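It helps to see how far that configured capacity is stretched. A quick sketch of the over-provisioning ratios implied by the example’s numbers (3.42TB physical, 17.1TB effective, 30TB configured):

```python
# How much the configured capacity exceeds what physically exists -
# the gap that thin provisioning is betting the applications won't fill.
physical_tb = 3.42
effective_tb = 17.1
configured_tb = 30.0

print(round(configured_tb / physical_tb, 1))   # ~8.8x the physical capacity
print(round(configured_tb / effective_tb, 1))  # ~1.8x the effective capacity
```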
From the bottom to the top
To summarize, the most important kinds of capacity (usually for storage arrays) for our example can be explained as below: