Bigger Hard-drive = More Fragile

In one of his podcasts, Steve Gibson of GRC discussed how his program SpinRite attempts to recover data from failing hard-drives. He mentioned that modern drives are more susceptible to failure due to their higher data density and smaller bit sizes. This is correct, but it may not be obvious what that actually means, so as always, an analogy is helpful.

Hard-drives store data as magnetic binary bits. As a simplification, let's say that when the drive head writes a 0, it aligns the magnetic bit to positive, and to write a 1, it aligns it to negative. However, the magnetic bit isn't a single atom. Like everything else, it is made up of lots and lots of atoms. Just like how a fridge magnet is made up of millions of tiny atoms, each drive bit is made up of many atoms that align together to give the bit a specific magnetic polarity that the drive head can read and interpret as a 0 or a 1.

Now for simplicity, imagine we have an old hard-drive in which each bit on the drive platters is made up of 100 atoms. When you write data to it, the write head tries to align all 100 atoms in the bit to the same polarity (positive or negative). However, this will not always work perfectly, and some of the atoms may not get aligned correctly (or at all). What the head reads back is the bit's overall polarity, positive or negative, which it interprets as a 0 or a 1 (again, this is a simplification).

In our old, 100-atom/bit drive, if eight atoms are incorrectly written, then 8% of the bit is bad and 92% is good, so when the head reads the magnetic polarity, it gets a nice, solid, clear reading.
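To make the analogy concrete, here is a tiny sketch in Python (the atom counts, the fixed number of bad atoms, and the majority-vote read are all illustrative assumptions from the analogy, not actual drive physics):

```python
# Illustrative model only: a "bit" is a group of atoms, and reading the bit
# means taking the majority polarity of those atoms.

def read_bit(intended, atoms_per_bit, bad_atoms):
    """Return the value the head would read back, given how many atoms
    ended up with the wrong polarity."""
    good_atoms = atoms_per_bit - bad_atoms
    # Majority vote: if most atoms carry the intended polarity, the read is clean.
    return intended if good_atoms > bad_atoms else 1 - intended

# Old drive: 100 atoms per bit, 8 of them written incorrectly.
print(read_bit(intended=0, atoms_per_bit=100, bad_atoms=8))  # -> 0 (92% of atoms agree)
```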

Some time goes by and data sizes go up, so hard-drive manufacturers have to increase drive capacity, but they cannot increase the physical size of the drive. Packing more bits into the same space means using smaller bits (i.e., higher data density).

Let's say we now have a fancy new drive with 5x the capacity of the old one. Instead of using 100 atoms per bit, our new drive has only 20 atoms per bit. As a result, if eight atoms are incorrectly written, then instead of only 8% of the bit being corrupt, it is now 40%. We still get more correct atoms than incorrect ones at that rate, but the margin is much smaller (and it gets worse if there are more than eight bad atoms).

Some more time goes by and we get a drive with 10x the capacity of the first one (2x the previous one). Instead of 100, or even 20, atoms per bit, it is down to 10. Now when eight atoms are incorrect, the bit is 80% bad and only 20% good! A bit that used to read back reliably with eight bad atoms is now completely unreadable.
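Running the same illustrative model across all three generations of drive shows how the margin shrinks and finally flips (again, these numbers come from the analogy, not from real hardware):

```python
# Eight misaligned atoms per bit is held constant; only the number of atoms
# per bit shrinks as data density goes up.

BAD_ATOMS = 8

for atoms_per_bit in (100, 20, 10):
    bad_pct = 100 * BAD_ATOMS / atoms_per_bit
    good_pct = 100 - bad_pct
    readable = good_pct > bad_pct  # majority of atoms still carry the right polarity
    status = "readable" if readable else "unreadable"
    print(f"{atoms_per_bit:>3} atoms/bit: {bad_pct:.0f}% bad, {good_pct:.0f}% good -> {status}")

# 100 atoms/bit:  8% bad, 92% good -> readable
#  20 atoms/bit: 40% bad, 60% good -> readable (with a much thinner margin)
#  10 atoms/bit: 80% bad, 20% good -> unreadable
```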

The reliability of the drive went down as the data density went up: there is more data per drive, but fewer atoms per bit, which makes the drive bigger in capacity yet less reliable and more prone to corruption and data loss.