Hard Disks 101 - explained
In this article, I’ll explain most of the stuff related to hard disks, also known as HDD (Hard Disk Drive). If you are considering buying one or just wishing to learn about them, this should help.
In a separate article, I've explained how to refresh data on a hard disk (BikeGremlin site link).
"Exited and prout to be posing my fist ever particle on LES."
Table Of Contents (T.O.C.):
1. Introduction and a bit of history
2. Hard Disk working principle
2.1. CMR vs SMR recording technology
3. The physical size
4. Connection interface standards
4.2. SCSI and SAS
4.5. External drive connection standards
5. Hard disk performance basics
6. Hard disk technical specs
6.1. Cache memory size
6.2. Platter rotation speed
6.3. Seek time and latency
6.4. Sequential and random data transfer speed
6.5. Durability stats
6.6. Physical characteristics
6.8. Air vs Helium
7. Hard disk buying recommendations
1. Introduction and a bit of history
The 10-second attention span people who like to "just skim" articles might want to skip straight to chapter 2.
Relja BrevityIsNotMyVirtue Novović
There was a time when people used tapes to store large amounts of data. The main downside of using a tape for storage is the difficulty of finding a certain file, without reading the entire tape - (re)winding only gets you so far. It was much like playing a music cassette tape or a gramophone record.
In 1956, IBM built the first commercially available hard disks. HDDs allowed people to read, edit, delete, or add data to any file stored on the drive "right away." This is called "random access" and we take it for granted nowadays, but it was a major breakthrough in storage technology.
Over the following two decades, tape storage fought hard, but eventually lost in the vast majority of categories. There still are use cases where tape storage is the optimal choice, but that is a topic for a separate article.
Of course, the first hard disks were behemoths, used in enterprise server rooms and resembling modern-day washing machines more than anything else. Their 24-inch media was spun using powerful motors. Fun fact: my father and I put one such hard disk motor on our lawn mower, back in the 90s when there was a fuel shortage in my country. That thing is still running nicely at the time of writing, using three-phase 50 Hz 220V power.
An old hard disk motor - 1 horsepower, 3-phase power supply.
...For the life of me, I can’t remember if it was from a Sperry-Univac, or a decade younger Honeywell-Bull system...
Compare that to today’s 3.5" hard drives (and their smaller 2.5" laptop-friendly cousins). We’ve certainly come a long way. Nowadays, Solid State Drives (SSD) have replaced the traditional hard drive as the norm for most desktop computers and laptops. However, for storing large amounts of data, the good old hard disks are pretty much alive and kicking - with some more tech. improvements over the past few years. They are the stars of this article.
2. Hard Disk working principle
I’ve seen programmers who don’t understand these concepts make an application that saves thousands of several-byte-small log files (i.e. very small files), and then get surprised to see the application take a huge amount of storage space. So, despite countless super-fast online programming courses, it helps to put some effort into understanding the hardware your stuff will be running on.
Relja Novović - shower thoughts
Let me briefly touch upon how these things work. A minute spent on the following text and pics should help you more easily understand hard disk technical specs. Ready?
Hard disks, as their name says, have discs covered in magnetic material which can be magnetised to north (N) or south (S) polarity. These discs are also called platters. There can be one or more platters in a hard disk drive (modern 3.5" drives can have up to five platters).
Similarly to a gramophone, the platters spin, and there are heads that can move across their surface to perform reading or writing.
Thanks to the damn hipsters, I hope that even the younger readers know what a gramophone is…
Hard disk drive's basic parts drawing
In the picture above, the DISK spins at over 5,000 rpm, while the R/W HEAD moves left-right, floating just about 10nm above its surface (and reading or writing data to it by using electromagnetic induction). When a hard disk drive is turned off, its heads are locked in place so they don’t hit the platters if the drive is transported. This is called parking the heads.
This part serves to introduce you to the technical terms you may hear: cylinders, tracks, and sectors. Basically, when you format a hard disk drive, you are logically dividing each platter into concentric tracks (imagine race tracks at the Olympics), and dividing each track into sectors (smaller sections along the track).
Every sector contains a set number of bytes (a byte is a group of eight bits, each of which can only be a logical one or a logical zero). Traditionally that is 512 bytes, though many modern drives use 4096-byte sectors. Several sectors are grouped into a cluster, the smallest amount of a disk's storage space that can be taken by one file. If your cluster size is, say, 4096 bytes, and you save a file that is only 1024 bytes large, it will still take up the entire cluster. And if a file is 8192 bytes large, it will span across two 4096-byte clusters.
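To make the cluster arithmetic concrete, here is a minimal Python sketch. The function name allocated_bytes and the 4096-byte default cluster size are assumptions made for illustration. Real file systems add their own metadata overhead (and some can store tiny files inline), so treat the results as ballpark figures.

```python
def allocated_bytes(file_size: int, cluster_size: int = 4096) -> int:
    """Space a file takes on disk when storage is handed out in whole clusters."""
    if file_size <= 0:
        return 0  # simplification: an empty file is assumed to use no data clusters
    clusters = (file_size + cluster_size - 1) // cluster_size  # round up
    return clusters * cluster_size


print(allocated_bytes(1024))  # a 1024-byte file still takes a whole 4096-byte cluster
print(allocated_bytes(8192))  # an 8192-byte file takes exactly two clusters

# Hypothetical example: 10,000 log files of 20 bytes each
log_count = 10_000
print(log_count * 20, "bytes of data,", log_count * allocated_bytes(20), "bytes on disk")
```

The last line shows why thousands of tiny log files eat storage: the data itself is about 200 KB, but in this model it occupies roughly 41 MB on the drive.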
Before I totally confuse you by explaining what a cylinder is, a picture will explain it better:
Hard disk's logical division: cylinders, tracks and sectors
Image source: Wikipedia
So, now you clearly understand what a cylinder is: a set of tracks at the same distance from the platter’s centre, across all the platters in the drive. Cylinders are no longer used for addressing hard disk storage - they have been replaced with logical block addressing (LBA), but that's a topic for a separate article.
Here's what that looks like in real life:
Hard disk internals photo
Well, I hope you are now smarter than you were this morning. Let us move on to the more “flashy” stuff.
2.1. CMR vs SMR recording technology
This one is rather important, especially if you value performance and reliability.
Remember what we’ve just said about tracks and how those are created in concentric circles on the platters when you format a hard drive? Each track contains sectors where bits are written and read from. That’s the old style, called CMR (Conventional Magnetic Recording).
Fact: you need more room (more track width) to write a bit to a track (i.e. polarize its magnetisation) than you need to read from it. That is why some genius thought it was a jolly good idea to have the tracks overlap! That is how SMR (Shingled Magnetic Recording) drives were born.
CMR vs SMR system of writing and reading data from a hard disk's platter
SMR lets you write about 25% more data on a given platter size, which lets you build larger capacity drives at a lower cost. What could possibly go wrong?
Well, SMR relies on the fact that the head for reading can work with a narrower track. But when you need to write to a track, you will also rewrite the adjacent tracks. So, if you write data on an SMR drive track, and its adjacent tracks have some data, you would need to first read and rewrite that data too, in order to prevent data corruption. This makes SMR drives perform awfully poorly when writing if they are loaded with data.
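Here is a toy Python model of that rewrite chain. It is only a sketch under my own assumptions: tracks are grouped into a "band", writing a track clobbers the next track in the band, and a clobbered track that holds data must be read and rewritten in turn. The function name tracks_rewritten and the 8-track band are hypothetical; real drive firmware is far more sophisticated.

```python
def tracks_rewritten(has_data: list[bool], target: int) -> int:
    """Count the tracks that must be written to safely update one track.

    has_data[i] says whether track i of the band currently holds data.
    Assumption: writing track i damages track i + 1.
    """
    count = 1  # the target track itself
    next_track = target + 1
    # Keep going while the damaged neighbour holds data we must preserve.
    while next_track < len(has_data) and has_data[next_track]:
        count += 1
        next_track += 1
    return count


empty_band = [False] * 8
full_band = [True] * 8

print(tracks_rewritten(empty_band, 2))  # nothing to preserve: one write
print(tracks_rewritten(full_band, 2))   # tracks 2 through 7 all get rewritten
```

In this model, an empty drive needs one write per track, while a full one turns a single track update into six. That is the gist of why a loaded SMR drive slows down when writing.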
Some manufacturers don't disclose when their drives are built using the SMR system. Make sure to double check and avoid buying such drives - at least that's what I do and recommend.
3. The physical size
3.5 inches is quite large.
Today, hard disks come in two formats: 2.5" and 3.5".
2.5" drives are the smaller ones, with platters that spin slower and make less noise and heat, while using less power. They are designed for use in laptops and portable (external) hard drive cases, and a part of this design is a more rugged mechanism of parking drive heads to avoid any damage in case of bumps or similar. Thanks to their lower power consumption, they can use a USB cable for power supply (handy for portable drives). An exception is server-grade 2.5" drives, built to be compact but to work 24/7 in servers. Either option is more expensive per TB of storage space compared to 3.5" drives.
3.5" is the usual size for drives designed to be placed in desktop computers and many servers and NAS drives. These are generally faster, louder, require more power, but offer higher storage capacities for the price.
4. Connection interface standards
The nicest thing about standards is that there are so many of them to choose from.
Gotta plug it in somehow, right? Let us go over some of the most widely used standards, starting with a couple of obsolete ones. Note that the given data transfer speed info is related to the connection's capacity; it doesn't mean every (or any compatible) drive can reach that speed.
An obsolete standard. ATA (Advanced Technology Attachment) is the standardized name for what was also called PATA (Parallel ATA) and IDE (Integrated Drive Electronics). It uses a parallel connection, meaning several bits are transferred all at once. That is why ATA cables are flat and very wide.
ATA data transfer speeds go up to 133 MB/s.
4.2. SCSI and SAS
SCSI (pronounced as “scuzzy”) stands for “Small Computer System Interface.” The standard started gaining popularity during the ‘80s (20th century) as a fast, high-quality standard for enterprise storage, because it allowed a connection of a relatively large number of drives, often placed in RAID (Redundant Array of Inexpensive Disks).
It started as a high-speed parallel connection, but got updated over the years. Its latest version uses a serial connection. It’s called Serial Attached SCSI (SAS), and the latest generation (SAS-4) runs at 22.5 Gbit/s, which works out to roughly 2,400 MB/s of usable throughput. This stuff is (and always has been) expensive, and can be found in servers.
SAS interface can also accept SATA drives (though SATA interface won’t accept any SAS drives).
SATA (Serial Advanced Technology Attachment). This is the de facto current desktop and laptop storage connection standard. It uses a serial connection at a high frequency (and speed), and allows for thin and long (up to one metre) cables.
SATA data transfer speeds go up to 600 MB/s (the latest, SATA III standard).
This interface is fast enough for most consumer-grade hard drives. If you are buying an HDD today, and reading this, you are most likely looking for a SATA HDD.
NVMe (Non-volatile Memory Express) was originally designed to work with SSDs (Solid State Drives). Unlike other storage media interfaces, NVMe lets you connect your drives directly to the CPU, using the PCIe interface. This allows for very low latencies and huge data transfer speeds of over 10,000 MB/s over PCIe 5.0.
At the time of writing, it is mostly used for SSDs (using M.2 connection format), though some companies, like Seagate, are working on building an HDD that natively connects via NVMe.
If you’d like to know more, Western Digital’s blog has a great article on NVMe.
4.5. External drive connection standards
Let’s do a quick overview of the connection standards for external drives (portable drives, NAS expansion units, and similar).
FireWire is a standard launched by Apple, which has now become obsolete for reasons beyond the scope of this article. The FireWire 800 standard could reach data transfer speeds of up to 100 MB/s.
The current standard for connecting many external 2.5” drives is USB 3.1 with data transfer speeds of up to 1,250 MB/s, while the newer USB 3.2 can go up to 2,500 MB/s (older USB 1.0 and 2.0 standards are now practically obsolete).
The even newer USB4 standard comes in 20 Gbps (that is giga-bits, not giga-bytes per second) and 40 Gbps (5,000 MB/s - that is mega-bytes per second) variants (the 40 Gbps version requires the more expensive 40 Gbps cables).
There should also be a USB4 v 2.0 standard with speeds of 80 Gbps (10,000 MB/s), but I have never seen any appliances or even cables for that one.
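Since the giga-bits vs mega-bytes thing trips people up, here is a small Python sketch of the conversion used for the USB figures above. The function name gbps_to_mb_per_s is my own, and the calculation simply divides the raw line rate by eight. It ignores encoding and protocol overhead, so real-world throughput will be lower.

```python
def gbps_to_mb_per_s(gbps: float) -> float:
    """Convert a raw line rate in gigabits per second to megabytes per second."""
    megabits_per_second = gbps * 1000
    return megabits_per_second / 8  # 8 bits in a byte


for rate in (10, 20, 40, 80):
    print(f"{rate} Gbps = {gbps_to_mb_per_s(rate):,.0f} MB/s")
```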
Note, there is a USB C connector (physical connector) standard that often gets confused with the transfer speed standards of USB 3.1, 3.2, USB4 etc, but you technically could have the older, slower USB 3.1 working over a USB C physical connector.
USB is capable of supplying power over the same cable. USB A connector cables can supply up to 7.5W, while USB C connector cables can supply up to 100W or even 240W with the latest USB-C PD connector standard.
Thunderbolt requires that your motherboard supports it (which is not as commonly the case as it is with USB 3.2 nowadays). It offers speeds of up to 40 Gbit/s (5,000 MB/s).
It connects by combining the PCIe and DP (DisplayPort) signals into two serial signals, and also provides the needed power supply of up to 100W - all using the same cable.
In practice, Thunderbolt connectors and equipment are not very widely used, at least in my experience.
eSATA is another standard that is getting obsolete (though Synology uses it for connecting some of their NAS expansion units). It doesn’t supply any power (so you need a separate power supply for the eSATA drives), though there is the eSATAp standard which does.
In its latest & greatest version, the eSATA standard provides data transfer speeds of up to 600 MB/s.
I think this covers the standards you are most likely to encounter. Let us move on.
Connector types overview
5. Hard disk performance basics
A child of five would understand this. Send someone to fetch a child of five.
Hard disks use spinning platters to store data. A read/write head moves left-right over the platter surface (see picture 2 for details). That is why they can read or write a lot faster if they are doing it all in one go. That is called sequential access reading/writing. If, on the other hand, they need to read a file (or files) dispersed over several non-adjacent sectors, that kind of reading/writing, called random access, is much slower.
Likewise, because reading only requires, well, reading the magnetic polarity of bits, while writing requires actual magnetisation of each bit, hard disk read speeds are usually somewhat faster than their write speeds.
So, even if you buy the latest, greatest and fastest hard disk drive in the world, you should expect its random access speeds to be lower than its sequential access speeds, and its writing speeds to be lower than its reading speeds.
Manufacturers often state the most “optimistic” scenarios in their selling brochures, so it’s good to keep this in mind and double-check.
6. Hard disk technical specs
There is no amount of memory that human stupidity can’t fill up.
Živojin Novović, when we discussed our home PC’s hard disk upgrade
In this “chapter,” I will cover the basic hard disk drive technical specifications, explain what each one means, how it affects the overall performance, and what its ballpark values are. I’ll use SATA drives as examples, since those are the most popular consumer and “prosumer” drives at the time of writing.
For brevity, I will not discuss every existing aspect of hard drives, but concentrate on the stuff you can quickly check and compare when deciding which drive to purchase.
6.1. Cache memory size
I have already explained what caching is (the context of that article was about website page caching, but the principle is the same). It boils down to using fast storage, such as RAM (Random Access Memory), to quickly store and retrieve data.
For a hard disk, this means: when you want to save a file, you just send it to the drive’s cache and call it a day, letting the drive worry about writing it to its platters for a permanent storage.
A similar thing happens when you are reading data from the drive. Say you open the document “pigeon-deliveries-june.doc”. Drives have smart controllers that can predict the likelihood of you requesting a file such as "pigeon-deliveries-july.doc" next, letting them take the time to read that file and load it into the drive's cache, so you can retrieve it quickly if you decide to.
Caching, especially for reading files, is hit-and-miss, but it can often help provide a faster perceived performance - i.e. despite the drive having the same read speed, you will effectively read and write your files faster, thanks to the use of cache.
Of course, cache memory is a lot more expensive than platter storage space, so even 1,000,000 MB (1 TB) large drives often come with only 64 MB of cache. That is why cache can’t help when reading or writing large amounts of data. Having said that, since hard disks are the slowest when writing, and most users write (save) a lot of smaller files at irregular intervals, cache helps a lot in practice.
Cache size usually depends on the drive's total storage capacity (larger drives have more cache memory). Typical cache size ranges from 32MB for lower end 1TB drives, to 256MB for larger capacity or high-end drives. More cache costs more, but manufacturers usually provide the optimal ratio based on drive’s capacity and intended use.
6.2. Platter rotation speed
The faster a drive’s platters spin, the faster you can read/write data (sequential read/write), and the less time it takes for the sector with the needed data to reach the reading/writing head (for random access). However, higher rotation speed also requires more power, and creates more heat and noise.
Typical speeds are 5,400 and 7,200 RPM, though some enterprise (SAS) drives spin up to 15,000 RPM.
Because of the extra noise, faster needn’t always be better; it depends on your use case.
6.3. Seek time and latency
We know that read/write heads move across the platters. This takes time. Moving from one track to the next is a lot faster than moving from the outermost all the way to the innermost track. So, we have:
- Track-to-track seek time: often below 2ms.
- Average seek time: often about 5ms.
For some reason, seek time specs are next to impossible to get for most modern drives.
Now, since the platters are spinning, it might take some time for the sector you need to come under the reading head. Average time taken for this to happen is called the average rotational latency. For a drive that spins at 5,400 RPM, that is 60 (seconds) divided by 5,400, divided by 2 (to get the average). So, about 5.55ms. This goes down to 4.16ms for 7,200 RPM drives, and as low as 2ms for the 15,000 RPM drives.
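The same calculation as a short Python sketch, in case you want to check other rotation speeds. The function name avg_rotational_latency_ms is my own; the formula is just the one described above (time for one full rotation, divided by two).

```python
def avg_rotational_latency_ms(rpm: int) -> float:
    """Average wait for a sector to come under the head: half a rotation, in ms."""
    ms_per_rotation = 60_000 / rpm  # 60 seconds = 60,000 milliseconds
    return ms_per_rotation / 2


for rpm in (5_400, 7_200, 15_000):
    print(f"{rpm} RPM: {avg_rotational_latency_ms(rpm):.2f} ms")
```

Note that the script rounds to two decimals, so it prints 5.56 and 4.17 where the text above cuts the digits off at 5.55 and 4.16.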
Do not mix this rotational latency with the average read/write latency, which is:
- how long it takes for the drive's controller to figure out where the required block of data is on the platters,
- move the read/write head to it,
- and start data read/write process (transfer).
Average read latencies often go to about 30ms (and max. latencies can be as high as one second), while average write latencies can be ten times longer, so about 300ms.
These stats affect random read/write performance, but don’t affect sequential read/write performance.
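To get a feel for how seek time and rotational latency cap random access performance, here is a rough Python estimate. It is a simplified model of my own (the function name estimated_random_iops is hypothetical): each random operation costs one average seek plus one average rotational latency, and everything else (controller overhead, queueing, caching, the transfer itself) is ignored.

```python
def estimated_random_iops(avg_seek_ms: float, rpm: int) -> float:
    """Very rough upper bound on random operations per second for one drive."""
    rotational_latency_ms = 60_000 / rpm / 2  # half a rotation on average
    access_time_ms = avg_seek_ms + rotational_latency_ms
    return 1000 / access_time_ms  # operations that fit in one second


# Using the ballpark 5 ms average seek time from the list above
for rpm in (5_400, 7_200):
    print(f"{rpm} RPM: about {estimated_random_iops(5, rpm):.0f} operations per second")
```

Even this optimistic model lands at around a hundred operations per second, which is why random access on hard disks is so much slower than sequential access.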
6.4. Sequential and random data transfer speed
Data transfer speed provided by manufacturers is basically average sequential read/write speed. It is often around 200 MB/s for modern drives. Drives that spin faster usually have a bit higher "data transfer speed" (i.e. average sequential read/write data transfer rate).
Random read/write performance gives data transfer speeds that are about 1% (one percent) of the sequential read/write speeds (so about 2 MB/s). These are often expressed in IOPS (Input/Output operations Per Second). In another article I've explained IOPS in more detail.
IOPS results differ based on the size of data blocks being read or written, but let's say that ballpark values are in hundreds of IOPS, and that with modern drives, write performance gives more IOPS thanks to the good use of caching. When a disk reads, it needs to read the given files wherever they are, often doing a lot of back-and-forth. When it writes, the data is stored in cache and then written in a more optimized way (less "random" write head movement).
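IOPS and MB/s are two views of the same thing, linked by the block size. A minimal Python sketch of the conversion follows. The function name throughput_mb_per_s and the 4096-byte block size are my own assumptions for illustration.

```python
def throughput_mb_per_s(iops: float, block_size_bytes: int) -> float:
    """Data transfer speed in MB/s for a given IOPS figure and block size."""
    bytes_per_second = iops * block_size_bytes
    return bytes_per_second / 1_000_000


# A few hundred IOPS at a (hypothetical) 4096-byte block size
for iops in (100, 300, 500):
    print(f"{iops} IOPS x 4096 bytes = {throughput_mb_per_s(iops, 4096):.2f} MB/s")
```

With 4096-byte blocks, it takes about 500 IOPS to reach the roughly 2 MB/s mentioned above, and larger blocks give a higher MB/s figure for the same IOPS.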
6.5. Durability stats
These are the stats related to durability. Note that these stats have more to do with the drive’s intended use than its actual quality (I’ll recommend models later on).
Power on hours per year
8,760 is for drives designed to work 24/7 (NAS and enterprise/server drives).
2,400 is for drives intended for personal or business computers (about 8 hours of daily use).
Unrecoverable read errors per bits read (URE)
This is basically the probability of a drive not being able to read a piece of written data (even with error correction attempted). 1 bit error in 10^14 bits read is the usual number for consumer grade drives, while 1 in 10^15 or 10^16 is more common for NAS and enterprise/server grade drives.
In theory, 1 in 10^14 means there will be an error for every ~12.5 TB of data read, but as I said at the start of this “chapter,” this info has more to do with a drive’s intended use and manufacturer’s warranty, than it does with the actual durability and reliability of the drive.
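For the curious, here is the theoretical URE arithmetic as a Python sketch. The function names are my own, and the probability calculation assumes every bit read has the same independent chance of an error, which is a textbook simplification and not how real drives behave.

```python
import math


def terabytes_per_error(bits_per_error: float) -> float:
    """How many TB (10^12 bytes) you read, on average, per unrecoverable error."""
    return bits_per_error / 8 / 1e12


def chance_of_read_error(terabytes_read: float, bits_per_error: float) -> float:
    """Theoretical probability of at least one unrecoverable error.

    Computes 1 - (1 - p)^n using log1p/expm1, because p is far too
    small for the naive formula to stay accurate in floating point.
    """
    bits_read = terabytes_read * 1e12 * 8
    return -math.expm1(bits_read * math.log1p(-1 / bits_per_error))


print(terabytes_per_error(1e14))  # TB read per error for a 1 in 10^14 drive
print(f"{chance_of_read_error(12.5, 1e14):.0%}")  # consumer grade spec
print(f"{chance_of_read_error(12.5, 1e15):.0%}")  # NAS/enterprise grade spec
```

On paper, reading 12.5 TB from a 1 in 10^14 drive gives roughly a 63% chance of hitting at least one error, versus about 10% for a 1 in 10^15 drive. In practice, drives tend to do a lot better than their spec sheets suggest.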
Workload Rate Limit (WRL)
How many terabytes (TB) per year you can read or write to the drive. Consumer grade stuff is good for about 50TB, while the enterprise stuff is rated at 150TB per year or more.
Mean Time Between Failures (MTBF)
The arithmetic mean (average) of time between failures. Usually expressed in hours, and ranges around one million. Based on my knowledge and experience, this metric can be disregarded.
A stat that is actually useful (but it is not provided by drive manufacturers) is the Annualized Failure Rate (AFR). Unfortunately, to get this data, you would need a ton of input from service centres around the world. Or rely on Backblaze quarterly stats (Backblaze 2023 Q2 stats link) if they have used a drive that you are looking to buy.
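If you only have an MTBF figure, you can turn it into a theoretical AFR. The Python sketch below assumes a constant failure rate over the drive's life (an exponential model), which is a common simplification and my own assumption here. The function name afr_from_mtbf is hypothetical, and the result says nothing about what Backblaze-style field data will show.

```python
import math

HOURS_PER_YEAR = 8_760  # 24 hours x 365 days


def afr_from_mtbf(mtbf_hours: float, power_on_hours: float = HOURS_PER_YEAR) -> float:
    """Theoretical chance of a drive failing within one year of use.

    Assumes a constant failure rate, i.e. AFR = 1 - exp(-hours / MTBF).
    """
    return -math.expm1(-power_on_hours / mtbf_hours)


print(f"{afr_from_mtbf(1_000_000):.2%}")         # running 24/7
print(f"{afr_from_mtbf(1_000_000, 2_400):.2%}")  # about 8 hours on a working day
```

A one-million-hour MTBF works out to a theoretical failure rate of under 1% per year for a drive running 24/7.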
6.6. Physical characteristics
Apart from the size (form factor) explained in chapter 3, these are some other important disk drive physical characteristics.
Power consumption is how many watts the drive consumes on average when working (from about 5W to 10W or more for the larger enterprise grade drives). Faster spinning drives consume more power. Air filled drives (as opposed to helium filled ones) consume more power, since air creates more drag.
Drives make noise when they are spinning, and even more noise when they are reading or writing. This noise level is expressed in A-weighted decibels (dBA). The dB scale is logarithmic: a 10 dB increase means 10 times the sound intensity, while a 20 dB increase means 100 times the intensity (10 times 10). Our ears are more forgiving, and perceive every 10 dB increase as roughly twice as loud.
For example, 20 dBA is whisper, 60 dBA is normal conversation, and 100 dBA is a loud motorcycle.
Most drives are below 30 dBA when idling, and above 30 dBA when reading. But even a couple of dBA difference is noticeable, as explained in the previous paragraph. 20 dBA idle and 25 dBA when reading/writing is considered to be very quiet for hard drives.
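To see why even a few dBA matter, here is the decibel arithmetic as a small Python sketch. The function name intensity_ratio is my own. It returns the ratio of sound intensity (power), which is not the same as how much louder something seems to a human ear.

```python
def intensity_ratio(db_difference: float) -> float:
    """How many times more sound intensity a given dB increase represents."""
    return 10 ** (db_difference / 10)


for db in (3, 5, 10, 20):
    print(f"+{db} dB = {intensity_ratio(db):.2f} times the sound intensity")
```

So a drive that goes from 20 dBA when idle to 25 dBA when reading puts out a bit over three times the sound intensity.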
Modern hard disk storage capacity is expressed in terabytes (TB). A TB is a thousand gigabytes (GB), i.e. a million megabytes (MB). Some smaller models have sizes of 500 GB or similar, but those are getting obsolete. Modern drives range from one to a dozen or more TB.
Larger capacity drives often have more platters, consume more power and cost more.
6.8. Air vs Helium
For decades, hard disks had a small hole (with a dust filter) to let the air move in and out (as the inside air temperature changes). Read/write heads float at about a dozen nanometres above the platters. If the air becomes too thin (at high altitudes), air-filled drives will not work properly, their heads will drop and scratch the platters.
Helium, unlike air, is more stable and provides a lot less drag as the drive platters spin (and its read/write head moves).
The problem with helium is that its atoms are so tiny that it is very difficult to keep it from seeping out of the hard drive’s case. Some extraordinary high tech. advancements over the past decade have allowed manufacturers to seal their hard disk drive units well enough for the damn gas to stay inside. And voilà, today we have helium filled drives.
Helium filled drives make less noise, use less power (all thanks to the lower drag), and, because they are completely sealed, can operate safely in low pressure environments. Making the tight seal costs more, so at the time of writing, only high-capacity drives (10TB or more) come helium filled.
7. Hard disk buying recommendations
I am not rich enough to buy cheap stuff.
English proverb ( if this one doesn't get me banned, nothing will! )
In this article I'll say what I would buy (or have bought). I am not in a position to test dozens of models, but I work with computers for a living, and I like listening to people: service technicians, system administrators (and the good folks of the LowEndSpirit forum ).
If at all possible, for sizes of 2TB or smaller, do not buy a hard disk drive, get an NVMe or at least a SATA SSD. Today, hard disks make sense for 4TB or larger sizes.
For an NVMe SSD, I use and recommend the Samsung 970 EVO Plus (Amazon affiliate link). It costs well under $100 for the 2TB version, it is super-reliable and far from slow, even under high loads. Note that it supports PCIe 3 speeds only, but for most use cases that means less heat, and performance that is more than good enough.
PCIe 4 gives a sequential read/write speed boost due to a higher bandwidth, but for general use (i.e. many random read/write operations) it gives next to no benefits (and PCIe 3 NVMe drives are pretty fast even for sequential read/write).
At the time of writing I can not recommend any PCIe 4 (or PCIe 5) NVMe drives in good faith. Many models from many manufacturers have problems with heating and reliability - and as the heat rises, they throttle back to PCIe 3 speeds or lower. Samsung charges a price premium, which is more than worth it for the above-recommended 970 EVO Plus model, but their PCIe 4 drives had problems and I'm still not sure how reliable they are despite the firmware updates published after a public outrage.
Which brand (manufacturer)?
I'll start with Toshiba. Their drives are on the noisier side, but I've had great results with them in terms of reliability and durability. Batches that are sold in my country (and Hungary for that matter) seem to have the lowest percentage of "lemons."
Seagate is another decent manufacturer I've had good results with.
Western Digital (WD) is a highly renowned manufacturer, but I have not had very good results with them.
The thing is: I am only one person. Sure, decades of experience, but that still doesn't make for statistically useful data. Also, many manufacturers make a blunder every now and then. So I would argue that the manufacturer is not nearly as important as the particular model. At the time of buying, see which models perform well and are reliable. If you can go and talk to people who deal with warranties at large hardware stores, they will probably be able to tell you which models get the lowest number of returns (though, keep in mind that expensive models get sold a lot less, so them having fewer returns doesn't always mean they are more reliable).
Concrete model recommendations
See the paragraph above to understand the limitations of recommendations like this. With that out of the way, you should also know that most manufacturers have turned to SMR recording technology for their consumer-grade drives. I would recommend you avoid SMR drives. They are not worth the savings.
That makes perfect sense in capitalism: making the low and mid-priced stuff be less durable and less reliable, and practically "forcing" people to buy the more expensive stuff.
For larger storage capacities, I would recommend Seagate IronWolf 12TB NAS (Amazon affiliate link - 'cause yachts won't pay for themselves! - though for the US folks, I would recommend buying directly from the manufacturer to get a better and longer warranty protection) or higher capacity IronWolf models if you need more storage.
If you wish to save some money or don't need that much storage space, you could go with a smaller, Seagate IronWolf 4TB NAS drive (Amazon affiliate link).
Alternatively, if you don't mind a bit more noise, you can't go wrong with Toshiba N-series drives, like Toshiba N300 4TB NAS (Amazon affiliate link).
As with many of my other articles, this one too serves primarily as my own reminder and reference. I will try to keep it up-to-date and correct any errors - my articles aren't written by AI or "content writers," and English is not my native language, as you have probably figured out by now.
If you have questions, additions and, especially, corrections - please post a comment.
Relja Novović - about the author: