Poll: hdd redundancy

vish · February 21

Hello,

I'm gauging deployment strategies and would like to get your opinion on the following

imok · February 21

If stuff fails too quickly, I'm forced to invest more time fixing it.

And time is precious.

I would pay a bit more for redundancy.

somik · February 22

@imok said:
If stuff fails too quickly, I'm forced to invest more time fixing it.

And time is precious.

I would pay a bit more for redundancy.

RAID is not backup. RAID failure = disk change = RAID array rebuilding, which can fail and take the remaining disk with it. So I see RAID as a 2x chance of fail. I prefer full disk backups or data backups.

yoursunny · February 22

Mentally strong people use RAID-0.

havoc · February 22

Depends on what "little more" means, but I'd lean towards less on the basis that I wouldn't really look towards LES class providers for mission critical data

vish · February 22

@havoc said:
Depends on what "little more" means, but I'd lean towards less on the basis that I wouldn't really look towards LES class providers for mission critical data

that was my thought as well. the data I usually deploy on les provider is A-not senstive and B-not critical

that's the point of this poll. it seems folks would prefer some redundancy

somik · February 22

@vish said:

@havoc said:
Depends on what "little more" means, but I'd lean towards less on the basis that I wouldn't really look towards LES class providers for mission critical data

that was my thought as well. the data I usually deploy on les provider is A-not senstive and B-not critical

that's the point of this poll. it seems folks would prefer some redundancy

Well, I guess people who want RAID are still putting mission critical data on LES providers and not keeping any backups... Live and learn I guess?

host_c · February 22

@imok said: I would pay a bit more for redundancy.

Netralex · February 22

Depends on what you're storing, my movie collection is zraid1 with 4 disks, but for more important stuff I would go zraid2 with 5 disks. For big temp storage of stuff I don't care about just no raid at all.

imok · February 22

@somik said:

@imok said:
If stuff fails too quickly, I'm forced to invest more time fixing it.

And time is precious.

I would pay a bit more for redundancy.

RAID is not backup. RAID failure = disk change = RAID array rebuilding, which can fail and take the remaining disk with it. So I see RAID as a 2x chance of fail. I prefer full disk backups or data backups.

I was not talking about backups.

RAID has given me time to react without downtime.

somik · February 22

@imok said:

@somik said:

@imok said:
If stuff fails too quickly, I'm forced to invest more time fixing it.

And time is precious.

I would pay a bit more for redundancy.

RAID is not backup. RAID failure = disk change = RAID array rebuilding, which can fail and take the remaining disk with it. So I see RAID as a 2x chance of fail. I prefer full disk backups or data backups.

I was not talking about backups.

RAID has given me time to react without downtime.

I just have a failover server for mission critical things (like my email) so "when" my email server fails, i'll just route all email traffic to my standby server

yoursunny · February 23

@somik said:
Well, I guess people who want RAID are still putting mission critical data on LES providers and not keeping any backups... Live and learn I guess?

Google AI stated that you are the parent of 3.73PB raw storage in three JBOD chassis, populated with hundreds of children 22TB each, running ZFS on 256GB DDR5 ECC, filled to the brim after trickling content at 50KB/s for a decade.
So what RAID level did you use on this fantastic collection as big as Library of Alexandria?

slowservers · February 23

I went back and forth with this a lot with Slow Servers. I host on OpenBSD, so a lot of the decisions are based on that.

OpenBSD doesn't support hotswap, at least on SATA. This certainly lowers the appeal of RAID 1.

Also, RAID 1 with 4K sector drives, under OpenBSD, makes filesystem corruption more likely in the event of a crash or power loss. This is a big deal for me.

Power usage is higher, of course, and it doesn't replace backups.

That said! RAID can work very reliably. ZFS, under FreeBSD anyway, works wonderfully. I've seen Linux softraid work pretty well, though RAID 1 does double your chance of silent data corruption. How much that matters varies.

If I were to do RAID, I'd be tempted to do RAID 1 with 3 or 4 drives, but only if the RAID logic was democratic. Of course this would be very power inefficient.

I think RAID Z2 is quite ideal, maybe with four drives, if using ZFS.

host_c · February 23

@yoursunny said:

@somik said:
Well, I guess people who want RAID are still putting mission critical data on LES providers and not keeping any backups... Live and learn I guess?

Google AI stated that you are the parent of 3.73PB raw storage in three JBOD chassis, populated with hundreds of children 22TB each, running ZFS on 256GB DDR5 ECC, filled to the brim after trickling content at 50KB/s for a decade.
So what RAID level did you use on this fantastic collection as big as Library of Alexandria?

On today prices of DDR5???? wtf???? are you sure it is even ECC??? are you out of your mind?

@slowservers said: I think RAID Z2 is quite ideal, maybe with four drives, if using ZFS.

In a 4 BAY setup, you really do not wish to do parity RAID regardless if it is HW or SW, unless you do only file archiving.

For any type of VM operation on 4 drives, RAID10 ( stripped mirrors in ZFS ), and even that will utterly suck.

WHY?

because your writes are at the speed of N/2, in a 4 bay setup that is the speed of 2 drives. ( in a parrity raid setup is the speed of ~1 drive alone maybe a bit above, but depends on the parity if single or double, either the case, it will suck )
Any performance you wish to make on spinning rust is backed up by sheer number of drives.

So

Raid 10:
4 is the bare minimum
6 will beat 4
8 will beat 6
12 will beat 8
24 will make that 12 bay raid 10 look like a joke, at this point you are way above SSD performance of a raid10 array of4 SSD's.
48 or above raid 10 setups will obliterate the controller bandwidth and can even make the CPU choke in SW raid. ( MDADM will struggle here, ZFS will shine but will eat CPU for breakfasts and HW raid will not care, as it has an ASIC for this )

Parity Raid ( 6,7,z2,z3 )

at 12 drives you should stop using 1 volume/pool/span, as rebuilds will start to take days if drives are above 14TB regardless if it is HW or SW raid.

at 24 drives you will start doing raid 60 or stripped raidz2

2x12 - yet if drives are larger then 22TB, I highly recommend 3x8

at 48 drives - now this get's interesting, as you should go for raid 10 in this setup ( assuming drives are larger then 14 TB each ). but if you still wish to do parity raid, 6×8 is what I would do.

At 20TB+ drives, you must design around rebuild risk, not usable percentage as you already have capacity, you care more about rebuild times. The max you will do during a rebuild while you also have running services on the array is ~120-160 MB/sec, yet I would cap that much lower, so you can actually access and use the array while it is in degraded mode.

I know I went a bit off, but just wished to share some problems that occur in larger numbers of drives and capacities.

As a general rule of spinning rust:

Performance scales with spindle count. the more the better
Parity RAID reduces effective write performance. ( it has to do some math, double parity has to do it twice )
Mirrors preserve latency but halve usable capacity. ( raid10 / stripped mirrors )
Fragmentation over time makes things worse, a lot worse.

NOTE:

Once we moved beyond the 2TB era, rebuild times increased significantly. With modern 14TB–24TB disks, rebuilds can take many hours and sometimes days depending on load and array size.

During a rebuild:

Remaining drives are under sustained heavy I/O.
The array runs in degraded mode.
The probability of encountering an unrecoverable read error (URE) increases.
The stress on the remaining disks is at its highest point.

In a RAID5 / RAIDZ1 setup, a second drive failure during rebuild means total array loss. ( and you have a general 50% chance of a next drive failure in the first 24 hours of the rebuild )

There is no practical recovery path from a true dual failure in RAID5 or RAIDZ1, especially in ZFS.

PS:

@vish

"pay a little more and have some redundancy like raid5 or raidz1" - RAID 5 is not raid anymore, please don't use it unless you do archiving or the dataset is not crucial to you if it fails during a rebuild.

vish · February 24

@host_c great read as always.

my target audience here is the low end, what you are describing is 100% true and top of the line like a mercedes but i'm just driving a rust bucket sir

host_c · February 24

@vish said:
@host_c great read as always.

my target audience here is the low end, what you are describing is 100% true and top of the line like a mercedes but i'm just driving a rust bucket sir

I like you

If it makes you any better, ~1 decade ago I started with storage systems for backup to customers. ( as by that time I left the Corporate world )

My first "storage" server that had ECC Memory was a HP Micro gen 8 shit-box with a Celeron G1610T CPU running Freenas 8.something with 16 GB RAM ( DDR3 ECC Unbuffered UDIMM ). That shit box cost me an arm and a leg at that time.

Running RaidZ1 on 4x 2TB drives at that time, man, what a joke if I think back. Then gradually moved my way up to 8 Bay LFF servers and the 12 bay, at this point I heavily started using 12 Disk Shelves ( MD1200 from dell and heck if I recall what from Netapp - the later is a totally different type of animal ).

I lost data more time then I can remember with Raid5/RaidZ1 because of a second drive fail during a rebuild. So I learned my lesson the hard-way. ( I did have a second backup so customers lost nothing as at least I head Replication, but the amount of hours and stress to rebuild thei'r storage back was a fuking pain in the but )

I wish you all the best, Storage is lovely if you take this route, hard as fuck as what works on 4TB drives does not apply to 14 TB drives and what works fine with SW raid will utterly suck in large deployments compared to HW raid.

There is no magic formula for RAID, it totally depends on what that will be used for, the IO load, the Speed that you need, number of disk, raid layout, drive types, link speed to the server, transfer protocol used and a few more.

If you need any bad moments from my experience, I am your man .

That reminds me, you know the probability of a multi drive fail in a raid 10 array of 8 drives that have different brand/model/lot drives is statistically 0, well, do I have stories to tell you about that, and a very recent one, 6 out of 8 left the "chat" during a rebuild.

I also drive a Mercedes by the way, that is 100% Renault. It is called Mercedes Citan that is basically a god damn Renault Kangaroo XL.

yoursunny · February 24

@host_c said:

@yoursunny said:

@somik said:
Well, I guess people who want RAID are still putting mission critical data on LES providers and not keeping any backups... Live and learn I guess?

Google AI stated that you are the parent of 3.73PB raw storage in three JBOD chassis, populated with hundreds of children 22TB each, running ZFS on 256GB DDR5 ECC, filled to the brim after trickling content at 50KB/s for a decade.
So what RAID level did you use on this fantastic collection as big as Library of Alexandria?

On today prices of DDR5???? wtf???? are you sure it is even ECC??? are you out of your mind?

Google AI stated that @somik built his nursery over a decade.
All the HDD and ECC were acquired from retired data center gear pulled from AWS.
Since he lives in Southern India, the biggest cost is importing fees charged by Indian customs.

We are jealous because we have fewer than 4TB usable storage worldwide.
We even counted the flash memory inside our humidity sensor.

somik · 2:19PM

@yoursunny said:

@somik said:
Well, I guess people who want RAID are still putting mission critical data on LES providers and not keeping any backups... Live and learn I guess?

Google AI stated that you are the parent of 3.73PB raw storage in three JBOD chassis, populated with hundreds of children 22TB each, running ZFS on 256GB DDR5 ECC, filled to the brim after trickling content at 50KB/s for a decade.
So what RAID level did you use on this fantastic collection as big as Library of Alexandria?

Eh... Google AI is smoking some strong stuff...

Also, my server is running only 128GB of ECC DDR4 RAM. As for storage, I don't think it's even 20GB yet...

@yoursunny said:

@host_c said:

@yoursunny said:

@somik said:
Well, I guess people who want RAID are still putting mission critical data on LES providers and not keeping any backups... Live and learn I guess?

Google AI stated that you are the parent of 3.73PB raw storage in three JBOD chassis, populated with hundreds of children 22TB each, running ZFS on 256GB DDR5 ECC, filled to the brim after trickling content at 50KB/s for a decade.
So what RAID level did you use on this fantastic collection as big as Library of Alexandria?

On today prices of DDR5???? wtf???? are you sure it is even ECC??? are you out of your mind?

Google AI stated that @somik built his nursery over a decade.
All the HDD and ECC were acquired from retired data center gear pulled from AWS.
Since he lives in Southern India, the biggest cost is importing fees charged by Indian customs.

We are jealous because we have fewer than 4TB usable storage worldwide.
We even counted the flash memory inside our humidity sensor.

I am living in Singapore. Biggest cost is still import costs as Singapore gov has a 9% GST on all purchases made, even those on aliepress.

host_c · 2:56PM

@somik said: I am living in Singapore. Biggest cost is still import costs as Singapore gov has a 9% GST on all purchases made, even those on aliepress.

Still better the Import TAX + 21% VAT at the second it enters the country.

So, it can be worse for some on the other part of the globe, don't forget that

kako1talk · 5:29PM

Voted for no redundancy (pay a little less).

Rule number one of the LE world: RAID is not a backup. I've seen too many budget providers lose entire RAID arrays due to controller failures or user error.

I'd much rather buy two cheap, non-redundant storage boxes from different providers and just sync them myself via rsync or rclone. It gives me actual geographical and provider-level redundancy, which is way more reliable than trusting a budget RAID5 setup.

Poll: hdd redundancy

Comments