Disk health with >1TB/day of writes

I have a project which demands tons of CPU power and disk space.
Under current testing, it writes 1-5TB of data to disk per day, depending on the CPU power.
Most of those writes are overwrites, so the data size on disk increases by only about 50GB per day.

I think this project will need to run for several months.
Currently we are testing it on several different machines, including HDD, SSD, and NVMe.

Should I be worried about disk durability?
If so, is there any way to check disk health without root access?
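SMART queries (e.g. via smartctl) typically need root, but on Linux /proc/diskstats is world-readable and reports cumulative sectors written per device, which is enough to verify how much is actually being written. A minimal sketch (field layout per the kernel's iostats documentation; note this measures write volume, not failure prediction):

```python
import os

def parse_diskstats(text):
    """Return {device: bytes written since boot} from /proc/diskstats text.

    Field 10 (index 9 after splitting) is "sectors written"; the kernel
    counts sectors as 512 bytes regardless of the device's sector size.
    """
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 10:
            continue
        device, sectors_written = parts[2], int(parts[9])
        stats[device] = sectors_written * 512
    return stats

if __name__ == "__main__" and os.path.exists("/proc/diskstats"):
    with open("/proc/diskstats") as f:
        for dev, nbytes in parse_diskstats(f.read()).items():
            print(f"{dev}: {nbytes / 1e12:.3f} TB written since boot")
```

Sampling this once a day and diffing the counters gives daily write volume per disk without any special privileges.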

Comments

  • Daniel OG
    edited November 2021

    HDD would be best for this use case.

    I don't think an SSD (including an NVMe SSD) is appropriate for this task, as flash memory degrades the more you write to it. If you want to use SSDs, check the TBW ("terabytes written") rating. This measures the endurance of the drive: it's the amount of writes it's designed to handle before it's considered "worn out" and out of warranty, at which point you're on your own. Cheaper SSDs may only be warrantied for 100TBW whereas higher-end SSDs are rated for thousands of TBW, but in either case 5TB/day is going to quickly wear it out.

    Regular HDDs don't have the same issue. The cheap Kimsufi servers have 10-year-old heavily used HDDs in them and they're still working fine.
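    A back-of-envelope check of the numbers above (a sketch; it assumes a steady write rate and ignores write amplification, which makes real wear worse):

```python
# Days until a drive's rated TBW is exhausted at a steady write rate.
# Write amplification (ignored here) makes the real figure worse.
def days_until_worn(tbw_rating_tb, writes_tb_per_day):
    return tbw_rating_tb / writes_tb_per_day

print(days_until_worn(100, 5))    # 20.0  (cheap 100TBW drive: ~3 weeks)
print(days_until_worn(3000, 5))   # 600.0 (3000TBW drive: under 2 years)
```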

    Thanked by (3) Sanjue007, zxrlha, lentro
  • @Daniel said:
    HDD would be best for this use case.

    I don't think an SSD (including an NVMe SSD) is appropriate for this task, as flash memory degrades the more you write to it. If you want to use SSDs, check the TBW ("terabytes written") rating. This measures the endurance of the drive: it's the amount of writes it's designed to handle before it's considered "worn out" and out of warranty, at which point you're on your own. Cheaper SSDs may only be warrantied for 100TBW whereas higher-end SSDs are rated for thousands of TBW, but in either case 5TB/day is going to quickly wear it out.

    Regular HDDs don't have the same issue. The cheap Kimsufi servers have 10-year-old heavily used HDDs in them and they're still working fine.

    I see. The problem with HDDs is that the disk becomes the bottleneck instead of the CPU :s.
    But a slow HDD is better than a broken SSD.

  • Most of those writes are overwrites, so the data size on disk increases by only about 50GB per day.

    How old is the data being overwritten?
    If most overwritten data is from the last few hours, you don't need to commit that data to disk at all.
    Instead, write it to DRAM or an Optane DIMM, and then commit to disk once the data is unlikely to change.
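    A minimal sketch of that write-back idea (the flush_to_disk callback and the idle threshold are hypothetical placeholders for whatever the real program uses for persistence):

```python
import time

class WriteBackBuffer:
    """Absorb overwrites in RAM; flush a key only after it goes idle."""

    def __init__(self, flush_to_disk, idle_seconds=3600):
        self.flush_to_disk = flush_to_disk  # callable(key, value)
        self.idle_seconds = idle_seconds
        self.buffer = {}  # key -> (value, last_write_time)

    def write(self, key, value):
        # Overwrites land here and never touch the disk.
        self.buffer[key] = (value, time.monotonic())

    def flush_idle(self):
        # Commit only keys that have stopped changing.
        now = time.monotonic()
        for key in list(self.buffer):
            value, last = self.buffer[key]
            if now - last >= self.idle_seconds:
                self.flush_to_disk(key, value)
                del self.buffer[key]

flushed = {}
buf = WriteBackBuffer(flushed.__setitem__, idle_seconds=0)
buf.write("a", 1)
buf.write("a", 2)   # overwrite absorbed in RAM
buf.flush_idle()    # only the final value reaches "disk"
print(flushed)      # {'a': 2}
```

Repeated overwrites of the same key cost nothing in disk endurance; only the settled value is ever written out.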

    Thanked by (2) zxrlha, vimalware
  • @yoursunny said:

    Most of those writes are overwrites, so the data size on disk increases by only about 50GB per day.

    How old is the data being overwritten?
    If most overwritten data is from the last few hours, you don't need to commit that data to disk at all.
    Instead, write it to DRAM or an Optane DIMM, and then commit to disk once the data is unlikely to change.

    Yes, most of the overwritten data is from the last few hours.
    There is a way to store that data in RAM instead of on disk.
    But previous tests show that each thread needs >8GB of RAM, and those machines have only 128GB of RAM.
    Today I found a machine with 1TB of RAM, and I'll test on that.

    The method you suggested sounds good, but it requires significant modification of our current program and a change to the underlying algorithm.
    I'll think about how to implement it.

  • Is Chia still a thing? What are you doing really? What kind of server are you going to do this on without root access? If you're not seeking a lot, HDDs intended for this sort of use (surveillance drives made to record video 24/7) might be a good bet.

  • @willie said:
    Is Chia still a thing? What are you doing really? What kind of server are you going to do this on without root access? If you're not seeking a lot, HDDs intended for this sort of use (surveillance drives made to record video 24/7) might be a good bet.

    Not Chia. There are databases which serve as a cache (because RAM is not large enough), and the computation needs to iterate over all elements in those databases. During the iteration it updates various elements in other databases, hence it requires seeking.

    Currently using: a dual Gold 5118 with 64GB RAM and HDDs (RAM and disks are the bottleneck),
    an E5-2640 v4 with 128GB RAM and an SSD (CPU is the bottleneck),
    and an E5-2640 v4 with 128GB RAM and NFS at >2Gbps (the NFS speed is good enough; CPU is the bottleneck).

  • If you will be renting a server, use SSD.

    If colo (your own hardware), use HDD.

    Thanked by (1) zxrlha


  • @deank said:
    If you will be renting a server, use SSD.

    If colo (your own hardware), use HDD.

    Lol. I don't pay for that hardware, so probably I should use SSDs.
    At the beginning I was afraid that if an SSD broke, I would have to start from the beginning.
    At some point I realized that I can save a snapshot and back it up to another machine.

  • @zxrlha said:
    There are databases which serve as a cache (because RAM is not large enough), and the computation needs to iterate over all elements in those databases. During the iteration it updates various elements in other databases, hence it requires seeking.

    The actively used portion of your database should fit in RAM.
    If it cannot fit in the RAM of a single server, you have an architectural problem, not a hardware problem.
    The solution is to design it as a distributed system.
    Read the Hadoop papers and you'll see the idea.
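    To make the distributed idea concrete, here is a minimal sketch of hash partitioning (node names are placeholders): each key is routed deterministically to one machine, so that machine's shard of the database only has to fit in its own RAM.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # placeholder machine names

def node_for(key):
    """Deterministically map a key to one node."""
    digest = hashlib.sha256(key.encode()).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]

# Every process routes the same key to the same node, so reads and
# updates of one element always hit one machine's RAM.
assert node_for("element-42") == node_for("element-42")
```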

  • @yoursunny said:

    @zxrlha said:
    There are databases which serve as a cache (because RAM is not large enough), and the computation needs to iterate over all elements in those databases. During the iteration it updates various elements in other databases, hence it requires seeking.

    The actively used portion of your database should fit in RAM.
    If it cannot fit in the RAM of a single server, you have an architectural problem, not a hardware problem.
    The solution is to design it as a distributed system.
    Read the Hadoop papers and you'll see the idea.

    It is a hybrid of an architectural and a hardware problem. The current architecture lets each thread operate on a different database, which avoids database locks. However, it means each thread needs a certain amount of data, and the machines do not have enough RAM.

    I agree that it should fit into RAM, and thus I'm also testing that way.
    Currently I think it is better, as long as I adjust the number of threads according to available RAM rather than CPU cores.
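    Sizing by RAM instead of cores can be sketched like this (using the >8GB-per-thread and 128GB figures from this thread; the 8GB OS reserve is an arbitrary assumption):

```python
import os

def max_threads(total_ram_gb, ram_per_thread_gb, reserve_gb=8):
    """Threads the RAM allows, capped by CPU count; reserve_gb is left for the OS."""
    by_ram = (total_ram_gb - reserve_gb) // ram_per_thread_gb
    return max(1, min(by_ram, os.cpu_count() or 1))

# On a 128GB box at 8GB/thread, RAM caps you at 15 threads even if the
# machine has more cores.
print(max_threads(128, 8))
```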

    Thanks for the suggestion of Hadoop. I'll look into it.

    Thanked by (1) yoursunny
  • havoc OG Content Writer

    Probably cheapest to buy an old server, frankly.

    If you're somewhere with cheap electricity, then those are definitely viable.

    Thanked by (1) zxrlha
  • @zxrlha said:

    @yoursunny said:

    @zxrlha said:
    There are databases which serve as a cache (because RAM is not large enough), and the computation needs to iterate over all elements in those databases. During the iteration it updates various elements in other databases, hence it requires seeking.

    The actively used portion of your database should fit in RAM.
    If it cannot fit in the RAM of a single server, you have an architectural problem, not a hardware problem.
    The solution is to design it as a distributed system.
    Read the Hadoop papers and you'll see the idea.

    It is a hybrid of an architectural and a hardware problem. The current architecture lets each thread operate on a different database, which avoids database locks. However, it means each thread needs a certain amount of data, and the machines do not have enough RAM.

    I agree that it should fit into RAM, and thus I'm also testing that way.
    Currently I think it is better, as long as I adjust the number of threads according to available RAM rather than CPU cores.

    This is a hardware problem then.
    You need to install more RAM per CPU core.

    Major cloud providers offer "high CPU" and "high RAM" models.
    Basically, you have a "balanced" or "high CPU" model but you actually need a "high RAM" model.

  • @havoc said:
    Probably cheapest to buy an old server, frankly.

    If you're somewhere with cheap electricity, then those are definitely viable.

    :'( I cannot buy anything, and I am stuck with the currently available hardware (although there are several possible choices).

  • Micronode Hosting Provider

    If you were to go for, say, 4x Kingston DC500M 3.84TB SSDs in RAID, you may be OK.

    These disks have a Drive Writes Per Day rating of 1.3, meaning each disk can handle about 5TB of writes per day. If you're in RAID 10, that's going to get you 10TB per day of writes for the drive lifespan, which is usually the warranty period. In the case of the Kingston DC500M, that's 5 years.
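    Spelling out that arithmetic (a sketch; RAID 10 on four drives gives two mirrored pairs, so usable write capacity is two drives' worth):

```python
# Rated daily writes = capacity x DWPD.
def daily_write_limit_tb(capacity_tb, dwpd):
    return capacity_tb * dwpd

per_drive = daily_write_limit_tb(3.84, 1.3)  # ~4.99 TB/day per drive
raid10_total = per_drive * 2                 # ~9.98 TB/day usable in 4-drive RAID 10
print(round(per_drive, 2), round(raid10_total, 2))  # 4.99 9.98
```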

    This isn't guaranteed, though: the higher the write workload, the higher the risk of drive failure, so you definitely want some redundancy.

    That being said, if you can solve the problem in software by caching hot data, you can avoid the need for expensive DC-grade hardware, although ideally you should do both.

    It may be worth looking into the durability of the SSDs you are testing on and checking the DWPD. If you need any help speccing up hardware for this application, feel free to get in touch :)

    Thanked by (1) zxrlha
  • lentro Hosting Provider

    Agreed with @yoursunny, this seems like a great place to use RAM. But then you're in a low end forum, haha :)
    On a budget, I would probably use multiple NVMe drives for redundancy and then replace them once their flash wears out, as lately I've found I can't stand poor performance.

    @Daniel said: Regular HDDs don't have the same issue

    I thought HDDs were still prone to write limits? E.g.:
    https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/internal-drives/wd-red-hdd/product-brief-western-digital-wd-red-hdd.pdf

    It says "Supports up to 180TB/yr" so I thought it meant the HDD isn't meant for continuous reads/writes? It is physical spinning rust after all...

  • @yoursunny said:

    @zxrlha said:

    @yoursunny said:

    @zxrlha said:
    There are databases which serve as a cache (because RAM is not large enough), and the computation needs to iterate over all elements in those databases. During the iteration it updates various elements in other databases, hence it requires seeking.

    The actively used portion of your database should fit in RAM.
    If it cannot fit in the RAM of a single server, you have an architectural problem, not a hardware problem.
    The solution is to design it as a distributed system.
    Read the Hadoop papers and you'll see the idea.

    It is a hybrid of an architectural and a hardware problem. The current architecture lets each thread operate on a different database, which avoids database locks. However, it means each thread needs a certain amount of data, and the machines do not have enough RAM.

    I agree that it should fit into RAM, and thus I'm also testing that way.
    Currently I think it is better, as long as I adjust the number of threads according to available RAM rather than CPU cores.

    This is a hardware problem then.
    You need to install more RAM per CPU core.

    Major cloud providers offer "high CPU" and "high RAM" models.
    Basically, you have a "balanced" or "high CPU" model but you actually need a "high RAM" model.

    Yes, thanks.
    Now I know that I need machines with high enough RAM per core and high single-core performance.

  • Daniel OG
    edited November 2021

    @lentro said: I thought HDDs were still prone to write limits?

    The limits are far higher for HDDs, at least comparing non-enterprise HDDs to non-enterprise SSDs (keeping in mind this is a low-end forum where not everyone runs enterprise disks). HDDs tend to last longer than the specs suggest. I used to have a Quantum Fireball 650MB IDE hard drive that still worked fine 20 years after manufacture :tongue:

    Writes physically wear out the flash chips on SSDs, and the wear is more noticeable than on HDDs. HDDs are usually somewhat recoverable if something bad happens to them, whereas SSDs have a tendency to just completely break when they go bad.

    Anyway, like others have said, RAM is better than any disk. If you need to write temporary files, /dev/shm is your friend. When it exists (it's an optional kernel feature), it's guaranteed to be a RAM disk using tmpfs, so any files you write to it will be entirely in memory rather than on disk. On the other hand, /tmp is sometimes tmpfs and sometimes a regular directory on disk.
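    A sketch of that tip: prefer the RAM-backed /dev/shm for scratch files when it exists, and fall back to the regular temp directory otherwise.

```python
import os
import tempfile

def scratch_dir():
    """Prefer the RAM-backed /dev/shm tmpfs; fall back to the normal temp dir."""
    shm = "/dev/shm"
    if os.path.isdir(shm) and os.access(shm, os.W_OK):
        return shm
    return tempfile.gettempdir()

with tempfile.NamedTemporaryFile(dir=scratch_dir()) as f:
    f.write(b"hot intermediate data")  # lives in RAM if /dev/shm exists
    f.flush()
```

Note the fallback path may be an on-disk /tmp, so the write savings only apply where /dev/shm is actually present.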

    Thanked by (3) lentro, chimichurri, zxrlha
  • It might help if you described the actual problem you are trying to solve. Lots of people use databases for things that can be done with old-fashioned serial I/O and sorting, or that kind of thing. Particularly if this is an offline application, you might not need all those disk seeks, in which case HDDs are fine. The old saying was "disk is tape", and often tape can in fact do the job.

    Thanked by (1) yoursunny
  • @Daniel, I think you said you got the Wishosting 5950X? Would you post a benchmark and recommend the VPS?

  • vyas OG Senpai
    edited December 2021

    @bibble said:
    @Daniel, I think you said you got the Wishosting 5950X? Would you post a benchmark and recommend the VPS?

    I believe you are looking for this

    https://talk.lowendspirit.com/discussion/comment/75016/#Comment_75016

    Posted by @snz

    Thanked by (1) bibble
  • Daniel OG
    edited December 2021

    @bibble said:
    @Daniel, I think you said you got the Wishosting 5950X? Would you post a benchmark and recommend the VPS?

    I posted a YABS here: https://talk.lowendspirit.com/discussion/comment/74102#Comment_74102 and a Monster Bench with Asia speed test results here: https://talk.lowendspirit.com/discussion/comment/74108#Comment_74108

    I'm getting rid of it, but only because I got a dedicated server during Black Friday and I'm going to move everything onto that. Performance on the Wishosting box was very, very good.

    Thanked by (2)vyas bibble