ZFS RAID10 + cache?
I've just installed a Proxmox node and I have 4x SSDs + 1 NVMe
I've set up the SSDs with ZFS RAID10 and I'm not really sure what to do with the NVMe.
The AI says "It can be used as a ZFS cache (L2ARC) if you are going to work with ZFS."
In your experience, will that ZFS cache make things faster? I have 128GB RAM BTW.
Comments
Nope. For an NVMe to be used as cache, it has to be a little more special than an ordinary NVMe.
I would just use it as NVMe storage for VPSes that have backups in case it fails, or for non-critical VMs, in short.
ZFS loves RAM, as it uses that for cache. RAM is low latency; compared to that, NVMe is not, so it makes no sense to do that. You will lose more than you will gain.
As you have 128 GB RAM and a RAID 10 setup, I would also limit ARC to 16 GB at most. This way you keep a good balance of performance (RAID 10) and will not burn too much RAM on ZFS, so you have 100 GB+ left for VM RAM.
Please do not leave the ZFS ARC in Proxmox on auto/dynamic; it does not work as you expect (shit is the right word).
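For reference, the 16 GB cap is a single module parameter, zfs_arc_max, in bytes (the exact numbers below are just an example, adjust to your box):

# /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=17179869184   # 16 GiB
options zfs zfs_arc_min=8589934592    # optional 8 GiB floor

# make it persist across reboots, then apply it live without rebooting
update-initramfs -u -k all
echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_max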
Other than this, congratulations on choosing the most correct way to use a 4-drive setup.
Cheers!
Host-C - VPS & Storage VPS Services – Reliable, Scalable and Fast - AS211462
"If there is no struggle there is no progress"
@host_c always helpful. thank you. I always forget to limit the ARC.
What a beast of a server, congrats @imok
I believe in good luck. The harder I work, the luckier I get.
If you are asking this here, don't worry about cache; 4x SSDs should be enough speed. Just make sure that with all the VPSes running you still have plenty of RAM available. Proxmox 8.1+ will use 10% of RAM for ARC by default, but you can always modify this to use more. If you still decide to use NVMe as L2ARC, do yourself a favor and use two (2) of them in a mirror.
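For reference, adding a cache device is a one-liner; "tank" and the device path below are just placeholders for your pool and NVMe:

zpool add tank cache /dev/disk/by-id/nvme-YOUR_DRIVE   # attach L2ARC
zpool status tank                                      # it shows up under "cache"
arcstat 5                                              # watch ARC behaviour before deciding to keep it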
Offshore Hosting & High Privacy in Panama
Why does the NVMe need to be special, and how special?
I bench YABS 24/7/365 unless it's a leap year.
TBW (Total Bytes Written) — and keep in mind, everyone lies in specs.
Latency — this is the real killer.
If you use a consumer-grade NVMe like the Samsung EVO (or any "Pro"-branded drive), it will wear out quickly — especially depending on your read/write I/O patterns. Writes, in particular, will degrade it fast.
Even if it's a Gen4 NVMe capable of sustaining 5–8 GB/s, it’s still a poor choice — because RAM always wins in terms of latency.
Using a data center-grade NVMe (like Intel DC P-series or similar) is overkill. Just use more RAM.
In practically every situation, what ZFS truly needs is RAM, not an NVMe cache. NVMe cache and similar options are mostly "marketing-driven features" requested by the community. From a performance standpoint, they offer minimal real-world benefit and only add complexity to the setup.
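A quick sanity check before spending anything: if the ARC hit ratio is already high, an L2ARC has almost nothing left to cache. Both tools below ship with the ZFS utilities on Proxmox:

arc_summary | less    # ARC size, hit ratio, MRU/MFU breakdown
arcstat 5             # live per-interval reads, misses and ARC size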
L2ARC (NVMe used as read cache) and ZIL/SLOG (used for synchronous write logging) can offer benefits in very specific workloads, like NFS or databases. And here comes the "but": my reply here takes the following into consideration:
Unless you wish to achieve 40 Gbps and above over whatever protocol, just use RAM.
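If you do land in that niche, the shape of it is below; pool name and device IDs are placeholders, and remember a SLOG only ever helps synchronous writes:

# mirrored SLOG for sync-heavy workloads (NFS, databases); async writes bypass it entirely
zpool add tank log mirror /dev/disk/by-id/nvme-A /dev/disk/by-id/nvme-B
zpool iostat -v tank 5     # check how much write traffic you actually have first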
ZFS was fundamentally designed to benefit from low-latency cache — and that means RAM. Everything else is just improvisation.
As a general rule — not just for storage but in many areas — latency is the real killer. Latency always beats raw read/write throughput.
You’re much better off with storage that has sub-1ms latency and delivers 50 MB/s than something with 3–4ms latency but 500 MB/s. The system feels snappier, more responsive, and performs better in real-world scenarios.
Even if you plan on mostly sequential reads and writes, give it a few months — fragmentation sets in, workloads become more random, and suddenly you’re dealing with lots of random I/O. At that point, low latency becomes even more critical.
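An easy way to see that difference on your own pool (ioping may need an apt install ioping first; /tank is a placeholder mountpoint):

ioping -c 20 /tank    # per-request latency, average and worst case
ioping -R /tank       # request-rate test, dominated by latency rather than bandwidth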
PS: after you figure out the storage part, the other important thing to consider is "delivery" of that storage to its destination, as that is just as important.
I loved Fibre Channel (the protocol). Why? Because it was built for storage (although InfiniBand is definitely worth looking into, and RDMA is also good).
We have 8/16 Gbps FC setups running today (10+ years old) that can push performance close to NVMe, and that performance is delivered to 3-6 VMware nodes (still running 5.x or 6.x). Why not upgrade? Because there is absolutely no need for the customer to do so.
I GPT-ed a comparison between FC and storage over Ethernet:
When to Use What?
Use Fibre Channel when:
You already have an FC SAN.
You need rock-solid performance for transactional workloads.
You have skilled staff and budget for it.
Use Ethernet when:
You prefer flexibility and convergence (single fabric for data and storage).
You use cloud, hyperconverged, or scale-out solutions.
You want to avoid FC infrastructure costs.
Fibre Channel (FC) vs Ethernet – Core Comparison
The good old times.
What da hail you talking about, mine still looks this way!!!
Free Hosting at YetiNode | MicroNode| Cryptid Security | URL Shortener | LaunchVPS | ExtraVM | Host-C | In the Node, or Out of the Loop?
Awesome timing. Been wondering about this too for a project.
I suspect that with the main pool being SSD, and thus already pretty fast, any sort of caching layer beyond RAM is of limited benefit. So I'm thinking more metadata & small files (vaguely recall that special vdevs can do both at the same time). Think metadata on a fast device would improve perceived snappiness. An Optane drive for that would be ideal, but I haven't quite figured out whether that's worth it. If all the metadata is on there then I probably need two for risk. At which point it's quite an elaborate & pricey setup for possibly not a huge benefit. Plus that needs more PCIe lanes. idk..
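On paper the idea would look roughly like this (pool/dataset names are placeholders, and I'd mirror the special vdev since losing it loses the pool):

# mirrored special vdev: holds metadata, and optionally small blocks too
zpool add tank special mirror /dev/disk/by-id/nvme-optane-A /dev/disk/by-id/nvme-optane-B
# also send blocks of 64K or smaller from this dataset to the special vdev
zfs set special_small_blocks=64K tank/smallfiles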
Anybody got a good way to test this sort of ZFS thing? A straight speed test won't necessarily reflect the trade-offs well (big files vs, say, many small ones). Plus anything involving cache is hard to test anyway.
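Something like fio for the small random reads plus a plain find for the metadata feel might work; paths are placeholders, and --size needs to be bigger than ARC (or export/import the pool between runs) so it's not just benchmarking RAM:

fio --name=smallrand --directory=/tank/bench --rw=randread --bs=4k --size=8G \
    --numjobs=4 --iodepth=16 --ioengine=libaio --runtime=120 --time_based --group_reporting
time find /tank/somedataset -type f | wc -l    # metadata-heavy "snappiness" proxy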
Used to feel better after. And if I see B then it's a sign to change.
Since you wish to use parity RAID (Zx), you will have next to no gain from whatever cache you wish to add, other than a high CPU clock and RAM. You will have a latency penalty from having to do the math on the striped blocks for RAID-Zx (and the math is done by a CPU that has to deal with other stuff too, all of this over a software stack).
A RAID 10 of those SSDs would beat it all performance-wise.
If you wish to counterbalance the penalty of parity RAID with SSDs: nope, it will not do the trick. Been there, done that, but feel free to share your findings. Since you will use consumer SSDs (6 Gbps link) you will be limited in terms of speed by that, plus the parity math for blocks striped to the pool, done by an x86 CPU (regardless of type, make and model).
There is no magic way to do high-performance RAID while sacrificing as few drives as possible for data integrity. You either do RAID 10, or stick with Zx; putting the Zx vdevs in a stripe (like RAID 50/60) will give you some performance boost, but I doubt it will be anything noticeable (first the "controller" has to divide the data blocks, then do the parity, and then write them, every cycle). See the sketch below for the two layouts.
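A minimal sketch of the two layouts on 4 drives (sda..sdd are placeholders; in real life use /dev/disk/by-id paths):

# striped mirrors ("RAID 10"): best IOPS and latency, 50% usable space
zpool create tank mirror sda sdb mirror sdc sdd
# RAID-Z2: same usable space on 4 drives, but every write pays the parity math
zpool create tank raidz2 sda sdb sdc sdd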
If this is a local storage setup it will be fine. If you wish to serve it to other nodes (NFS, CIFS), you also add the transport problems on top of it, and the outcome will be... well, not what you expect (you will hardly saturate a 10 Gbps line after fragmentation, even if you use SSDs, and all of this only if you go MTU 9000). Aaa... almost forgot: if you do transport storage via Ethernet, Nexus and Juniper are your second and third options; the first one is Arista. Be prepared to spend some $$ on electricity, as those switches consume some power. Also, you will need cards from Chelsio or another premium manufacturer to go beyond 10G. (Any RAID type with any number of spinning-rust drives will saturate a 1G line any day of the week; things get hard past 5 Gbps.) Jumbo frames help, but fragmentation, protocol overhead, and latency almost always win. Sad but true.
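Before blaming the pool, check what the wire itself can do; eth0 and storage-host are placeholders, and the MTU has to match on every hop, switches included:

ip link set dev eth0 mtu 9000        # jumbo frames on the storage NIC
iperf3 -s                            # on the storage host
iperf3 -c storage-host -P 4 -t 30    # from a client: 4 parallel streams for 30 s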
All of the above is mostly the reason why ZIL, SLOG and the rest were added to ZFS, to counterbalance the flaws/limitations, but in real life they do not work, or the performance gained per USD spent is a joke. (A real-life test is a 20+ TB pool shared to something like 50 I/O-hungry VMs, like storage VPS customers.)
It is not that ZFS is not good, it is pure math and physics behind how storage works. I know this was not the answer you wished for, but again, feel free to experiment yourself (just don't break the bank while doing it).
PS:
The holy trinity of storage design—capacity, performance, integrity—can’t all be maxed at once. You choose two, and live with the tradeoffs.
PPS:
The above is the perfect marketing presentation of how awesome it will be, and in real life it will look like this:
and the end user will only see this
You clearly know a lot about this!
But I think you misunderstood me. I'm not thinking cache, I'm leaning more towards metadata/small files on Optane, i.e. that data never goes to the main pool.
Well, given that I did a non-ECC build, I think integrity is already looking a touch shaky lol. And yeah, home storage NFS. Probably 2.5GbE, so a pretty amateur build anyway.
@imok - sorry kinda hijacked your thread a bit
Well, since you went non-ECC I would say the build is good for saving videos, and I would stop there.
If you give us some details about the number of drives, CPU and RAM, and share the use case and protocol used, I can tell you what I would do, if you wish to hear me out.
Also, @imok is a cool fella, I doubt he would mind having this chat here on his thread, is that correct?
@host_c Thanks!
Use case is various homelab stuff. The relevant one for this discussion is that probably half a dozen nodes will be using this as an S3/object storage backend for k8s, hence the interest in fast small files and snappiness. NFS too, but most of the LAN is 2.5GbE so no wild throughput expectations... the base SATA pool should saturate that.
Ancient Asus X99 platform / 5960X / 64GB 3200 / 3x 1.6TB Intel DC S3500 SATAs, which will be the main pool. Boot on another, smaller S3500. Dual 10 gig Ethernet, but most of the network is 2.5 so kinda irrelevant.
Have 3x P1600X lying around, so could use one or two of those
I can buy a P4800X and connect it via U.2, but it would need to be a single one, because two would be more than I'd like to spend.
Think I'd need to figure out some sort of small-file test and try to test this all somehow. Because if there isn't a big diff vs the SATA pool then it's all academic.
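Crudest version I can think of, assuming a pool called tank: run it with and without the special vdev, and export/import the pool (or reboot) between runs so ARC doesn't hide the difference:

zfs create tank/smalltest
time sh -c 'for i in $(seq 1 20000); do head -c 4096 /dev/urandom > /tank/smalltest/f$i; done'
time ls -lR /tank/smalltest > /dev/null    # metadata-heavy listing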
No customers sending me angry tickets & no super critical data. I'll build an ECC server in maybe 1.5 years when the desktop is due a refresh & I can cannibalize that for an ECC mobo and ECC CPU.
No worries, it's interesting to read you guys. Even if I don't understand most of it 😅
It was actually an information overload, but a good one.
Lucky you, bastard

haha indeed. (though I do have angry customers in day job that is funding my homelab shenanigans...)
Anyway... no storage testing for now. XMP on this board seems broken, so I guess I'm figuring out 30+ memory timing settings by hand.
If you ever feel the need for some angry mobs, ping me.
Other than that, have fun with your project, just please don't break the bank, it is not worth it.
Cheers
@imok, thanks for being cool on your thread.
Damn, I am so happy my day job does not require me to interact with customers... Respect goes to you guys who can deal with angry/ignorant people without blowing a fuse!
Never make the same mistake twice. There are so many new ones to make.
It’s OK if you disagree with me. I can’t force you to be right.
Me and Stuart after a day that has no "angry mob" tickets