Inconsistent fio results on similar VPSes on the same node

Not_Oles Hosting Provider, Content Writer

I am seeing some inconsistency in the fio tests from yabs runs done today on similar-spec VPSes on the same node. Please see three examples below.

I'm unclear on what's happening: whether it's related to the particular VPS I happen to be testing, whether some other VPS or a node process is generating heavy file I/O at certain times, or whether it's something else entirely.

I've been watching iotop -b 3 -o a little. So far, no obvious insight.
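
For reference, roughly what I've been running on the node (batch mode, only processes actually doing I/O; I'd normally set the interval with -d, so adjust to taste):

# 20 samples at a 3-second interval, showing only processes/threads that are actually doing I/O
iotop -b -o -d 3 -n 20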

Ideas? Thanks!

fio Disk Speed Tests (Mixed R/W 50/50) (Partition /dev/vda1):
---------------------------------
Block Size | 4k            (IOPS) | 64k           (IOPS)
  ------   | ---            ----  | ----           ---- 
Read       | 193.28 MB/s  (48.3k) | 1.78 GB/s    (27.8k)
Write      | 193.79 MB/s  (48.4k) | 1.79 GB/s    (28.0k)
Total      | 387.07 MB/s  (96.7k) | 3.57 GB/s    (55.9k)
           |                      |                     
Block Size | 512k          (IOPS) | 1m            (IOPS)
  ------   | ---            ----  | ----           ---- 
Read       | 2.12 GB/s     (4.1k) | 2.18 GB/s     (2.1k)
Write      | 2.23 GB/s     (4.3k) | 2.33 GB/s     (2.2k)
Total      | 4.35 GB/s     (8.5k) | 4.51 GB/s     (4.4k)

fio Disk Speed Tests (Mixed R/W 50/50) (Partition /dev/vda1):
---------------------------------
Block Size | 4k            (IOPS) | 64k           (IOPS)
  ------   | ---            ----  | ----           ---- 
Read       | 193.53 MB/s  (48.3k) | 1.95 GB/s    (30.4k)
Write      | 194.04 MB/s  (48.5k) | 1.96 GB/s    (30.6k)
Total      | 387.57 MB/s  (96.8k) | 3.91 GB/s    (61.1k)
           |                      |                     
Block Size | 512k          (IOPS) | 1m            (IOPS)
  ------   | ---            ----  | ----           ---- 
Read       | 957.00 KB/s      (1) | 18.49 MB/s      (18)
Write      | 1.12 MB/s        (2) | 20.26 MB/s      (19)
Total      | 2.07 MB/s        (3) | 38.75 MB/s      (37)

fio Disk Speed Tests (Mixed R/W 50/50) (Partition /dev/vda1):
---------------------------------
Block Size | 4k            (IOPS) | 64k           (IOPS)
  ------   | ---            ----  | ----           ---- 
Read       | 16.51 MB/s    (4.1k) | 1.84 GB/s    (28.7k)
Write      | 16.52 MB/s    (4.1k) | 1.85 GB/s    (28.9k)
Total      | 33.03 MB/s    (8.2k) | 3.69 GB/s    (57.7k)
           |                      |                     
Block Size | 512k          (IOPS) | 1m            (IOPS)
  ------   | ---            ----  | ----           ---- 
Read       | 2.11 GB/s     (4.1k) | 2.19 GB/s     (2.1k)
Write      | 2.22 GB/s     (4.3k) | 2.34 GB/s     (2.2k)
Total      | 4.34 GB/s     (8.4k) | 4.53 GB/s     (4.4k)
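
If anyone wants to reproduce a single cell of these tables outside yabs, it should be something roughly like this (the filename is just an example, and the iodepth/numjobs/size values are my guesses at the yabs defaults, not confirmed):

fio --name=rand-rw-4k --filename=./fiotest --size=2G --direct=1 --rw=randrw --rwmixread=50 --bs=4k \
    --ioengine=libaio --iodepth=64 --numjobs=2 --time_based --runtime=30 --group_reporting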

I hope everyone gets the servers they want!

Comments

  • skhron Hosting Provider

    Does this performance drop happen for all I/O to the physical drives on the host machine, or is it only some virtual machines that see the I/O performance drop occasionally? Try checking with iostat.

    Thanked by (1)Not_Oles

    Check our KVM VPS plans in 🇵🇱 Warsaw, Poland and 🇸🇪 Stockholm, Sweden

  • havoc OG, Content Writer, Senpai
    edited September 7

    Do you have access to the node itself?

    I reckon it's an actual flaky hardware issue.

    512k reads returning 957.00 KB/s sometimes and 2 GB/s at other times doesn't feel like a noisy neighbor issue. And even within the VPS the numbers don't make sense for #2: more throughput for 64k than for 1m?

    Are you sure that even if on same node they're on same storage backend?

    Thanked by (1)Not_Oles
  • somik

    @havoc said:
    I reckon it's an actual flaky hardware issue.

    I had a failing hard drive that caused a similar issue, so I'm inclined to go with @havoc's suggestion. If you have allocated space for your VMs such that one or two of the VMs' data partitions fall within the part of the drive with bad blocks, you can see this kind of issue. Note that this only applies to legacy HDDs, not SSDs.

    Never make the same mistake twice. There are so many new ones to make.
    It’s OK if you disagree with me. I can’t force you to be right.

  • Not_Oles Hosting Provider, Content Writer

    @skhron said:
    Does this performance drop happen for all I/O to the physical drives on the host machine, or is it only some virtual machines that see the I/O performance drop occasionally? Try checking with iostat.

    Definitely a good question whether the host machine or only the hosted virtual machines are affected.

    One of our pretty smart and well-experienced users told me this morning that he has recently started seeing SCSI errors on his VPS for the first time. I haven't been aware of issues on the node itself.

    So I have to look into it some more today.

    Thanks for helping! <3

    I hope everyone gets the servers they want!

  • Not_Oles Hosting Provider, Content Writer
    edited September 7

    @havoc said:
    Do you have access to the node itself?

    Yes.

    I reckon it's an actual flaky hardware issue.

    Sounds right.

    512k reads returning 957.00 KB/s sometimes and 2 GB/s at other times doesn't feel like a noisy neighbor issue. And even within the VPS the numbers don't make sense for #2: more throughput for 64k than for 1m?

    Are you sure that even if on same node they're on same storage backend?

    Yes, it seems that way. Via ssh I can see all the disks on the node and the hardware RAID controller. I haven't physically seen the machine.

    Thanks for helping! <3

    I hope everyone gets the servers they want!

  • Not_Oles Hosting Provider, Content Writer

    @somik said: one or two of the VMs' data partitions fall within the part of the drive with bad blocks

    This is another good question!

    The node has a hardware RAID controller and 8 spinning-rust disks in RAID 10. The controller has its own tests, and the RAID array status is reported as "Optimal."
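
    If the controller hides the member disks from plain smartctl, they can often still be queried through it; the exact -d argument depends on the controller (megaraid below is just an example, not necessarily what this node has):

    # query the first physical member disk behind a (hypothetical) MegaRAID controller
    smartctl -a -d megaraid,0 /dev/sda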

    I hope everyone gets the servers they want!

  • Not_Oles Hosting Provider, Content Writer

    I guess today I ought to do some organized testing to figure out whether the node itself is also seeing I/O issues, or just the VMs.

    I ought to look at the logs on the VMs with poor test results. Something tells me I'm not gonna find anything in the VM logs.

    Hello iostat.
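
    Probably something like this, on the node and inside a couple of the VMs (just a sketch; the columns to watch are %util, await, and the average queue size):

    # extended per-device statistics in MB/s, refreshed every 5 seconds
    iostat -dxm 5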

    I hope everyone gets the servers they want!

  • havoc OG, Content Writer, Senpai

    Should show up in the SMART data, I would think. Run one, redo the fio test, run another, and see what moved.
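
    e.g. roughly this, to snapshot the raw attribute counters before and after (attribute names vary by drive, and it may need to go through the RAID controller as mentioned above):

    smartctl -A /dev/sda > smart_before.txt
    # ... rerun the fio / yabs test here ...
    smartctl -A /dev/sda > smart_after.txt
    diff smart_before.txt smart_after.txt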

    Thanked by (1)Not_Oles
  • somik

    @havoc said:
    Should show up in the SMART data, I would think.

    In my case, I had to run a manual SMART test with the command sudo smartctl -t long /dev/sda before the bad sectors were registered. Before that I was only getting slow copy/write operations, with the SMART panel not reporting any issues... probably had to do with bad firmware on my drive...
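
    The results then land in the self-test log and the attribute table rather than anywhere obvious, roughly:

    smartctl -l selftest /dev/sda    # self-test log: completion status and first failing LBA, if any
    smartctl -A /dev/sda             # attributes: Reallocated_Sector_Ct, Current_Pending_Sector, etc.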

    Thanked by (1)Not_Oles

    Never make the same mistake twice. There are so many new ones to make.
    It’s OK if you disagree with me. I can’t force you to be right.

  • Not_Oles Hosting Provider, Content Writer

    After some testing, it seems that, following a recent update on the Ubuntu node,

    • New Debian VPSes don't seem to want to boot, and
    • Existing Debian VPSes still work, but maybe have inconsistent file I/O.

    Nevertheless,

    • New Ubuntu VPSes seem to work just fine!
    • A new OpenBSD VPS also seems to work fine!

    Of course, post hoc doesn't mean propter hoc. So I'm not saying the recent update is a cause.

    I'm still pretty confused. But, it's all a lot of fun! And, I get to increase my appreciation for the effort required to maintain virtualization systems!

    It will be interesting to see what happens in the upcoming days! :)

    I hope everyone gets the servers they want!

  • Falzo Senpai
    edited 5:42AM

    There is nothing wrong with it. You are just hitting cache limits.
    See my post on OGF (I hate cross posting).

    Spinning rust isn't capable of these numbers anyway, so all you're seeing is the cache at work; whenever the various layers clash and everything gets flushed, the real bottleneck comes to light.

    TL;DR: you need to choose a better setup for cache, scheduler, and flushing behaviour to keep a better balance.
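
    e.g. the sort of knobs I mean, on the host (just a sketch with placeholder values, not a recommendation; what fits depends on the workload and on whatever cache the RAID controller itself has):

    # which scheduler the backing device uses, and how lazily dirty pages get flushed
    cat /sys/block/sda/queue/scheduler
    sysctl vm.dirty_background_ratio vm.dirty_ratio vm.dirty_expire_centisecs

    # start background writeback earlier so flushes stay small instead of stalling everything at once
    sysctl -w vm.dirty_background_ratio=5
    sysctl -w vm.dirty_ratio=15

    The cache= mode on the VM disks (none vs. writeback) is one more layer in the same stack.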

  • somik

    @Falzo said:
    See my post on OGF (I hate cross posting).

    I can help with that! :lol:

    Thanked by (1)Falzo

    Never make the same mistake twice. There are so many new ones to make.
    It’s OK if you disagree with me. I can’t force you to be right.
