Help me troubleshoot this slow VPS

jqr OG
edited June 13 in Help

Help me, LES, you're my only hope.

I have a VPS with a popular, reputable provider (don't want to drag them into this, as I believe this is a problem with the way I configured it).

The Pertinent Details

1 vCPU, 1GB RAM, 10GB SSD, 1TB HD

Server is in Montreal, Canada, I'm in California.

Virtualization is Xen (this is the main difference between it and the rest of my boxes). OS is NixOS.

The SSD is partitioned into 512MB for /boot and the rest is LUKS encrypted, with LVM on top.

Ping is decent:

Minimum = 75ms, Maximum = 80ms, Average = 77ms

The box is basically idle:

11:31:23 up 98 days 12:49, 8 users, load average: 0.08, 0.05, 0.02

~500MB of free memory, ~600GB of free disk space.

The Issue

Every command takes forever to run the first time. SSHing in, waiting for the login prompt, ps xa. Everything.

I don't think it's a latency issue. time sleep 5s outputs real 0m6.982s. If I run it a second time, then it takes 0m5.006s as one would expect.
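One way to put numbers on the first-run delay, as a minimal sketch: the helper names `elapsed_ms` and `cold_vs_warm` are made up for illustration, and the timing assumes GNU date (for %N).

```shell
# Hypothetical helper: print the wall-clock time of a command in milliseconds.
# Assumes GNU date, which supports %N (nanoseconds).
elapsed_ms() {
  local start end
  start=$(date +%s%N)
  "$@" > /dev/null 2>&1
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}

# Run the same command twice and report both timings, so a cold first
# run can be compared against a warm second run.
cold_vs_warm() {
  local first second
  first=$(elapsed_ms "$@")
  second=$(elapsed_ms "$@")
  echo "first=${first}ms second=${second}ms"
}
```

For example, `cold_vs_warm sleep 5` prints both timings on one line; anything beyond 5000 ms is overhead. As root, `sync; echo 3 > /proc/sys/vm/drop_caches` drops the page cache, which should make the next run cold again if the delay comes from reading binaries off disk.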

What I've Checked

I've already stopped every process to see if I had some kind of OOM/high CPU usage situation.

The appropriate xen_netfront and xen_blkfront modules are loaded.

I thought the SSD might be going bad, but there are no error messages in dmesg or journalctl.

Here's a redacted YABS:

# ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## #
#              Yet-Another-Bench-Script              #
#                     v2024-06-09                    #
# https://github.com/masonr/yet-another-bench-script #
# ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## #

Thu Jun 13 12:54:48 PM PDT 2024

Basic System Information:
---------------------------------
Uptime     : 98 days, 14 hours, 13 minutes
Processor  : AMD EPYC 7302 16-Core Processor
CPU cores  : 1 @ 3000.099 MHz
AES-NI     : ✔ Enabled
VM-x/AMD-V : ❌ Disabled
RAM        : 938.5 MiB
Swap       : 2.0 GiB
Disk       : 1014.6 GiB
Distro     : NixOS 23.11 (Tapir)
Kernel     : 6.1.63
VM Type    : XEN
IPv4/IPv6  : ✔ Online / ❌ Offline

fio Disk Speed Tests (Mixed R/W 50/50) (Partition /dev/disk/by-uuid/a4b06307-53b9-463e-882e-61434d992934):
---------------------------------
Block Size | 4k            (IOPS) | 64k           (IOPS)
  ------   | ---            ----  | ----           ----
Read       | 33.92 MB/s    (8.4k) | 164.60 MB/s   (2.5k)
Write      | 34.01 MB/s    (8.5k) | 165.47 MB/s   (2.5k)
Total      | 67.93 MB/s   (16.9k) | 330.07 MB/s   (5.1k)
           |                      |
Block Size | 512k          (IOPS) | 1m            (IOPS)
  ------   | ---            ----  | ----           ----
Read       | 178.40 MB/s    (348) | 177.39 MB/s    (173)
Write      | 187.88 MB/s    (366) | 189.20 MB/s    (184)
Total      | 366.28 MB/s    (714) | 366.60 MB/s    (357)

iperf3 Network Speed Tests (IPv4):
---------------------------------
Provider        | Location (Link)           | Send Speed      | Recv Speed      | Ping
-----           | -----                     | ----            | ----            | ----
Clouvider       | London, UK (10G)          | 1.49 Gbits/sec  | 554 Mbits/sec   | 82.7 ms
Eranium         | Amsterdam, NL (100G)      | 2.26 Gbits/sec  | 1.03 Gbits/sec  | 80.6 ms
Uztelecom       | Tashkent, UZ (10G)        | busy            | busy            | 170 ms
Leaseweb        | Singapore, SG (10G)       | 390 Mbits/sec   | 364 Mbits/sec   | 238 ms
Clouvider       | Los Angeles, CA, US (10G) | busy            | busy            | 72.2 ms
Leaseweb        | NYC, NY, US (10G)         | 3.14 Gbits/sec  | 1.36 Gbits/sec  | 9.56 ms
Edgoo           | Sao Paulo, BR (1G)        | 1.46 Gbits/sec  | 318 Mbits/sec   | 133 ms

Geekbench test failed and low memory was detected. Add at least 1GB of SWAP or use GB4 instead (higher compatibility with low memory systems).

YABS completed in 15 min 11 sec

Any clues? What else can I check?

Thanks in advance.

It's pronounced hacker.

Comments

  • Jab Senpai

    This sounds like a disk issue.
    What does ioping show?

    Thanked by (1)jqr

    Haven't bought a single service in VirMach Great Ryzen 2022 - 2023 Flash Sale.
    https://lowendspirit.com/uploads/editor/gi/ippw0lcmqowk.png

  • ... ehh... Network routing? Misconfigured hostname/DNS?

    Shots in the dark; I (still) have no hands-on NixOS experience, but I once had a Linux box with time-out-like behaviour on the first try for numerous commands, and swift responses after that. I had messed up something with networking; after correcting it, no problem. I found the cause by accident, and too long ago to remember specific symptoms.
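
    A rough way to test the hostname/DNS guess, as a sketch only: `resolve_ms` is a hypothetical helper name, and it assumes a glibc system with getent plus GNU date. It times a lookup through the system resolver; a multi-second result for the box's own hostname would point at this kind of misconfiguration.

```shell
# Hypothetical helper: time a host name lookup through the system
# resolver (getent goes through NSS, i.e. /etc/nsswitch.conf). Prints ms.
# Assumes GNU date for %N. Defaults to the machine's own hostname.
resolve_ms() {
  local name=${1:-$(hostname)} start end
  start=$(date +%s%N)
  getent hosts "$name" > /dev/null 2>&1
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}
```

    Running `resolve_ms` with no argument checks the local hostname; `resolve_ms localhost` gives a baseline to compare against.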

    Have fun troubleshooting ;-)

    Thanked by (3)jqr tmntwitw ehab
  • Thanks for your reply @Jab

    I'm not familiar with ioping, but here's the output from ioping -R /dev/xvda:

    --- /dev/xvda (block device 10 GiB) ioping statistics ---
    9 requests completed in 2.36 s, 36 KiB read, 3 iops, 15.3 KiB/s
    generated 10 requests in 3.02 s, 40 KiB, 3 iops, 13.2 KiB/s
    min/avg/max/mdev = 70.6 ms / 261.9 ms / 580.0 ms / 166.6 ms
    

    and ioping -RL /dev/xvda

    --- /dev/xvda (block device 10 GiB) ioping statistics ---
    48 requests completed in 3.44 s, 12 MiB read, 13 iops, 3.48 MiB/s
    generated 49 requests in 3.55 s, 12.2 MiB, 13 iops, 3.45 MiB/s
    min/avg/max/mdev = 731.3 us / 71.7 ms / 601.1 ms / 137.5 ms
    

    Let me know if you want me to run a different test.

    I also ran cryptsetup benchmark thinking maybe the LUKS encryption might be too much for the box.

    # Tests are approximate using memory only (no storage IO).
    PBKDF2-sha1      1068884 iterations per second for 256-bit key
    PBKDF2-sha256    2317295 iterations per second for 256-bit key
    PBKDF2-sha512    1144733 iterations per second for 256-bit key
    PBKDF2-ripemd160  516031 iterations per second for 256-bit key
    PBKDF2-whirlpool  477059 iterations per second for 256-bit key
    argon2i       4 iterations, 485032 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
    argon2id      4 iterations, 496607 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
    #     Algorithm |       Key |      Encryption |      Decryption
            aes-cbc        128b         0.4 MiB/s      2374.6 MiB/s
        serpent-cbc        128b         1.9 MiB/s       435.4 MiB/s
        twofish-cbc        128b       109.1 MiB/s       250.0 MiB/s
            aes-cbc        256b       501.7 MiB/s      1902.5 MiB/s
        serpent-cbc        256b        71.6 MiB/s       454.0 MiB/s
        twofish-cbc        256b       166.2 MiB/s       267.3 MiB/s
            aes-xts        256b      2288.5 MiB/s      2171.1 MiB/s
        serpent-xts        256b       374.7 MiB/s       387.5 MiB/s
        twofish-xts        256b       229.7 MiB/s       246.3 MiB/s
            aes-xts        512b      1831.4 MiB/s      1860.0 MiB/s
        serpent-xts        512b       406.3 MiB/s       403.3 MiB/s
        twofish-xts        512b       257.3 MiB/s       265.8 MiB/s
    


  • @wankel said:
    ... ehh... Network routing? Misconfigured hostname/DNS?
    [snip]
    Have fun troubleshooting ;-)

    Interesting. I'll see what I can find on that end. Thanks!


  • Maybe it is a networking/routing issue. This is my tracert to the box:

      1    <1 ms    <1 ms    <1 ms  192.168.128.1
      2     1 ms    <1 ms    <1 ms  192.168.1.254
      3     2 ms     5 ms     1 ms  108-88-60-1.lightspeed.sntcca.sbcglobal.net [108.88.60.1]
      4     5 ms     5 ms     5 ms  71.148.135.190
      5     *        *        *     Request timed out.
      6     *        *        *     Request timed out.
      7     *        7 ms     5 ms  192.205.32.182
      8   157 ms   158 ms   155 ms  kanc-bb2-link.ip.twelve99.net [62.115.121.177]
      9   145 ms   144 ms   145 ms  chi-bb2-link.ip.twelve99.net [62.115.136.102]
     10     *       71 ms    71 ms  nyk-bb2-link.ip.twelve99.net [62.115.132.134]
     11   147 ms   144 ms   144 ms  ldn-bb1-link.ip.twelve99.net [62.115.113.21]
     12   143 ms     *      146 ms  ldn-b2-link.ip.twelve99.net [62.115.122.189]
     13   143 ms   165 ms   145 ms  62.115.38.211
     14     *        *        *     Request timed out.
     15     *        *        *     Request timed out.
     16     *        *        *     Request timed out.
     17   187 ms   188 ms   188 ms  XXX.XXX.XXX.XXX
    

    Compared to another one of my boxes:

      1    <1 ms    <1 ms    <1 ms  192.168.128.1
      2     1 ms    <1 ms     7 ms  192.168.1.254
      3     3 ms     2 ms     2 ms  108-88-60-1.lightspeed.sntcca.sbcglobal.net [108.88.60.1]
      4     5 ms     1 ms     3 ms  71.148.135.190
      5     *        *        *     Request timed out.
      6     *        *        *     Request timed out.
      7     *        *        *     Request timed out.
      8     8 ms     8 ms    10 ms  be3142.ccr21.sjc01.atlas.cogentco.com [154.54.1.193]
      9    18 ms    17 ms    17 ms  be3176.ccr41.lax01.atlas.cogentco.com [154.54.31.189]
     10    17 ms    21 ms    19 ms  be3243.ccr41.lax05.atlas.cogentco.com [154.54.27.118]
     11    17 ms    17 ms    18 ms  be3584.rcr51.b004747-3.lax05.atlas.cogentco.com [154.54.85.230]
     12    16 ms    22 ms    16 ms  38.19.140.186
     13    18 ms    18 ms    17 ms  YYY.YYY.YYY.YYY
    


  • Not_Oles Hosting Provider, Content Writer

    There isn't much free memory, so nothing or not much gets cached?

    The partition where the command binaries and linked libraries live is encrypted? So, every time you run a new command the first time, the binary and maybe some libraries have to get fetched and decrypted before they can be used?

    What might happen if you added a few more GB of swap?

    If you don't mind reinstalling, maybe test whether the VPS works faster with plain Debian and plain, unencrypted ext4?

    Reinstalling with NixOS and encryption, might it make sense to keep the system binaries and libraries on a separate, unencrypted partition?

    Good luck! Best wishes! :)

    Thanked by (2)tmntwitw jqr

    I hope everyone gets the servers they want!

  • edited June 13

    Maybe try running strace/ltrace on the command (first time, second time) and see if there are any significant differences in the outputs.
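
    A sketch of how that comparison could go, assuming strace is installed; `slow_syscalls` is a hypothetical helper name. strace's -T option appends the time spent in each syscall as `<seconds>` at the end of the line, so the slow ones can be filtered out of a saved trace.

```shell
# Hypothetical helper: list syscalls in an strace log (recorded with -T)
# that took at least THRESHOLD seconds (default 0.1).
slow_syscalls() {
  local file=$1 threshold=${2:-0.1}
  awk -v t="$threshold" '
    match($0, /<[0-9]+\.[0-9]+>$/) {
      # Strip the angle brackets to get the duration in seconds.
      secs = substr($0, RSTART + 1, RLENGTH - 2)
      if (secs + 0 >= t + 0) print secs, $0
    }' "$file"
}

# Example invocation (not run here):
#   strace -f -T -o /tmp/first.trace  ps xa > /dev/null
#   strace -f -T -o /tmp/second.trace ps xa > /dev/null
#   slow_syscalls /tmp/first.trace 0.1
```

    If the first trace shows slow file reads or opens and the second does not, that points at disk access; slow network-related calls would point elsewhere.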

    Thanked by (1)Not_Oles
  • @Not_Oles said:
    [snip]
    If you don't mind reinstalling, maybe test whether the VPS works faster with plain Debian and plain, unencrypted ext4?

    Thank you for your reply! I'll look into your other suggestions, but I tried something similar to this: I booted SystemRescueCD and the system is very snappy.

    Here's the hard disk section of YABS while in rescue mode:

    fio Disk Speed Tests (Mixed R/W 50/50) (Partition /dev/mapper/system-root):
    ---------------------------------
    Block Size | 4k            (IOPS) | 64k           (IOPS)
      ------   | ---            ----  | ----           ----
    Read       | 40.23 MB/s   (10.0k) | 159.67 MB/s   (2.4k)
    Write      | 40.32 MB/s   (10.0k) | 160.52 MB/s   (2.5k)
    Total      | 80.55 MB/s   (20.1k) | 320.19 MB/s   (5.0k)
               |                      |
    Block Size | 512k          (IOPS) | 1m            (IOPS)
      ------   | ---            ----  | ----           ----
    Read       | 172.80 MB/s    (337) | 167.41 MB/s    (163)
    Write      | 181.98 MB/s    (355) | 178.56 MB/s    (174)
    Total      | 354.78 MB/s    (692) | 345.97 MB/s    (337)
    

    ioping -R /dev/xvda

    --- /dev/xvda (block device 10 GiB) ioping statistics ---
    28 requests completed in 3.09 s, 112 KiB read, 9 iops, 36.3 KiB/s
    generated 29 requests in 3.19 s, 116 KiB, 9 iops, 36.4 KiB/s
    min/avg/max/mdev = 199.1 us / 110.3 ms / 352.1 ms / 97.8 ms
    

    ioping -RL /dev/xvda

    --- /dev/xvda (block device 10 GiB) ioping statistics ---
    68 requests completed in 3.06 s, 17 MiB read, 22 iops, 5.56 MiB/s
    generated 69 requests in 3.06 s, 17.2 MiB, 22 iops, 5.63 MiB/s
    min/avg/max/mdev = 645.1 us / 45.0 ms / 255.3 ms / 68.0 ms
    

    Looks like double the performance? I'll keep researching.
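
    To compare the averages without eyeballing units, here is a small sketch; `ioping_avg_ms` is a hypothetical helper name, and it assumes the summary line format shown above with units of ns, us, ms or s.

```shell
# Hypothetical helper: read an ioping summary line on stdin, e.g.
#   min/avg/max/mdev = 70.6 ms / 261.9 ms / 580.0 ms / 166.6 ms
# and print the average latency in milliseconds.
ioping_avg_ms() {
  awk -F'= ' '{
    split($2, parts, " / ")        # parts[2] is the avg field, e.g. "261.9 ms"
    split(parts[2], avg, " ")      # avg[1] = number, avg[2] = unit
    value = avg[1]
    if (avg[2] == "ns") value /= 1000000
    else if (avg[2] == "us") value /= 1000
    else if (avg[2] == "s") value *= 1000
    printf "%.1f\n", value
  }'
}
```

    With the numbers in this thread, the -R average goes from 261.9 ms installed to 110.3 ms in rescue mode (about 2.4x), and the -RL average from 71.7 ms to 45.0 ms (about 1.6x). Each run only covers a few seconds of requests, so the ratios are rough.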

    Thanked by (1)Not_Oles


  • Not_Oles Hosting Provider, Content Writer

    @jqr Guessing that the disk tests you did with the System Rescue CD might omit the encryption/decryption steps which occur within the system as installed? And skipping the encryption/decryption steps resulted in a significant speed increase?

    Can you just turn off and on the encryption in the installed system? If yes, maybe turning encryption off might mean you can't read previously written encrypted files, but maybe you could run comparative disk I/O tests within the installed system without the burden of encryption?

    Thanked by (1)jqr


  • @jqr said: don't want to drag them into this

    Xen in Montreal means we know who it is anyway :-D

    Agree with @jab, it sounds like a disk issue to me, and it may be something on the node (caused by a neighbour, fault, etc.) which is not exposed to you. Starting from a rescue CD and it working OK is in line with this.

    Ideally turn off all swap for the tests to avoid confusing the issue. You could try running "slow" commands from a ramdisk jail. The motivation is to see whether just accessing the binaries/libraries on disk is what's slowing it down, rather than anything computational: if they run fine without disk access, that points to a disk issue. You might need to adjust for your particular Linux flavour, but something like:

    # Build a small RAM-backed root
    mkdir /mnt/tmpfs
    mount -t tmpfs -o size=256M ramdisk /mnt/tmpfs
    mkdir -p /mnt/tmpfs/{bin,lib,lib64,proc}
    mount --bind /proc /mnt/tmpfs/proc
    cd /mnt/tmpfs
    # Copy each binary plus the shared libraries ldd reports (absolute paths only, which skips linux-vdso)
    for f in bash ps; do
      cp "/bin/${f}" "bin/${f}"
      for d in $(ldd "/bin/${f}" | sed -e "s/.*=>//" | awk '$1 ~ /^\// {print $1}' | sort -u); do
        cp --parents "${d}" .
      done
    done
    chroot /mnt/tmpfs bash
    

    Put whatever you want to run in the for loop (ps is the example here). Now run ps xa or whatever. To clean up, exit the chroot, then umount /mnt/tmpfs/proc ; umount /mnt/tmpfs. Does it run OK? If so, that may be because it is running from RAM. In that case, check what else is hitting the disk, e.g. with iotop. If that looks pretty clean, it's time to open a ticket and ask about the disk on the node.

    Thanked by (1)jqr
  • jqr OG
    edited June 14

    @Not_Oles said:
    Guessing that the disk tests you did with the System Rescue CD might omit the encryption/decryption steps which occur within the system as installed? And skipping the encryption/decryption steps resulted in a significant speed increase?

    It does not. I opened the encrypted volume in order to test it.

    Can you just turn off and on the encryption in the installed system? If yes, maybe turning encryption off might mean you can't read previously written encrypted files, but maybe you could run comparative disk I/O tests within the installed system without the burden of encryption?

    Afraid not. 😔 LUKS is all or nothing. I could try to re-install NixOS without encryption as you suggested, and if I am unable to figure out the problem, I might give that a try.

    @tetech said:
    Xen in Montreal means we know who it is anyway :-D

    Ooops. 😅 I guess so. They're great. I just don't think that this issue is their fault and didn't want to tarnish their reputation.

    Agree with @jab, it sounds like a disk issue to me, and it may be something on the node (caused by a neighbour, fault, etc.) which is not exposed to you. Starting from a rescue CD and it working OK is in line with this.

    Oh! Of course! SystemRescueCD is running from RAM, explaining why it would be snappy then.

    Could try running "slow" commands from ramdisk jail. The motivation of this is to see if just accessing the binaries/libraries on disk is what's slowing it down rather than anything computational - if they run fine without disk access then points to disk issue.

    Ah, this is another excellent idea. NixOS is... idiosyncratic, and therefore your awesome chroot snippet refuses to run, but let me see if there's another way to test it via that method. Thanks!

    edit: Forgot to mention: nothing weird shows up in iotop either.


  • One thought I have is that the host node is overcommitted. Unsure if NixOS can show you steal time.

    Try reboot as well.
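
    Steal time is exposed by the Linux kernel in /proc/stat regardless of distro, and top and vmstat show it as "st". As a minimal sketch (the names `steal_pct` and `sample_steal` are hypothetical), it can also be computed from two samples of the aggregate cpu line:

```shell
# Hypothetical helper: given two "cpu ..." lines sampled from /proc/stat,
# print the percentage of CPU time stolen by the hypervisor in between.
# Fields after the label: user nice system idle iowait irq softirq steal ...
steal_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    { total = 0
      for (i = 2; i <= 9; i++) total += $i   # user..steal; guest time is already counted in user
      t[NR] = total; s[NR] = $9 }
    END {
      dt = t[2] - t[1]
      if (dt <= 0) { print "0.0"; exit }
      printf "%.1f\n", 100 * (s[2] - s[1]) / dt
    }'
}

# Sample the aggregate cpu line twice, a few seconds apart (default 5).
sample_steal() {
  local a b
  a=$(grep '^cpu ' /proc/stat)
  sleep "${1:-5}"
  b=$(grep '^cpu ' /proc/stat)
  steal_pct "$a" "$b"
}
```

    Running `sample_steal 10` while reproducing the slowness would show whether a noticeable share of CPU time is being stolen; a value near zero makes an overcommitted CPU less likely, though it says nothing about disk contention on the node.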

    Thanked by (1)jqr
  • Well, I bit the bullet and reinstalled. Their Debian 12 image is blazing fast. I then tried installing NixOS, this time without LUKS encryption, and it went back to being slow. 😔

    I'm guessing there's some Xen optimization that is missing from NixOS. Maybe I'll ask the provider if they do anything special to their Debian image.

    Thanks again to all who offered advice.

    Thanked by (2)Not_Oles wankel


  • Not_Oles Hosting Provider, Content Writer

    @jqr said: Thanks again to all who offered advice.

    Good luck! Have fun!

    Thanked by (1)jqr


  • edited June 14

    @jqr said: I'm guessing there's some Xen optimization that is missing from NixOS.

    Might be a kernel issue? Try the kernel from the Debian 12 Image and put it in NixOS maybe

    @jqr said: Maybe I'll ask the provider if they do anything special to their Debian image.

    Just check whether installing from a normal, freshly downloaded netinst ISO gives the same results.

    Thanked by (1)jqr

    youtube.com/watch?v=k1BneeJTDcU

  • jqr OG
    edited June 25

    I think I figured it out!

    Debian was running a process called xe-guest-utilities, which "brings better performance and is required for various features". Talk about an ambiguous description.

    Either way, with that installed (in NixOS, it's done by adding services.xe-guest-utilities.enable = true; to /etc/nixos/configuration.nix), the slowdowns are gone! Or at least greatly diminished? I was never able to figure out a metric for the weird latency I was getting, but it feels waaay better. With LUKS encryption enabled too.

    Thanks again to all who helped! I hope this post helps someone else in the future.

    Thanked by (3)Wonder_Woman Not_Oles ehab


  • Not_Oles Hosting Provider, Content Writer

    Congrats! :)

    Thanked by (1)jqr

