Help me troubleshoot this slow VPS
Help me LES, you're my only hope.
I have a VPS with a popular, reputable provider (don't want to drag them into this, as I believe this is a problem with the way I configured it).
The Pertinent Details
1 vCPU, 1GB RAM, 10GB SSD, 1TB HD
Server is in Montreal, Canada, I'm in California.
Virtualization is Xen (this is the main difference between it and the rest of my boxes). OS is NixOS.
The SSD is partitioned into 512MB for /boot and the rest is LUKS encrypted, with LVM on top.
Ping is decent:
Minimum = 75ms, Maximum = 80ms, Average = 77ms
The box is basically idle:
11:31:23  up 98 days 12:49,  8 users,  load average: 0.08, 0.05, 0.02
~500MB of free memory, ~600GB free space in the SSD.
The Issue
Every command takes forever to run the first time. SSHing in, waiting for the login prompt, ps xa. Everything.
I don't think it's a latency issue. time sleep 5s outputs real 0m6.982s. If I run it a second time, then it takes 0m5.006s as one would expect.
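For anyone who wants to put a number on this first-run penalty, here is a minimal sketch, assuming Python 3 is available on the box. The helper names `time_runs` and `report` are made up for illustration; the idea is simply to run the same command several times and compare each run against the fastest one.

```python
import subprocess
import sys
import time


def time_runs(cmd, runs=3):
    """Run cmd `runs` times and return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded so terminal rendering does not skew the timing.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def report(cmd, runs=3):
    """Print each run's duration and how much slower it was than the fastest run."""
    durations = time_runs(cmd, runs)
    fastest = min(durations)
    for i, d in enumerate(durations, start=1):
        print(f"run {i}: {d:.3f}s (+{d - fastest:.3f}s over fastest)", file=sys.stdout)
```

Calling `report(["sleep", "5"])` would mirror the `time sleep 5s` test above; a large gap on run 1 only would match the symptom described.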
What I've Checked
I've already stopped every process to see if I had some kind of OOM/high CPU usage situation.
The appropriate xen_netfront and xen_blkfront modules are loaded.
I thought the SSD might be going bad, but there are no error messages in dmesg or journalctl.
Here's a redacted YABS:
# ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## #
#              Yet-Another-Bench-Script              #
#                     v2024-06-09                    #
# https://github.com/masonr/yet-another-bench-script #
# ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## #
Thu Jun 13 12:54:48 PM PDT 2024
Basic System Information:
---------------------------------
Uptime     : 98 days, 14 hours, 13 minutes
Processor  : AMD EPYC 7302 16-Core Processor
CPU cores  : 1 @ 3000.099 MHz
AES-NI     : ✔ Enabled
VM-x/AMD-V : ❌ Disabled
RAM        : 938.5 MiB
Swap       : 2.0 GiB
Disk       : 1014.6 GiB
Distro     : NixOS 23.11 (Tapir)
Kernel     : 6.1.63
VM Type    : XEN
IPv4/IPv6  : ✔ Online / ❌ Offline
fio Disk Speed Tests (Mixed R/W 50/50) (Partition /dev/disk/by-uuid/a4b06307-53b9-463e-882e-61434d992934):
---------------------------------
Block Size | 4k            (IOPS) | 64k           (IOPS)
  ------   | ---            ----  | ----           ----
Read       | 33.92 MB/s    (8.4k) | 164.60 MB/s   (2.5k)
Write      | 34.01 MB/s    (8.5k) | 165.47 MB/s   (2.5k)
Total      | 67.93 MB/s   (16.9k) | 330.07 MB/s   (5.1k)
           |                      |
Block Size | 512k          (IOPS) | 1m            (IOPS)
  ------   | ---            ----  | ----           ----
Read       | 178.40 MB/s    (348) | 177.39 MB/s    (173)
Write      | 187.88 MB/s    (366) | 189.20 MB/s    (184)
Total      | 366.28 MB/s    (714) | 366.60 MB/s    (357)
iperf3 Network Speed Tests (IPv4):
---------------------------------
Provider        | Location (Link)           | Send Speed      | Recv Speed      | Ping
-----           | -----                     | ----            | ----            | ----
Clouvider       | London, UK (10G)          | 1.49 Gbits/sec  | 554 Mbits/sec   | 82.7 ms
Eranium         | Amsterdam, NL (100G)      | 2.26 Gbits/sec  | 1.03 Gbits/sec  | 80.6 ms
Uztelecom       | Tashkent, UZ (10G)        | busy            | busy            | 170 ms
Leaseweb        | Singapore, SG (10G)       | 390 Mbits/sec   | 364 Mbits/sec   | 238 ms
Clouvider       | Los Angeles, CA, US (10G) | busy            | busy            | 72.2 ms
Leaseweb        | NYC, NY, US (10G)         | 3.14 Gbits/sec  | 1.36 Gbits/sec  | 9.56 ms
Edgoo           | Sao Paulo, BR (1G)        | 1.46 Gbits/sec  | 318 Mbits/sec   | 133 ms
Geekbench test failed and low memory was detected. Add at least 1GB of SWAP or use GB4 instead (higher compatibility with low memory systems).
YABS completed in 15 min 11 sec
Any clues? What else can I check?
Thanks in advance.
It's pronounced hacker.
Comments
This sounds like a disk issue.
What does ioping show?
Haven't bought a single service in VirMach Great Ryzen 2022 - 2023 Flash Sale.
https://lowendspirit.com/uploads/editor/gi/ippw0lcmqowk.png
... ehh... Network routing? Misconfigured hostname/DNS?
Shots in the dark; I (still) have no hands-on NixOS experience, but I once had a Linux box with time-out-like behaviour on the first try for numerous commands, and swift responses after that. I had messed up something with networking; after correcting it, no problem. Found the cause by accident, and too long ago to remember specific symptoms.
Have fun troubleshooting ;-)
Thanks for your reply @Jab
I'm not familiar with ioping, but here's the output from ioping -R /dev/xvda:
and ioping -RL /dev/xvda:
Let me know if you want me to run a different test.
I also ran cryptsetup benchmark, thinking maybe the LUKS encryption might be too much for the box.
Interesting. I'll see what I can find on that end. Thanks!
Maybe it is a networking/routing issue. This is my tracert to the box:
Compared to another one of my boxes:
There isn't much free memory, so nothing or not much gets cached?
The partition where the command binaries and linked libraries live is encrypted? So, every time you run a new command the first time, the binary and maybe some libraries have to get fetched and decrypted before they can be used?
What might happen if you added a few more GB of swap?
If you don't mind reinstalling, maybe test whether the VPS works faster with plain Debian and plain, unencrypted ext4?
Reinstalling with NixOS and encryption, might it make sense to keep the system binaries and libraries on a separate, unencrypted partition?
Good luck! Best wishes!
I hope everyone gets the servers they want!
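The cold-cache theory above (binaries and libraries having to be fetched and decrypted the first time they are used) can be probed roughly with a sketch like the following. It assumes Python 3.8+ and that the file has not been read recently, so the first read actually goes to disk; the helper names `timed_read` and `cold_vs_warm` are hypothetical.

```python
import sys
import time


def timed_read(path, chunk_size=1 << 20):
    """Read the whole file in chunks and return (bytes_read, seconds_taken)."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total, time.perf_counter() - start


def cold_vs_warm(path=sys.executable):
    """Read the same file twice in a row.

    The first read may hit the disk (and the LUKS layer); the second is
    normally served from the page cache. If the file was already cached,
    both reads will be fast and the comparison says nothing.
    """
    first = timed_read(path)
    second = timed_read(path)
    return first, second
```

The default path is the Python interpreter itself, only because it is a conveniently sized file that is guaranteed to exist; pointing it at a binary that has not been run since boot would be a fairer test.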
Maybe can try running strace/ltrace on the command (first time, second time), and see if there are any significant differences in the outputs.
Thank you for your reply! I'll look into your other suggestions, but I tried something similar to this: I booted SystemRescueCD and the system is very snappy.
Here's the hard disk section of YABS while in rescue mode:
ioping -R /dev/xvda
ioping -RL /dev/xvda
Looks like double the performance? I'll keep researching.
@jqr Guessing that the disk tests you did with the System Rescue CD might omit the encryption/decryption steps which occur within the system as installed? And skipping the encryption/decryption steps resulted in a significant speed increase?
Can you just turn off and on the encryption in the installed system? If yes, maybe turning encryption off might mean you can't read previously written encrypted files, but maybe you could run comparative disk I/O tests within the installed system without the burden of encryption?
Xen in Montreal means we know who it is anyway :-D
Agree with @jab, it sounds like a disk issue to me, and it may be something on the node (caused by a neighbour, fault, etc.) which is not exposed to you. Starting from a rescue CD and it working OK is in line with this.
Ideally turn off all swap for tests to avoid confusing the issue. Could try running "slow" commands from ramdisk jail. The motivation of this is to see if just accessing the binaries/libraries on disk is what's slowing it down rather than anything computational - if they run fine without disk access then points to disk issue. Might need to adjust for particular linux flavour, but something like,
Put whatever you want to run in the for loop (ps is the example here). Now ps xa or whatever. Cleanup: umount /mnt/tmpfs/proc ; umount /mnt/tmpfs. Run OK? Maybe due to being in RAM? If OK, check what else is hitting disk, e.g. iotop. If looks pretty clean, then time to open a ticket and ask about disk on the node.
It does not. I opened the encrypted volume in order to test it.
Afraid not. 😔 LUKS is all or nothing. I could try to re-install NixOS without encryption as you suggested, and if I am unable to figure out the problem, I might give that a try.
Ooops. 😅 I guess so. They're great. I just don't think that this issue is their fault and didn't want to tarnish their reputation.
Oh! Of course! SystemRescueCD is running from RAM, explaining why it would be snappy then.
Ah, this is another excellent idea. NixOS is... idiosyncratic, and therefore your awesome chroot snippet refuses to run, but let me see if there's another way to test it via that method. Thanks!
edit: Forgot to mention: nothing weird shows up in iotop either.
One thought I have is that the host node is overcommitted. Unsure if NixOS can show you steal time.
Try reboot as well.
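On the steal time question: any Linux guest, NixOS included, exposes it in the aggregate `cpu` line of /proc/stat (it is the eighth counter), though whether the hypervisor reports a meaningful value is up to the host. A minimal sketch for sampling it, with hypothetical helper names:

```python
import time

# Order of the first eight counters on the aggregate "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def steal_percent(before, after):
    """Share of CPU time between two samples that was stolen, in percent."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    return 100.0 * deltas["steal"] / total if total else 0.0


def read_cpu_sample(path="/proc/stat"):
    """Read the current aggregate CPU counters (Linux only)."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


def sample_steal(interval=5.0):
    """Sample /proc/stat twice, `interval` seconds apart, and return steal %."""
    before = read_cpu_sample()
    time.sleep(interval)
    return steal_percent(before, read_cpu_sample())
```

Running `sample_steal()` while the box feels slow, and again while it feels fine, would show whether the two cases differ. A consistently high value would point at an overcommitted node rather than the guest configuration.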
Well, I bit the bullet and reinstalled. Their Debian 12 image is blazing fast. I then tried installing NixOS, this time without LUKS encryption, and it went back to being slow. 😔
I'm guessing there's some Xen optimization that is missing from NixOS. Maybe I'll ask the provider if they do anything special to their Debian image.
Thanks again to all who offered advice.
Good luck! Have fun!
Might be a kernel issue? Try the kernel from the Debian 12 Image and put it in NixOS maybe
Just check if installing a normal fresh downloaded netinst ISO provides the same results
youtube.com/watch?v=k1BneeJTDcU
I think I figured it out!
Debian was running a process called xe-guest-utilities, which "brings better performance and is required for various features". Talk about an ambiguous description.
Either way, with that installed (in NixOS, it's done by adding services.xe-guest-utilities.enable = true; to /etc/nixos/configuration.nix), the slowdowns are gone! Or at least greatly diminished? I was never able to figure out a metric for the weird latency I was getting, but it feels waaay better. With LUKS encryption enabled too.
Thanks again to all who helped! I hope this post helps someone else in the future.
Congrats!