[2022] ★ VirMach ★ RYZEN ★ NVMe ★★ The Epic Sales Offer Thread ★★

skorous · July 2022

@VirMach said: I agree with you on this. We didn't really have another choice for these. It's painful and bad, for us as well. Probably around 5% of people have been stuck for 48 hours now, and I don't like that, but we're doing all we physically can.

Just in case you're thinking this is about the migrations: the discussion you're referring back to is actually about the Ryzen Location Change button and @yoursunny's opinions on it.

Papa · July 2022

@VirMach said:
Unfortunately there's pretty much nothing we can do about that. I really think SolusVM recently updated that tool and broke it for older operating systems. They've been breaking a lot of things: there's a libvirtd incompatibility, the migration tool wasn't working for a while, the operating system templates don't deploy properly, and they often haven't been syncing properly. They've just been updating all their PHP versions and whatever else, racing forward without actually checking anything.

OK, I understand that, but is there a way to fix the network manually? What settings should I use to get the VM working? At first glance I couldn't see any problems: the eth0 (ens3) interface is up, and there is a route to the gateway xxx.xxx.xxx.1.
I can ping nearby nodes, but pinging the gateway returns "Network host unreachable".
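One thing worth ruling out before blaming the node: whether the gateway actually falls inside the subnet configured on ens3. If the prefix is wrong (a /32, say), Linux won't accept a plain default route via that gateway without an on-link route to it first. A minimal Python sketch of that check, assuming hypothetical documentation addresses in place of the redacted ones; it won't catch layer-2 problems on the node itself.

```python
import ipaddress


def gateway_is_on_link(interface_cidr: str, gateway: str) -> bool:
    """Return True if `gateway` lies inside the subnet configured on the interface.

    If it does not, Linux refuses a plain `default via <gateway>` route
    until an on-link route to the gateway exists.
    """
    iface = ipaddress.ip_interface(interface_cidr)
    return ipaddress.ip_address(gateway) in iface.network


# Hypothetical documentation addresses standing in for the redacted ones.
print(gateway_is_on_link("203.0.113.45/24", "203.0.113.1"))  # True
print(gateway_is_on_link("203.0.113.45/32", "203.0.113.1"))  # False
```

If it comes back False, the usual iproute2 workaround is `ip route add <gateway> dev ens3` followed by `ip route add default via <gateway>`.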

VirMach · July 2022

@Mumbly said:
@VirMach that's a misunderstanding.
I didn't comment on or criticize VirMach's migrations here. I was discussing @yoursunny's suggestion (well, he made a few good suggestions, but I feel this one wasn't the best) about a 24-hour migration queue in the future.

I understand. I'm just adding that I agree with you: it's not user friendly. I'm also saying that, unfortunately, our current situation is similar, and it's not how we intended the developer to code it. I did read what @yoursunny suggested, and it definitely has its benefits, but our ideal version would be immediate.

I remember we improved our script to cut downtime to only about an hour per VM and had that concept working pretty well, but it didn't work out once we were planning for efficiency across many servers.

How the project was laid out for the developer who never completed it:

A customer is eligible to use it when their server is queued for migration; later on it would be open to everyone (the latter being the Ryzen-to-Ryzen idea).
The customer receives one credit per term length; as in, if they could cancel and re-order, then they're eligible for it. Therefore, a customer in their first month of service naturally wouldn't be able to abuse it.
Migrations are queued in a batch, but the batch gets processed much sooner. It would essentially only wait for the right load conditions and be throttled by the number of requests. If 1,000 requests came in at once, it could theoretically take 24 hours.
Since, AFAIK, SolusVM does not have an API to run the script that completes a migration (i.e., marks the VM in the database as being on a new node), our idea was that it would power up a new service, then replace the details of the existing service in WHMCS.
The old service gets marked for deletion, but isn't deleted immediately in case anything goes wrong, so the data is still there for a few days. More aggressive pruning may happen during peak usage: there would be, let's say, a 200GB pool where old data stays for a week, and anything beyond that maybe only a day. This also allows an easy revert button to be added later in case the customer regrets their decision, instead of them contacting us and us moving it twice.

It would have been pretty nice to have that ready in time for these migrations, so they didn't end up as these vague, day-long windows.
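The plan was never finished, so there's no real code to show. As a rough illustration only, here's a Python sketch of three of the rules listed above: a per-term migration credit, a throttle sized so a burst of requests drains in about a day, and a retention pool for old copies. Every name in it (Service, RetiredService, hourly_rate_for, names_to_prune) is hypothetical, the 200GB / one-week / one-day figures are the "let's say" numbers from the post, and the order in which old copies claim pool space is an assumption.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Illustrative figures from the post ("let's say a 200GB pool", a week, a day).
POOL_LIMIT_GB = 200
POOL_RETENTION = timedelta(days=7)
OVERFLOW_RETENTION = timedelta(days=1)


@dataclass
class Service:
    """Hypothetical service record with a per-term migration allowance."""
    term_start: datetime
    term_length: timedelta
    credits_per_term: int = 1  # "one credit per term length"
    used: list[datetime] = field(default_factory=list)

    def current_term_start(self, now: datetime) -> datetime:
        terms_elapsed = (now - self.term_start) // self.term_length
        return self.term_start + terms_elapsed * self.term_length

    def credits_left(self, now: datetime) -> int:
        start = self.current_term_start(now)
        return self.credits_per_term - sum(1 for t in self.used if t >= start)

    def use_credit(self, now: datetime) -> bool:
        """Spend one credit if any remain in the current term."""
        if self.credits_left(now) <= 0:
            return False
        self.used.append(now)
        return True


def hourly_rate_for(backlog: int, target_hours: int = 24) -> int:
    """Smallest per-hour throttle that clears `backlog` within `target_hours`."""
    return math.ceil(backlog / target_hours)


@dataclass
class RetiredService:
    """Old copy kept after a migration so the move can be reverted."""
    name: str
    size_gb: float
    retired_at: datetime


def names_to_prune(retired: list[RetiredService], now: datetime) -> list[str]:
    """Return retired copies whose data may be deleted at `now`.

    Assumption: the most recently retired copies fill the pool first;
    whatever doesn't fit gets the shorter overflow retention.
    """
    prune, used_gb = [], 0.0
    for svc in sorted(retired, key=lambda s: s.retired_at, reverse=True):
        if used_gb + svc.size_gb <= POOL_LIMIT_GB:
            used_gb += svc.size_gb
            keep_for = POOL_RETENTION
        else:
            keep_for = OVERFLOW_RETENTION
        if now - svc.retired_at > keep_for:
            prune.append(svc.name)
    return prune


print(hourly_rate_for(1000))  # 42: at 42 migrations/hour, 1,000 requests clear in 24 hours
```

credits_per_term is a parameter rather than a constant so a different allowance (say, several credits per annual term) could be tried without touching the logic.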

flips · July 2022

Trying to figure out from network status and backlog here: Should FFME002 be operational?
Have never gotten it to work here ...

Troubleshooter reports:
Main IP pings: false
Node Online: false
Service online: offline
Operating System: linux-ubuntu-16.04-server-x86_64-minimal-latest
Service Status:Active

Daevien · July 2022

@flips said:
Trying to figure out from network status and backlog here: Should FFME002 be operational?
Have never gotten it to work here ...

Pokes FFME002 with a stick... nope, still dead. It has been for a while; it worked for most of a day after I migrated there, then died.

risturiz · July 2022

The new Los Angeles Ryzen network looks better... 10k in traffic vs 100k in Atlanta.

Papa · July 2022

Something is strange about the network config of my VM on FFME003. If I set the network interface to DHCP, I receive the correct IP address (xxx.xxx.163.xxx), but from the DHCP server of another subnet, xxx.xxx.162.2. And the IP address I receive is from the same subnet as the FFME004 VM. Is this working as intended? If I restore the network config from the control panel, I get the same IP as static, except for a missing symlink to resolvconf.
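Whether a DHCP server at .162.2 handing out a .163.x lease is odd depends on the prefix length, which the post doesn't give: in a /24 they're different subnets, but a /23 starting at .162.0 covers both. A small Python illustration, assuming hypothetical addresses in place of the redacted ones:

```python
import ipaddress

# Hypothetical stand-ins for the redacted addresses in the post.
lease = ipaddress.ip_address("10.20.163.50")       # address handed out by DHCP
dhcp_server = ipaddress.ip_address("10.20.162.2")  # server that answered

for prefix in (24, 23):
    # The network the leased address belongs to at this prefix length.
    net = ipaddress.ip_interface(f"{lease}/{prefix}").network
    print(f"/{prefix}: {net} contains {dhcp_server}? {dhcp_server in net}")
# /24: 10.20.163.0/24 contains 10.20.162.2? False
# /23: 10.20.162.0/23 contains 10.20.162.2? True
```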

yoursunny · July 2022

@VirMach said:

The customer receives one credit per term length; as in, if they could cancel and re-order, then they're eligible for it. Therefore, a customer in their first month of service naturally wouldn't be able to abuse it.

Most services are paid annually.
One migration per year is not enough.
That's why I suggested once per month.

Some starter credits should be granted on the current services, because:

Some services have been auto-migrated to undesirable locations, such as Amsterdam to Frankfurt.
Looking glass nodes are inoperable, so for all the migrations done so far, the chosen location may be unsuitable for the customer's needs.

Once all the looking glass nodes are up, can we at least have 2~3 credits per year?
That's a lot more flexible than only one credit per year, in case the network condition deteriorates in the middle.

realEthanZou · July 2022

Seems many JP VMs got their IP changed without prior notice

rhinoduck · July 2022

@realEthanZou said:
Seems many JP VMs got their IP changed without prior notice

Indeed. No email, and no information I could find on the Network Status page.

While I understand that a host can face many challenges that cannot be predicted or immediately explained, this is not one of those situations. And the lack of a warning and the lack of information about why the change happened (Was it intentional and is it here to stay, or was it just a configuration mistake that will be reverted?) is a big fat NO NO in my book.

Papa · July 2022

What the F is going on with FFME? I migrated from FFME004 yesterday for a 3 buck fee and got the same FFME004, but online and working; today it's totally offline: no boot, no VNC, nothing. And no information about what's going on. I waited patiently for two weeks and created zero tickets, but now I have to give up and leave as soon as I can get my data off FFME003. Honestly, even my home server, with my hobbyist approach, has less downtime, more reliability, and migrates faster.

NerdUno · July 2022

While our Control Panel still shows 0 IPv4 addresses, we can finally get back in at the assigned IPv4 address and also through our private VPN. So progress has been made at least on the RYZE.SEA-Z002.VMS node in Seattle. We'll see if it stays up.

VirMach · July 2022

We're ramping up the abuse script temporarily. If you're at around full CPU usage on your VPS for more than 2 hours, it'll be powered down. We have too many VMs getting stuck on OS boot and negatively affecting others at this time, due to the operating systems getting stuck on boot for some after the Ryzen change. I apologize in advance for any false positives but I do want to note that it's technically within our terms for 2 hours of that type of usage to be considered abuse, we just usually try to be much more lenient.
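The rule as announced (roughly full CPU for more than 2 hours, followed by a power-down rather than a suspension) can be sketched as a small detector fed with periodic CPU samples. This is not VirMach's actual script: the class name, the 95% threshold, and the 10-minute sampling interval are assumptions for illustration.

```python
from datetime import datetime, timedelta


class SustainedCpuDetector:
    """Flags a VM whose CPU stays at or above `threshold` for longer than `window`.

    Hypothetical sketch: the 95% threshold is an assumption; the 2-hour window
    is the figure from the announcement. Samples must arrive in time order.
    """

    def __init__(self, threshold: float = 95.0, window: timedelta = timedelta(hours=2)):
        self.threshold = threshold
        self.window = window
        self.busy_since = None  # timestamp of the first sample in the current high-CPU run

    def observe(self, ts: datetime, cpu_percent: float) -> bool:
        """Record one sample; return True once the VM should be powered down."""
        if cpu_percent < self.threshold:
            self.busy_since = None  # any dip below the threshold resets the run
            return False
        if self.busy_since is None:
            self.busy_since = ts
        return ts - self.busy_since > self.window


# Example: 10-minute samples pinned at 100% (e.g. a guest stuck in a boot loop).
detector = SustainedCpuDetector()
start = datetime(2022, 7, 9)
for minutes in range(0, 150, 10):
    if detector.observe(start + timedelta(minutes=minutes), 100.0):
        print(f"power down after {minutes} minutes")  # prints "power down after 130 minutes"
        break
```

Because any dip resets the run, a VM that's busy but not pegged for the whole window is never flagged; that's one way to keep false positives down.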

@realEthanZou said:
Seems many JP VMs got their IP changed without prior notice

@rhinoduck said:

@realEthanZou said:
Seems many JP VMs got their IP changed without prior notice

Indeed. No email, and no information I could find on the Network Status page.

While I understand that a host can face many challenges that cannot be predicted or immediately explained, this is not one of those situations. And the lack of a warning and the lack of information about why the change happened (Was it intentional and is it here to stay, or was it just a configuration mistake that will be reverted?) is a big fat NO NO in my book.

We did send out emails, but it's possible they were not all sent because SolusVM was also overloaded around that time. Please also check your spam folder; we sent these directly from SolusVM.

VirMach · July 2022

@yoursunny said:

@VirMach said:

The customer receives one credit per term length; as in, if they could cancel and re-order, then they're eligible for it. Therefore, a customer in their first month of service naturally wouldn't be able to abuse it.

Most services are paid annually.
One migration per year is not enough.
That's why I suggested once per month.

Some starter credits should be granted on the current services, because:

Some services have been auto-migrated to undesirable locations, such as Amsterdam to Frankfurt.

Looking glass nodes are inoperable, so for all the migrations done so far, the chosen location may be unsuitable for the customer's needs.

Once all the looking glass nodes are up, can we at least have 2~3 credits per year?
That's a lot more flexible than only one credit per year, in case the network condition deteriorates in the middle.

Well, initially the way it's going to work is that there will be a period of time where you're allowed to "Ryzen Migrate" to your desired location (without data) so that everyone ends up in their desired location. It'll be announced here and on OGF, most likely with an "Announcement" on our website as well, and there will probably be a 1-2 week period in which everyone eligible can do this.

The credits I described are for a system not yet coded.

VirMach · July 2022

@rhinoduck said: Indeed. No email, and no information I could find on the Network Status page.

I'll make a network status page for it since it seems a lot of the emails failed to send.

VirMach · July 2022

@Papa said:
What the F is going on with FFME? I migrated from FFME004 yesterday for a 3 buck fee and got the same FFME004, but online and working; today it's totally offline: no boot, no VNC, nothing. And no information about what's going on. I waited patiently for two weeks and created zero tickets, but now I have to give up and leave as soon as I can get my data off FFME003. Honestly, even my home server, with my hobbyist approach, has less downtime, more reliability, and migrates faster.

FFME004: we found an ECC error, and the settings also dropped off again. A memory swap fixed FFME004; we couldn't send out migration emails in time because xTom worked very quickly to get this replaced. The settings drop-off caused a disk to drop, so the node was online but VMs were not booting. That has been resolved. I'm checking it again to see if the settings stick; if they don't, there might be another reboot, which we'll create a network status for, but hopefully this will be stable moving forward.

FFME has had 3 network status posts, many updates here and on OGF, and probably a fair share of emails. It can be considered an ongoing issue until the nodes prove themselves by remaining online for more than 2 days.

VirMach · July 2022

Settings stuck on FFME004, but I'm pretty sure I've said that once before. There's zero information on this, but there have been constant kernel bugs regarding these fixes on Linux. I can't rewrite the Linux kernel right now, so until Linux figures out how it's going to treat these problems, I don't know what else to do about it.

These issues have come and gone ever since NVMe SSDs have existed; you can search online back to around 2015. If you just look at Linux bug trackers, it seems like every version fixes one thing and breaks another. I'll have to come up with some kind of kernel update plan moving forward where we try to keep the issues from re-appearing. But of course the solution can't just be to stay on the same version for 5 years.

We're using literal copies of the same nodes in FFM... in Tokyo, and they're not having the same problems with the only difference being kernel versions.

NerdUno · July 2022

@NerdUno said:
While our Control Panel still shows 0 IPv4 addresses, we can finally get back in at the assigned IPv4 address and also through our private VPN. So progress has been made at least on the RYZE.SEA-Z002.VMS node in Seattle. We'll see if it stays up.

Spoke too soon. Status back to dead in the water this afternoon.

kheng86 · July 2022

@VirMach said:
Settings stuck on FFME004, but I'm pretty sure I've said that once before. There's zero information on this, but there have been constant kernel bugs regarding these fixes on Linux. I can't rewrite the Linux kernel right now, so until Linux figures out how it's going to treat these problems, I don't know what else to do about it.

These issues have come and gone ever since NVMe SSDs have existed; you can search online back to around 2015. If you just look at Linux bug trackers, it seems like every version fixes one thing and breaks another. I'll have to come up with some kind of kernel update plan moving forward where we try to keep the issues from re-appearing. But of course the solution can't just be to stay on the same version for 5 years.

We're using literal copies of the same nodes in FFM... in Tokyo, and they're not having the same problems with the only difference being kernel versions.

My VM on FFME004 seems fine now, but I have a couple of VMs on FFME005 and FFME006 with status "Offline": they can't boot up and I can't reinstall the OS. Hope you can help, thanks! Ticket #754039

yoursunny · July 2022

@VirMach said:
We're ramping up the abuse script temporarily. If you're at around full CPU usage on your VPS for more than 2 hours, it'll be powered down. We have too many VMs getting stuck on OS boot and negatively affecting others at this time, due to the operating systems getting stuck on boot for some after the Ryzen change. I apologize in advance for any false positives but I do want to note that it's technically within our terms for 2 hours of that type of usage to be considered abuse, we just usually try to be much more lenient.

A boot loop after migrating to a different CPU or changing to a different IP is not abuse.
The customer purchased a service on a specific CPU and a specific IP that are not expected to change.
The kernel and userland could have been compiled with -march=native so that they would not start on any other CPU.
The services could have been configured to bind to a specific IP, which would cause a service restart loop if the IP disappeared.

The safest approach is not to automatically power on the service after the migration.
The customer needs to press the Power On button themselves and then fix the machine right away.

Running -march=native code on an unsupported CPU triggers undefined behavior.
Undefined behavior means anything could happen, such as pink unicorns appearing in VirMach offices, @deank ceasing to believe in the end, or @FrankZ receiving 1000 free servers.
The simple act of automatically powering on a migrated server could cause these severe consequences, and you don't want that.
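The "bind to a specific IP" failure mode is easy to demonstrate: binding a socket to an address the host doesn't have fails with EADDRNOTAVAIL, so a daemon configured with its old IP exits on every start and its supervisor keeps restarting it. A minimal Python sketch; try_bind is a hypothetical helper, and 192.0.2.10 is a documentation address assumed not to be configured locally.

```python
import errno
import socket


def try_bind(addr: str, port: int = 0) -> bool:
    """Return True if a TCP socket can bind to `addr`, False if addr isn't local.

    Hypothetical helper. A daemon hard-coded to its old IP hits the False case
    on every start, which a supervisor then turns into a restart loop.
    (Linux can be told to allow non-local binds via net.ipv4.ip_nonlocal_bind.)
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((addr, port))  # port 0 lets the kernel pick any free port
        return True
    except OSError as exc:
        if exc.errno == errno.EADDRNOTAVAIL:
            return False
        raise  # anything else (e.g. permission denied) is a different problem
    finally:
        sock.close()


print(try_bind("127.0.0.1"))   # loopback: always local
print(try_bind("192.0.2.10"))  # TEST-NET-1, assumed not configured on this host
```

Listening on the wildcard address (0.0.0.0 or ::) avoids this particular loop, at the cost of listening on every interface.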

VirMach · July 2022

@NerdUno said:

@NerdUno said:
While our Control Panel still shows 0 IPv4 addresses, we can finally get back in at the assigned IPv4 address and also through our private VPN. So progress has been made at least on the RYZE.SEA-Z002.VMS node in Seattle. We'll see if it stays up.

Spoke too soon. Status back to dead in the water this afternoon.

This node has an issue with a piece of software getting stuck and duplicating its process over and over until it overloads the node and we have to reboot it. We made some changes; if it happens again, we'll try to catch it earlier this time to avoid a reboot.
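The post doesn't name the software or describe the changes, so this is only a guess at what "catching it earlier" could look like: a hypothetical Linux-only check that counts processes by name from /proc/&lt;pid&gt;/comm and flags any name with more copies than an arbitrary limit. The function names and the limit of 200 are assumptions.

```python
import os
from collections import Counter


def process_counts(proc_root: str = "/proc") -> Counter:
    """Count running processes by command name via /proc/<pid>/comm (Linux only)."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                counts[f.read().strip()] += 1
        except OSError:
            continue  # process exited (or is unreadable) mid-scan
    return counts


def runaway(counts: Counter, limit: int = 200) -> list:
    """Names with more than `limit` copies running; 200 is an arbitrary example."""
    return sorted(name for name, n in counts.items() if n > limit)


if os.path.isdir("/proc"):
    print(runaway(process_counts()))
```

Run from cron or a systemd timer, something like this would alert well before the process count gets high enough to force a reboot.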

VirMach · July 2022

@kheng86 said: but I have a couple of VMs on FFME005 and FFME006 with status "Offline": they can't boot up and I can't reinstall the OS. Hope you can help, thanks!

Have these been offline ever since you used the Ryzen Migrate button?

VirMach · July 2022

@yoursunny said:

@VirMach said:
We're ramping up the abuse script temporarily. If you're at around full CPU usage on your VPS for more than 2 hours, it'll be powered down. We have too many VMs getting stuck on OS boot and negatively affecting others at this time, due to the operating systems getting stuck on boot for some after the Ryzen change. I apologize in advance for any false positives but I do want to note that it's technically within our terms for 2 hours of that type of usage to be considered abuse, we just usually try to be much more lenient.

A boot loop after migrating to a different CPU or changing to a different IP is not abuse.
The customer purchased a service on a specific CPU and a specific IP that are not expected to change.
The kernel and userland could have been compiled with -march=native so that they would not start on any other CPU.
The services could have been configured to bind to a specific IP, which would cause a service restart loop if the IP disappeared.

The safest approach is not to automatically power on the service after the migration.
The customer needs to press the Power On button themselves and then fix the machine right away.

Running -march=native code on an unsupported CPU triggers undefined behavior.
Undefined behavior means anything could happen, such as pink unicorns appearing in VirMach offices, @deank ceasing to believe in the end, or @FrankZ receiving 1000 free servers.
The simple act of automatically powering on a migrated server could cause these severe consequences, and you don't want that.

We're ramping up the abuse script. That's just what it's called. I didn't say a boot loop after migrating is abuse.

The abuse script will just power it down, not suspend it. I don't see the harm in powering down something stuck in a boot loop. I was just providing this as a PSA for anyone reading who might be doing something unrelated that also uses a lot of CPU, and for general transparency: we're making the abuse script stricter so it powers down the VMs stuck in a boot loop more quickly.

The safest approach is not to automatically power on the service after the migration.
The customer needs to press the Power On button themselves and then fix the machine right away.

Not possible; we have to power all of them up to fix other issues. Otherwise we can't tell the ones that are stuck and won't boot apart from the others. Plus, many customers immediately open tickets instead of trying to power up the VPS after it goes offline, so either way, having them powered on has more benefits than keeping them offline.

kheng86 · July 2022

@VirMach said:

@kheng86 said: but I have a couple of VMs on FFME005 and FFME006 with status "Offline": they can't boot up and I can't reinstall the OS. Hope you can help, thanks!

Have these been offline ever since you used the Ryzen Migrate button?

The "Migration" button was not used. They've been offline ever since the planned migration from AMS to FFE.

vyas · July 2022

My VPS from NL ==> FFxxx ==> FF?? seems to be up and running after experiencing disk errors, and a long-ish downtime.

and a brand new YABS coming up just to make @cybertech happy.

Side benefit of being a @Virmach customer: one develops a high threshold for patience (or "patscience", as a provider from Romania who-shall-not-be-named used to say).
And that's not a criticism!

Under US$10 a year is way cheaper than paying for a meditation app or yoga classes.

Cheers

AlwaysSkint · July 2022

Fantastic!

Your IP 92.18.90.xxx has been banned
Ban Reason: Banned for 34 login attempts.
Ban Expires: 07/08/2022 (21:00)

My multi-IP NY is down, likely moved and I wanted to find out what the score was.

[EDIT1:]
One VPN later (ironically in NYC) and yep my triple IP VPS is fubar. Pointless changing main IP without the others being available. (NYCB018)
Awaiting non-patiently @VirMach

[EDIT2:]
At least the CHI is still there, for now.

flips · July 2022

IIRC my FFME0002 node was online for a while after hitting the migrate button. But then it's been dead ever since ...

Daevien · July 2022

@flips said:
IIRC my FFME0002 node was online for a while after hitting the migrate button. But then it's been dead ever since ...

Yep, same. I poke mine with a stick occasionally but it hasn't really done anything of note since the first day.

fan · July 2022

@VirMach said: Before I change LAX to also have the fine-tuned NIC driver configuration does anyone who has both LAX and somewhere else functional notice one performing better than the other? On my end, LAX is around 70% cleaner.

Finally got my VPS on LAXA031 connected to the Internet by clicking either the fix networking or the reinstallation button. Checking with "sar -n DEV 2 5", the result is far better (lower) on LAX (<100) than on TYO (~1000 or more, even on the relatively stable node I mentioned previously). Nodes that are behaving poorly, like TYOC029 and TYOC002S, could benefit a lot if the fix were applied to them, especially once you've added the 10Gbps switch to the storage node.
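The post doesn't say which sar column the <100 vs ~1000 figures come from. Assuming it's received packets per second (rxpck/s) on an otherwise idle VM, here's a minimal Python sketch that pulls the per-interface averages out of `sar -n DEV` output so two nodes can be compared side by side. The function name is hypothetical, and the sample text is made up for illustration, not taken from the results above.

```python
def average_rx_packets(sar_output):
    """Map interface name -> average rxpck/s from `sar -n DEV` output.

    Reads only the "Average:" summary lines. Run sar with LC_ALL=C so the
    numbers use '.' as the decimal separator.
    """
    header = None
    averages = {}
    for line in sar_output.splitlines():
        fields = line.split()
        if not fields or fields[0] != "Average:":
            continue
        if "IFACE" in fields:
            header = fields  # column names for the rows that follow
        elif header is not None:
            row = dict(zip(header, fields))
            averages[row["IFACE"]] = float(row["rxpck/s"])
    return averages


sample = """\
Average:        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
Average:           lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         ens3     85.50      2.00      5.10      0.30      0.00      0.00      0.00      0.00
"""
print(average_rx_packets(sample))  # {'lo': 0.0, 'ens3': 85.5}
```

Columns are looked up by header name rather than position, so the parser still works with older sysstat builds that don't print the %ifutil column.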

fan · July 2022

YABS on LA node LAXA031 looks good:

Sat 09 Jul 2022 02:01:37 PM CST

Basic System Information:
---------------------------------
Uptime     : 0 days, 0 hours, 0 minutes
Processor  : AMD Ryzen 9 5950X 16-Core Processor
CPU cores  : 2 @ 3393.624 MHz
AES-NI     : ✔ Enabled
VM-x/AMD-V : ✔ Enabled
RAM        : 1.8 GiB
Swap       : 975.0 MiB
Disk       : 43.0 GiB
Distro     : Debian GNU/Linux 11 (bullseye)
Kernel     : 5.10.0-13-amd64

fio Disk Speed Tests (Mixed R/W 50/50):
---------------------------------
Block Size | 4k            (IOPS) | 64k           (IOPS)
  ------   | ---            ----  | ----           ----
Read       | 285.18 MB/s  (71.2k) | 1.03 GB/s    (16.1k)
Write      | 285.93 MB/s  (71.4k) | 1.03 GB/s    (16.2k)
Total      | 571.11 MB/s (142.7k) | 2.06 GB/s    (32.3k)
           |                      |
Block Size | 512k          (IOPS) | 1m            (IOPS)
  ------   | ---            ----  | ----           ----
Read       | 1.36 GB/s     (2.6k) | 2.05 GB/s     (2.0k)
Write      | 1.43 GB/s     (2.8k) | 2.18 GB/s     (2.1k)
Total      | 2.80 GB/s     (5.4k) | 4.23 GB/s     (4.1k)

iperf3 Network Speed Tests (IPv4):
---------------------------------
Provider        | Location (Link)           | Send Speed      | Recv Speed
                |                           |                 |
Clouvider       | London, UK (10G)          | 687 Mbits/sec   | 292 Mbits/sec
Online.net      | Paris, FR (10G)           | 753 Mbits/sec   | 381 Mbits/sec
Hybula          | The Netherlands (40G)     | 741 Mbits/sec   | 531 Mbits/sec
Uztelecom       | Tashkent, UZ (10G)        | 656 Mbits/sec   | 207 Mbits/sec
Clouvider       | NYC, NY, US (10G)         | 879 Mbits/sec   | 523 Mbits/sec
Clouvider       | Dallas, TX, US (10G)      | 734 Mbits/sec   | 247 Mbits/sec
Clouvider       | Los Angeles, CA, US (10G) | 928 Mbits/sec   | 910 Mbits/sec

Geekbench 5 Benchmark Test:
---------------------------------
Test            | Value
                |
Single Core     | 941
Multi Core      | 1691
Full Test       | https://browser.geekbench.com/v5/cpu/15906668
