[2022] ★ VirMach ★ RYZEN ★ NVMe ★★ The Epic Sales Offer Thread ★★


Comments

  • skorous OGSenpai
    edited July 2022

    @VirMach said: I agree with you on this. We didn't really have another choice for these. It's painful and bad, for us as well. There are probably 5% of people that have been stuck for 48 hours now and I don't like that but we're doing all we physically can.

    Just in case you're thinking this is about the migrations, the discussion you're referring back to is actually about Ryzen Location Change button and @yoursunny opinions on it.

    Thanked by (1)Mumbly
  • @VirMach said:
    Unfortunately there's pretty much nothing we can do about that really. I really think SolusVM did some updates to that tool and broke it for older operating systems recently. They've been breaking a lot of things, like libvirtd incompatibility, the migration tool wasn't working for a while, the operating systems don't template properly and they haven't been syncing properly a lot of times. They've just been updating all their PHP versions and whatever else, racing forward without actually checking anything.

    Ok, I understand that, but is there a way to fix the network manually? What settings should I set to get the VM working? I couldn't see any problems at first glance: the eth0 (ens3) interface is up, and there is a route to the gateway xxx.xxx.xxx.1
    I can ping nearby nodes, but pinging the gateway returns "Network host unreachable".

  • VirMach Hosting Provider

    @Mumbly said:
    @VirMach that's misunderstanding now :)
    I didn't comment on or criticize VirMach migrations here, but discussed @yoursunny's suggestion (well, he made a few good suggestions but this one I feel like wasn't the best one) about a 24-hour migration queue in the future :)

    I understand, I'm just adding onto it that I agree with you and it's not user friendly and I'm also saying that unfortunately our current situation is similar and also not how we intended for it to be coded by the developer. I did read what @yoursunny suggested and that definitely has its benefits but our ideal version would be immediate.

    I remember we improved our script to minimize downtime to only something like an hour per VM and had that concept working pretty well, but it didn't work out when you're planning for efficiency and many servers.

    How the project was laid out for the developer who never completed it:

    • Customer is eligible to use it when the server is queued for migration, and later on it would be open to everyone (the latter being the Ryzen to Ryzen idea.)
    • Customer receives one credit per term length, as in if they could cancel and re-order, then they're eligible for it. Therefore, naturally customer on first month of service wouldn't be able to abuse it.
    • Migration is queued in a batch, but the batch gets processed more immediately. It essentially would only wait for the right load conditions and be throttled by quantities of requests. If there are instantly 1,000 requests, then it could theoretically naturally take 24 hours.
    • Since AFAIK SolusVM does not have an API to run the script that completes a migration, AKA, marks it in the database as being on a new node, our idea was that it'd power up a new service, then replace the details of the existing service on WHMCS.
    • Old service gets marked for deletion but not immediate deletion in case anything goes wrong, so the data is still there for a few days. More aggressive pruning may happen during peak usage, it'd essentially have let's say a 200GB pool where it stays for a week, and anything past that maybe only a day. This also allows for an easy revert button to be worked in later in case the customer regrets the decision, instead of contacting us and instead of moving it twice.
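
For anyone trying to picture the list above, here is a rough sketch of how the throttled batch and the delayed-deletion pool could fit together. This is not VirMach's code: `MAX_PER_HOUR`, `load_ok`, `OldService` and the way the 200GB pool maps onto a 7-day and 1-day window are all assumptions made up for illustration.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical numbers, chosen so ~1,000 requests take ~24 hours.
MAX_PER_HOUR = 42
POOL_GB = 200            # old disks inside this pool are kept for a week
KEEP_DAYS_IN_POOL = 7
KEEP_DAYS_OVERFLOW = 1   # anything past the pool is kept for a day


@dataclass
class OldService:
    name: str
    disk_gb: int
    age_days: int  # days since the migration completed


def process_hour(queue, load_ok):
    """Pop up to MAX_PER_HOUR requests, but only while load conditions allow."""
    done = []
    while queue and len(done) < MAX_PER_HOUR and load_ok():
        done.append(queue.popleft())
    return done


def hours_to_drain(n_requests):
    """Hours needed to clear n_requests if load is always acceptable."""
    queue = deque(range(n_requests))
    hours = 0
    while queue:
        process_hour(queue, load_ok=lambda: True)
        hours += 1
    return hours


def services_to_prune(old_services):
    """Return names of old services whose retention window has expired.

    Services are walked newest first; the first POOL_GB worth of disk is
    kept for KEEP_DAYS_IN_POOL days, anything beyond that only for
    KEEP_DAYS_OVERFLOW days.
    """
    prune, used_gb = [], 0
    for svc in sorted(old_services, key=lambda s: s.age_days):
        used_gb += svc.disk_gb
        limit = KEEP_DAYS_IN_POOL if used_gb <= POOL_GB else KEEP_DAYS_OVERFLOW
        if svc.age_days >= limit:
            prune.append(svc.name)
    return prune
```

With these made-up numbers, `hours_to_drain(1000)` comes out at 24, which lines up with the "1,000 requests could take 24 hours" estimate. The old service is only ever listed for pruning, never deleted on the spot, which is what would leave room for a revert button.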

    It'd have been pretty nice if we had that ready in time for these migrations so it instead didn't end up as the vague day-long periods.

    Thanked by (2)Mumbly FrankZ
  • Trying to figure out from network status and backlog here: Should FFME002 be operational? :p
    Have never gotten it to work here ... :# :3

    Troubleshooter reports:
    Main IP pings: false
    Node Online: false
    Service online: offline
    Operating System: linux-ubuntu-16.04-server-x86_64-minimal-latest
    Service Status:Active

  • @flips said:
    Trying to figure out from network status and backlog here: Should FFME002 be operational? :p
    Have never gotten it to work here ... :# :3

    pokes FFME002 with a stick nope, still dead. has been for a while, worked for like most of a day after i migrated there then dead :p

    Thanked by (1)FrankZ
  • I see the new Los Angeles Ryzen network is better... 10k in traffic vs 100k in Atlanta

    Thanked by (1)VirMach
  • Something is strange about the FFME003 network config of my VM. If I set the network interface to DHCP, I receive the correct IP address (xxx.xxx.163.xxx), but from a DHCP server in another subnet, xxx.xxx.162.2. And the IP address I receive is from the same subnet as the FFME004 VM. Is this working as intended? If I restore the network config from the control panel, I get the same IP as static, apart from a missing symlink to resolvconf.
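
Whether a .162 DHCP server handing out a .163 address is odd depends on the prefix length, which isn't stated above. A small sketch to check that with Python's `ipaddress` module; the 10.0.x.x addresses are placeholders, since the real ones are masked.

```python
import ipaddress


def same_subnet(addr, other, prefix_len):
    """Return True if `other` falls inside the subnet of `addr`/`prefix_len`."""
    network = ipaddress.ip_interface(f"{addr}/{prefix_len}").network
    return ipaddress.ip_address(other) in network


# Placeholder addresses standing in for the masked xxx.xxx.163.xxx / xxx.xxx.162.2
vm_ip = "10.0.163.25"
dhcp_server = "10.0.162.2"

print(same_subnet(vm_ip, dhcp_server, 24))  # /24: different subnets
print(same_subnet(vm_ip, dhcp_server, 23))  # /23: 162.x and 163.x share one subnet
```

If the node is configured with a /23 (or wider), the DHCP server sitting in .162 would be perfectly normal; with a /24 it would point at a relay or a misconfiguration.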

  • @VirMach said:

    • Customer receives one credit per term length, as in if they could cancel and re-order, then they're eligible for it. Therefore, naturally customer on first month of service wouldn't be able to abuse it.

    Most services are paid annually.
    One migration per year is not enough.
    That's why I suggested once per month.

    Some starter credits should be granted on the current services, because:

    • Some services have been auto-migrated to undesirable locations, such as Amsterdam to Frankfurt.
    • Looking glass nodes are inoperable, so for all the migrations done so far, the chosen location may be unsuitable for the customer's needs.

    Once all the looking glass nodes are up, can we at least have 2~3 credits per year?
    That's a lot more flexible than only one credit per year, in case the network condition deteriorates in the middle.

    Thanked by (1)FrankZ
  • Seems many JP VMs got their IP changed without prior notice

  • edited July 2022

    @realEthanZou said:
    Seems many JP VMs got their IP changed without prior notice

    Indeed. No email, and no information I could find on the Network Status page.

    While I understand that a host can face many challenges that cannot be predicted or immediately explained, this is not one of those situations. And the lack of a warning and the lack of information about why the change happened (Was it intentional and is it to stay, or was it just a configuration mistake that will be reverted?) is a big fat NO NO in my book.

  • What tha F is going on with FFME? Migrated from FFME004 yesterday for a 3 bucks fee, got the same FFME004, but online and working; today it's totally offline - no boot, no VNC, nothing. And no information about what's going on. I was waiting patiently for two weeks, created zero tickets, but now I have to give up and leave as soon as I can get my data from FFME003. Honestly, even my home server with my hobbyist approach has less downtime and more reliability, and migrates faster.

  • edited July 2022

    While our Control Panel still shows 0 IPv4 addresses, we can finally get back in at the assigned IPv4 address and also through our private VPN. So progress has been made at least on the RYZE.SEA-Z002.VMS node in Seattle. We'll see if it stays up.

  • VirMach Hosting Provider

    We're ramping up the abuse script temporarily. If you're at around full CPU usage on your VPS for more than 2 hours, it'll be powered down. We have too many VMs getting stuck on OS boot and negatively affecting others at this time, due to the operating systems getting stuck on boot for some after the Ryzen change. I apologize in advance for any false positives but I do want to note that it's technically within our terms for 2 hours of that type of usage to be considered abuse, we just usually try to be much more lenient.
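
To make the rule concrete, here is a minimal sketch of a "near-full CPU for more than 2 hours" check. It is not the actual abuse script; the 95% threshold and the sample format are assumptions picked for illustration.

```python
THRESHOLD_PCT = 95.0        # assumed meaning of "around full CPU usage"
WINDOW_SECONDS = 2 * 3600   # the 2-hour limit mentioned above


def should_power_down(samples, threshold=THRESHOLD_PCT, window=WINDOW_SECONDS):
    """Decide whether a VM has been pegged long enough to be powered down.

    samples: list of (timestamp_seconds, cpu_percent), oldest first.
    Returns True if CPU stayed at or above `threshold` for an unbroken
    stretch of at least `window` seconds ending at the latest sample.
    """
    if not samples:
        return False
    end = samples[-1][0]
    start = None
    # Walk backwards from the newest sample until usage drops below threshold.
    for ts, pct in reversed(samples):
        if pct < threshold:
            break
        start = ts
    return start is not None and end - start >= window
```

A single dip below the threshold resets the streak, so a VM that is merely busy in bursts would not trip this version, while one stuck in a boot loop would.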

    @realEthanZou said:
    Seems many JP VMs got their IP changed without prior notice

    @rhinoduck said:

    @realEthanZou said:
    Seems many JP VMs got their IP changed without prior notice

    Indeed. No email, and no information I could find on the Network Status page.

    While I understand that a host can face many challenges that cannot be predicted or immediately explained, this is not one of those situations. And the lack of a warning and the lack of information about why the change happened (Was it intentional and is it to stay, or was it just a configuration mistake that will be reverted?) is a big fat NO NO in my book.

    We did send out emails, but it's possible they did not all send due to SolusVM also being overloaded around that time. Please also check your spam box, we sent these directly from SolusVM.

  • VirMach Hosting Provider

    @yoursunny said:

    @VirMach said:

    • Customer receives one credit per term length, as in if they could cancel and re-order, then they're eligible for it. Therefore, naturally customer on first month of service wouldn't be able to abuse it.

    Most services are paid annually.
    One migration per year is not enough.
    That's why I suggested once per month.

    Some starter credits should be granted on the current services, because:

    • Some services have been auto-migrated to undesirable locations, such as Amsterdam to Frankfurt.
    • Looking glass nodes are inoperable, so for all the migrations done so far, the chosen location may be unsuitable for the customer's needs.

    Once all the looking glass nodes are up, can we at least have 2~3 credits per year?
    That's a lot more flexible than only one credit per year, in case the network condition deteriorates in the middle.

    Well initially the way it's going to work is there will be a period of time where you're allowed to "Ryzen Migrate" to your desired location (without data) as everyone lands in their desired location. It'll be announced here and on OGF, as well as most likely an "Announcement" on our website and probably a 1-2 week period where this can be done by everyone eligible.

    The credits I described are for a system not yet coded.

  • VirMach Hosting Provider

    @rhinoduck said: Indeed. No email, and no information I could find on the Network Status page.

    I'll make a network status page for it since it seems a lot of the emails failed to send.

    Thanked by (2)FrankZ skorous
  • VirMach Hosting Provider

    @Papa said:
    What tha F is going on with FFME? Migrated from FFME004 yesterday for a 3 bucks fee, got the same FFME004, but online and working; today it's totally offline - no boot, no VNC, nothing. And no information about what's going on. I was waiting patiently for two weeks, created zero tickets, but now I have to give up and leave as soon as I can get my data from FFME003. Honestly, even my home server with my hobbyist approach has less downtime and more reliability, and migrates faster.

    On FFME004 we found an ECC error, and the settings also dropped off again. A memory swap fixed FFME004; we couldn't send out migration emails in time because xTom worked very quickly to get this replaced. The setting drop-off caused a disk to drop, so the node was online but VMs were not booting. That has been resolved. I'm checking it again to see if the settings stick; if they don't, there might be another reboot, which we'll create a network status for, but hopefully this will be stable moving forward.

    FFME has had 3 network status, many updates here, on OGF, and probably a fair share of emails. It can be considered an ongoing issue until they prove themselves by remaining online for more than 2 days.

  • VirMach Hosting Provider
    edited July 2022

    Settings stuck on FFME004 but I'm pretty sure I've said that once before. There's zero information on this but there's been constant kernel bugs regarding these fixes on Linux. I can't rewrite the Linux kernel right now so until Linux figures out how it's going to treat these problems I don't know what else to do about it.

    These issues have come and gone ever since NVMe SSDs existed; you can search online and find reports going back to around 2015. If you just look at Linux bug trackers it seems like every version fixes one thing and breaks another. I'll have to come up with some kind of kernel update plan moving forward where we try to keep the issues from re-appearing. But of course the solution can't just be to stay on the same version for 5 years.

    We're using literal copies of the same nodes in FFM... in Tokyo, and they're not having the same problems with the only difference being kernel versions.

    Thanked by (3)FrankZ tototo kheng86
  • @NerdUno said:
    While our Control Panel still shows 0 IPv4 addresses, we can finally get back in at the assigned IPv4 address and also through our private VPN. So progress has been made at least on the RYZE.SEA-Z002.VMS node in Seattle. We'll see if it stays up.

    Spoke too soon. Status back to dead in the water this afternoon.

  • @VirMach said:
    Settings stuck on FFME004 but I'm pretty sure I've said that once before. There's zero information on this but there's been constant kernel bugs regarding these fixes on Linux. I can't rewrite the Linux kernel right now so until Linux figures out how it's going to treat these problems I don't know what else to do about it.

    These issues have come and gone ever since NVMe SSDs existed; you can search online and find reports going back to around 2015. If you just look at Linux bug trackers it seems like every version fixes one thing and breaks another. I'll have to come up with some kind of kernel update plan moving forward where we try to keep the issues from re-appearing. But of course the solution can't just be to stay on the same version for 5 years.

    We're using literal copies of the same nodes in FFM... in Tokyo, and they're not having the same problems with the only difference being kernel versions.

    My VM in FFME004 seems fine now, but I have a couple of VMs FFME005 and FFME006 having Status "Offline", can't bootup, can't reinstall OS. Hope you can help, thanks! Ticket #754039

  • @VirMach said:
    We're ramping up the abuse script temporarily. If you're at around full CPU usage on your VPS for more than 2 hours, it'll be powered down. We have too many VMs getting stuck on OS boot and negatively affecting others at this time, due to the operating systems getting stuck on boot for some after the Ryzen change. I apologize in advance for any false positives but I do want to note that it's technically within our terms for 2 hours of that type of usage to be considered abuse, we just usually try to be much more lenient.

    Boot loop after migrating to a different CPU or changing to a different IP is not abuse.
    Customer purchased service on a specific CPU and a specific IP that are not expected to change.
    The kernel and userland could have been compiled with -march=native so that it would not start on any other CPU.
    The services could have been configured to bind to a specific IP, which would cause service restart loop if the IP disappeared.

    The safest way is not automatically powering on the service after the migration.
    The customer needs to press the Power On button themselves and then fix the machine right away.

    Running -march=native code on an unsupported CPU triggers undefined behavior.
    Undefined behavior means anything could happen, such as pink unicorn appearing in VirMach offices, @deank stopping to believe in the end, or @FrankZ receiving 1000 free servers.
    The simple act of automatic powering on a migrated server could cause these severe consequences and you don't want that.
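
As a sketch of the IP-binding point: a service hard-wired to one address fails at startup once that address is gone, and a supervisor that keeps restarting it produces the loop. Checking the bind first avoids that. `can_bind` and `pick_bind_address` are hypothetical helper names, and falling back to the wildcard address is just one possible policy; it exposes the service on every interface, so exiting cleanly may be the better choice for some setups.

```python
import socket


def can_bind(ip, port=0):
    """Return True if a TCP socket can be bound to `ip` on this machine.

    Port 0 asks the OS for any free port, so only the address is tested.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((ip, port))
        except OSError:
            # Typically "Cannot assign requested address" when the IP
            # is not configured on any local interface.
            return False
    return True


def pick_bind_address(preferred_ip):
    """Use the configured IP if it still exists, else fall back to 0.0.0.0."""
    return preferred_ip if can_bind(preferred_ip) else "0.0.0.0"


# 192.0.2.10 is a documentation-range placeholder for the old, removed IP.
print(pick_bind_address("192.0.2.10"))
```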

    Thanked by (3)kheng86 AlwaysSkint FrankZ
  • VirMach Hosting Provider

    @NerdUno said:

    @NerdUno said:
    While our Control Panel still shows 0 IPv4 addresses, we can finally get back in at the assigned IPv4 address and also through our private VPN. So progress has been made at least on the RYZE.SEA-Z002.VMS node in Seattle. We'll see if it stays up.

    Spoke too soon. Status back to dead in the water this afternoon.

    This node has an issue with a piece of software getting stuck and duplicating its process over and over until the node overloads and we have to reboot it. We made some changes; if it happens again we'll try to catch it earlier this time to avoid a reboot.

  • VirMach Hosting Provider

    @kheng86 said: but I have a couple of VMs FFME005 and FFME006 having Status "Offline", can't bootup, can't reinstall OS. Hope you can help, thanks!

    Were these always offline after Ryzen Migrate button?

  • VirMach Hosting Provider

    @yoursunny said:

    @VirMach said:
    We're ramping up the abuse script temporarily. If you're at around full CPU usage on your VPS for more than 2 hours, it'll be powered down. We have too many VMs getting stuck on OS boot and negatively affecting others at this time, due to the operating systems getting stuck on boot for some after the Ryzen change. I apologize in advance for any false positives but I do want to note that it's technically within our terms for 2 hours of that type of usage to be considered abuse, we just usually try to be much more lenient.

    Boot loop after migrating to a different CPU or changing to a different IP is not abuse.
    Customer purchased service on a specific CPU and a specific IP that are not expected to change.
    The kernel and userland could have been compiled with -march=native so that it would not start on any other CPU.
    The services could have been configured to bind to a specific IP, which would cause service restart loop if the IP disappeared.

    The safest way is not automatically powering on the service after the migration.
    The customer needs to press the Power On button themselves and then fix the machine right away.

    Running -march=native code on an unsupported CPU triggers undefined behavior.
    Undefined behavior means anything could happen, such as pink unicorn appearing in VirMach offices, @deank stopping to believe in the end, or @FrankZ receiving 1000 free servers.
    The simple act of automatic powering on a migrated server could cause these severe consequences and you don't want that.

    We're ramping up the abuse script. It's what it is called. I didn't say boot loop after migrating is abuse.

    Abuse script will just power it down, not suspend. I don't see the harm in powering down something stuck in a boot loop. I was just providing this as a PSA for anyone reading who might be doing something else not related that's also using a lot of CPU and for general transparency, we're making the abuse script more strict to try to power down the ones stuck in the boot loop automatically more quickly.

    The safest way is not automatically powering on the service after the migration.
    The customer needs to press the Power On button themselves and then fix the machine right away.

    Not possible, we have to power up all of them to fix other issues. Otherwise we won't know the difference between one that's stuck and won't boot and others. Plus many customers immediately make tickets instead of trying to power up the VPS after it goes offline so in any case having them powered on has more benefits than keeping them offline.

    Thanked by (1)AlwaysSkint
  • @VirMach said:

    @kheng86 said: but I have a couple of VMs FFME005 and FFME006 having Status "Offline", can't bootup, can't reinstall OS. Hope you can help, thanks!

    Were these always offline after Ryzen Migrate button?

    The "Migration" button was not used. They have always been offline since the migration. It happened after the planned migration from AMS to FFE

  • vyas OGSenpai
    edited July 2022

    My VPS from NL ==> FFxxx ==> FF?? seems to be up and running after experiencing disk errors, and a long-ish downtime.

    • and a brand new YABS coming up just to make @cybertech happy.

    Side benefits of being a @Virmach customer: one can develop a high threshold for patience (or "patscience", as a provider from Romania who-shall-not-be-named used to say).
    And that's not a criticism!

    Under US $10 a year is way cheaper than paying for a meditation app or yoga classes

    Cheers

  • AlwaysSkint OGSenpai
    edited July 2022

    Fantastic!

    Your IP 92.18.90.xxx has been banned

    Ban Reason: Banned for 34 login attempts.
    Ban Expires: 07/08/2022 (21:00)
    

    My multi-IP NY is down, likely moved and I wanted to find out what the score was. :'(

    [EDIT1:]
    One VPN later (ironically in NYC) and yep my triple IP VPS is fubar. Pointless changing main IP without the others being available. (NYCB018)
    Awaiting non-patiently @VirMach

    [EDIT2:]
    At least the CHI is still there, for now. :|

    It wisnae me! A big boy done it and ran away.
    NVMe2G for life! until death (the end is nigh)

  • IIRC my FFME0002 node was online for a while after hitting the migrate button. But then it's been dead ever since ... :p :s

  • @flips said:
    IIRC my FFME0002 node was online for a while after hitting the migrate button. But then it's been dead ever since ... :p :s

    Yep, same. I poke mine with a stick occasionally but it hasn't really done anything of note since the first day.

  • @VirMach said: Before I change LAX to also have the fine-tuned NIC driver configuration does anyone who has both LAX and somewhere else functional notice one performing better than the other? On my end, LAX is around 70% cleaner.

    Finally got my VPS on LAXA031 connected to the Internet, by clicking either the fix networking or the reinstallation button. Checking with "sar -n DEV 2 5", the result is far better (lower) on LAX (<100) than TYO (~1000 or more, even on the relatively stable node I previously mentioned). Nodes that are behaving poorly like TYOC029 and TYOC002S could benefit a lot if the fix is applied to them, especially once you add the 10Gbps switch to the storage node.
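
For those without sysstat installed, roughly the same per-interface packet rate can be worked out from two snapshots of `/proc/net/dev`. This is a sketch, assuming the numbers quoted above are received packets per second (the rxpck/s column); which column was actually compared isn't stated.

```python
def parse_net_dev(text):
    """Parse /proc/net/dev content into {interface: (rx_packets, tx_packets)}."""
    counters = {}
    for line in text.splitlines()[2:]:  # first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 1 is received packets, field 9 is transmitted packets.
        counters[name.strip()] = (int(fields[1]), int(fields[9]))
    return counters


def packet_rates(before, after, interval_s):
    """Return {interface: (rx_pps, tx_pps)} between two parsed snapshots."""
    rates = {}
    for name, (rx1, tx1) in after.items():
        if name in before:
            rx0, tx0 = before[name]
            rates[name] = ((rx1 - rx0) / interval_s, (tx1 - tx0) / interval_s)
    return rates
```

To use it, read `/proc/net/dev` once, sleep for the interval (2 seconds to match the `sar` call above), read it again, and pass both parsed results to `packet_rates`.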

  • fanfan
    edited July 2022

    Yabs on LA node LAXA031, looks good:

    Sat 09 Jul 2022 02:01:37 PM CST
    
    Basic System Information:
    ---------------------------------
    Uptime     : 0 days, 0 hours, 0 minutes
    Processor  : AMD Ryzen 9 5950X 16-Core Processor
    CPU cores  : 2 @ 3393.624 MHz
    AES-NI     : ✔ Enabled
    VM-x/AMD-V : ✔ Enabled
    RAM        : 1.8 GiB
    Swap       : 975.0 MiB
    Disk       : 43.0 GiB
    Distro     : Debian GNU/Linux 11 (bullseye)
    Kernel     : 5.10.0-13-amd64
    
    fio Disk Speed Tests (Mixed R/W 50/50):
    ---------------------------------
    Block Size | 4k            (IOPS) | 64k           (IOPS)
      ------   | ---            ----  | ----           ----
    Read       | 285.18 MB/s  (71.2k) | 1.03 GB/s    (16.1k)
    Write      | 285.93 MB/s  (71.4k) | 1.03 GB/s    (16.2k)
    Total      | 571.11 MB/s (142.7k) | 2.06 GB/s    (32.3k)
               |                      |
    Block Size | 512k          (IOPS) | 1m            (IOPS)
      ------   | ---            ----  | ----           ----
    Read       | 1.36 GB/s     (2.6k) | 2.05 GB/s     (2.0k)
    Write      | 1.43 GB/s     (2.8k) | 2.18 GB/s     (2.1k)
    Total      | 2.80 GB/s     (5.4k) | 4.23 GB/s     (4.1k)
    
    iperf3 Network Speed Tests (IPv4):
    ---------------------------------
    Provider        | Location (Link)           | Send Speed      | Recv Speed
                    |                           |                 |
    Clouvider       | London, UK (10G)          | 687 Mbits/sec   | 292 Mbits/sec
    Online.net      | Paris, FR (10G)           | 753 Mbits/sec   | 381 Mbits/sec
    Hybula          | The Netherlands (40G)     | 741 Mbits/sec   | 531 Mbits/sec
    Uztelecom       | Tashkent, UZ (10G)        | 656 Mbits/sec   | 207 Mbits/sec
    Clouvider       | NYC, NY, US (10G)         | 879 Mbits/sec   | 523 Mbits/sec
    Clouvider       | Dallas, TX, US (10G)      | 734 Mbits/sec   | 247 Mbits/sec
    Clouvider       | Los Angeles, CA, US (10G) | 928 Mbits/sec   | 910 Mbits/sec
    
    Geekbench 5 Benchmark Test:
    ---------------------------------
    Test            | Value
                    |
    Single Core     | 941
    Multi Core      | 1691
    Full Test       | https://browser.geekbench.com/v5/cpu/15906668
    
This discussion has been closed.