[2022] ★ VirMach ★ RYZEN ★ NVMe ★★ The Epic Sales Offer Thread ★★


Comments

  • VirMach Hosting Provider
    edited September 2022

    @VirMach said:
    I have some ideas for TYOC002S and getting the spikes down, but none of them involve suspending abusers. There are some configuration modifications that can be made after I study the traffic patterns further, and potential hardware changes which would not happen any time soon (and I'm not sure of a good way to implement them with SolusVM).

    I might as well elaborate on these ideas. They're only ideas in my head at this point; I haven't looked into how realistic they are, written anything down, or planned anything, so parts may not be well thought out.

    [1] There is some minor tuning we can do with the RAID settings/configurations to skew the performance toward what benefits the current environment more.

    [2] We still have NVMe SSDs on the node and could swap in larger ones, where it would actually be more effective, and add in some type of caching. I have only really researched LVM caching, but SolusVM does not support it automatically, which means we'd have to manually configure each virtual server's LVM and customers would then have to actually utilize it effectively, so I'm throwing that one out the window (a sketch of what that per-volume setup would look like follows this list). I have to see what else we can do without rebuilding the entire array. On the newer versions of these controllers they took away SSD caching, but it may be possible to do it some other way, with or without the controller. Once again, this is just an idea; it'll require a lot of planning, including hardware swaps and configuring the caching. The idea is that caching could potentially absorb a lot of the work hard drives aren't meant for and actually reduce the overall spikes.

    [3] We still have NVMe SSDs, so a second thing we could do is assign people some NVMe space instead of doing caching. This again requires that people utilize it effectively, but it might be easier to just send out announcements telling people to use the NVMe for certain things. It'll be a much bigger headache initially setting it up, though, and getting people to move certain things over to the NVMe; for certain use cases I'm sure it could be more difficult. The amount of SSD space we have right now is negligible, so realistically this would still involve a hardware upgrade.

    [4] Board swap. We could try to go with something other than a 5950X, with effectively more total power. I assume plenty of people would complain since their core clock would go down. It'd also cost a lot, and a lot can go wrong, but this would be a situation where we basically say "let the current weird/extreme usage continue, we'll just add more power." The problem is that the RAID controller would still be the same, and I'd have to investigate further, way further, to ensure this would actually be a significant improvement, so it would only be evaluated after at least #1 is done. Otherwise we might increase the processing power, go from "15% to max" to some other range, and still have the controller bottlenecked with effectively the same performance, whether or not some arbitrary number ends up looking better.
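
    For context on why the LVM route in [2] is a manual per-volume job, here is a minimal sketch, assuming a volume group vg0, an origin LV vm_data sitting on the HDD array, and a spare NVMe device at /dev/nvme0n1 (all hypothetical names), of what attaching an NVMe cache pool to a single origin volume looks like through the lvm2 CLI; every virtual server's volume would need the equivalent treatment by hand, which is why that option gets thrown out above.

    ```python
    import subprocess

    def run(cmd):
        """Run an lvm2 command, echoing it first; raises if it fails."""
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    VG = "vg0"                # hypothetical volume group backing the VM volumes
    ORIGIN_LV = "vm_data"     # hypothetical logical volume on the HDD array
    NVME_PV = "/dev/nvme0n1"  # hypothetical spare NVMe device

    # Make the NVMe device part of the volume group so it can hold the cache pool.
    run(["vgextend", VG, NVME_PV])

    # Carve a cache pool out of the NVMe physical volume.
    run(["lvcreate", "--type", "cache-pool", "-L", "200G",
         "-n", "cachepool", VG, NVME_PV])

    # Attach the cache pool to the origin LV. Writethrough mode means a cache
    # device failure cannot lose writes already committed to the HDD array.
    run(["lvconvert", "-y", "--type", "cache", "--cachemode", "writethrough",
         "--cachepool", f"{VG}/cachepool", f"{VG}/{ORIGIN_LV}"])
    ```

    Writethrough keeps the HDD array authoritative for writes, so it mainly helps the read side; writeback would absorb write bursts too but adds a data-loss risk if the cache device dies.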

  • foitin
    edited September 2022

    @VirMach said:

    almost everyone in the Tokyo location is using the service in a manner that causes abrupt and unpredictable spikes, causing the CPU, within a split second, to range anywhere from 15% to max.

    You mean VPN use? Setting up VPNs on a storage node doesn't make a lot of sense to me.

    If you want to be around people not doing that, request a transfer to Los Angeles or NYC. If you want to remain in Tokyo and believe we should kick off 90% of the people on there, let me know what you're doing on it and private message me your IP so I can determine if you are part of the 90% or the 10% and we'll go from there.

    Nah. I'm good. Why LAX (100ms+) or NYC (160ms+) when TYO has a latency of less than 5ms for me? My ISP caps international traffic speed as well (a single thread to the US is capped at 20Mbps, compared to 1Gbps domestic).
    My problem is the random inaccessibility on weekends and intermittent unresponsiveness while typing something over SSH.
    More likely a CPU issue than an IO issue.

    (edit) Getting rid of the abusers may have brought down the OVERALL usage, but it has essentially done NOTHING when it comes to the other issue. So right now it's 15% to max. Previously with abusers it may have been something like 30-40% to max. Getting rid of the abusers will only address the problem where on Saturday the node locks up.

    So you're saying this problem is peculiar to Tokyo storage VPS only and there's no way to solve it?
    Is this the reason most Tokyo storage VPSes are not yet provisioned? Not until you figure out how to deal with the CPU steal/spike issue?

  • @VirMach said: If you want to be around people not doing that, request a transfer to Los Angeles or NYC.

    I just requested this, since originally my VPS was in NY but somehow got Ryzen-migrated to Tokyo.

    The service status page says TYOC040 is solved, but my TYOC040 VPS is still offline. It's been a month. Do I need to submit a work order?

  • edited September 2022

    Work orders can be submitted on OGF.

  • VirMach Hosting Provider

    @foitin said: You mean VPN use? Setting up VPNs on a storage node doesn't make a lot of sense to me.

    I mean small operation sizes, in big bursts over a short period of time, at random, but obviously with a higher overall quantity of spikes during peak times. A lot of different use cases can fit that pattern.

    @foitin said: So you're saying this problem is peculiar to Tokyo storage VPS only and there's no way to solve it?

    If the "problem" we're speaking of is the CPU going from 15% to 100% within any given fraction of a second, which really means IO is doing that, since the IO load is what's driving the CPU, then correct, outside of maybe the ideas I have to start making a potential impact. (See the sampling sketch at the end of this post.)

    @foitin said: Is this the reason most Tokyo storage VPSes are not yet provisioned? Not until you figure out how to deal with the CPU steal/spike issue?

    Okay, first of all, most Tokyo servers had been provisioned last I checked, so I assume you mean the ones in excess of the amount initially promised for that sale. It's of course possible a lot more people have ordered since then, since they could just force-order it, and it was also listed as a pre-order for some time as well.

    But yes, this is one small part of the many reasons the second storage node hasn't been sent out: I've been thinking about possibly modifying it before shipping to include one of the hardware-related ideas I mentioned.
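
    To make "within any given fraction of a second" concrete, here is a minimal sketch, assuming a Linux node with the standard /proc/stat layout, that samples aggregate CPU utilization every 100 ms and logs only the bursts; it is just an illustration of the kind of sub-second swing being described, not any tooling VirMach actually uses.

    ```python
    import time

    def cpu_times():
        """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
        with open("/proc/stat") as f:
            fields = [int(x) for x in f.readline().split()[1:]]
        # field order: user nice system idle iowait irq softirq steal (guest is folded into user)
        idle = fields[3] + fields[4]
        total = sum(fields[:8])
        return total - idle, total

    prev_busy, prev_total = cpu_times()
    while True:
        time.sleep(0.1)                      # 100 ms sampling window
        busy, total = cpu_times()
        window = total - prev_total
        if window:
            pct = 100.0 * (busy - prev_busy) / window
            if pct > 90.0:                   # only log the bursts
                print(f"{time.strftime('%H:%M:%S')} CPU burst: {pct:.0f}%")
        prev_busy, prev_total = busy, total
    ```

    Per-minute monitoring averages these bursts away, which is how a graph can show 15% while the controller is periodically saturated.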

  • LAXA014 has been down for a week now,

    Want a free VPS? https://microlxc.net

  • VirMach Hosting Provider
    edited September 2022

    @codelock said:
    LAXA014 has been down for a week now,

    Effectively more than a week. We know. An LAX connectivity issue and other things kept us from doing it when we last wanted to (migrating people off). Now we have these IP changes and other node issues, and we need to get SJCZ005 people off first because that node is in a much worse state, so tomorrow at the earliest we can address LAXA014.

    (edit) It's technically not down, by the way, just unusable; just mentioning that as a side note.

  • @VirMach said:
    If you want to be around people not doing that, request a transfer to Los Angeles or NYC. If you want to remain in Tokyo and believe we should kick off 90% of the people on there, let me know what you're doing on it and private message me your IP so I can determine if you are part of the 90% or the 10% and we'll go from there.

    I know that wasn't directed at me, but can I take you up on that too? I picked Tokyo because it was the new hotness and I wanted a different physical location than my dedicated server, but now that my new dedi is in San Jose, if I can get better neighbors and ping while freeing up some resources for people that actually need to be in Tokyo, I'd be all over that. I would've just paid for a without-data migration and not bothered anyone about it, but unsurprisingly that's not an option for storage servers.

  • @VirMach said:

    I mean small operation sizes, in big bursts over a short period of time, at random, but obviously with a higher overall quantity of spikes during peak times. A lot of different use cases can fit that pattern.

    Speaking of peak time, I'm seeing CPU steal like this at 3 AM (which should be similar timing for most potential abusers?).
    So it must have been caused by the non-optimized backup script named qBittorrent? (See the steal-logging sketch at the end of this post.)

    But yes, this is one small part of the many reasons the second storage node hasn't been sent out: I've been thinking about possibly modifying it before shipping to include one of the hardware-related ideas I mentioned.

    And due to the 10Gbps port delay? Well, 10Gbps might encourage more non-optimized script users, I suppose.
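
    For anyone who wants to check whether steal really peaks at a particular hour, here is a minimal sketch, assuming a Linux guest, that logs the steal percentage from /proc/stat once a minute; leaving it running for a day or two would show whether 3 AM is genuinely the worst window.

    ```python
    import time

    def steal_and_total():
        """Return (steal, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
        with open("/proc/stat") as f:
            fields = [int(x) for x in f.readline().split()[1:]]
        # field order: user nice system idle iowait irq softirq steal ...
        return fields[7], sum(fields[:8])

    prev_steal, prev_total = steal_and_total()
    while True:
        time.sleep(60)                                   # one sample per minute
        steal, total = steal_and_total()
        pct = 100.0 * (steal - prev_steal) / max(total - prev_total, 1)
        print(f"{time.strftime('%Y-%m-%d %H:%M')} steal: {pct:.1f}%")
        prev_steal, prev_total = steal, total
    ```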

    VirMach, the devil.

    Thanked by (1) chimichurri
  • VirMach Hosting Provider
    edited September 2022

    Hivelocity is not announcing the IP blocks because apparently announcing anything larger than a /22 (we were announcing a /20 and a /22 in one LOA this time to speed up the process...) kicks in some policy they made up where they have to send an email to the abuse contact for the blocks. They let us know about 14 hours after we provided the revised LOA. A supervisor may or may not look at it and reconsider.

    I'm doing the math just in case emergency migrations are required. I have servers I can take to QuadraNet for LAX, but Hivelocity Chicago had all the chunky nodes, so only about half of them at most can fit in QuadraNet Chicago. Tampa obviously has no alternative, and we already lost one server in QN Miami and a few others in ATL.

    QuadraNet and xTom both announced the LOA within 0-2 hours of us sending it in, so it shouldn't be a problem re-allocating blocks there beyond getting the IP lessors to reply, and we can always get more blocks off IPXO. For reference, Dedipath did everything almost immediately: they pointed out the LOA issue and sent it off to get processed right away, but since they're one step below INAP it's still being processed (at least it's actually being processed).

    I just found out about this. We're not sending any emergency emails out yet, not until I have it worked out along with a timeline, but pretty much unless Hivelocity makes it known that they're willing to be reasonable here, we have no other choice outside of several days of outage. We could maybe let people decide, if I can work out an efficient way to do that. And before we get attacked for not leaving huge padding on every end in case a provider decided to bring up a policy at the last minute: we've been working on this for two weeks, and there's been maybe around a one-day delay (actually, I just did the math, it's 15 hours) that can be attributed to us directly, and clearly multiple other providers were able to process the LOA the way an LOA is usually done, without adding in their own red tape, so it is what it is.

    So anyway this is a "potential not smooth sailing" unofficial alert until they make their decision.

    (edit) Of course I'm still letting the IP provider know they've done that and asking them to check, but they're not exactly 24x7 support either.

    The supervisor just replied; it sounds like they're willing to do it.

  • @imok said:

    VirMach, the devil.

    It's almost shocking how smoothly this went for both my VPSes. Part of me expected things to go terribly wrong, with 2-3 weeks of downtime as a minimum. But nothing like that: both NL and DE servers are up and running with new IPs.

    Thanked by (1) VirMach
  • @VirMach, is NYCB033X done? Can I use the new IP?

  • WeiHo
    edited September 2022

    Tokyo maintenance looks like it begins on 09/28.
    In fact it is 09/29.

  • VirMach Hosting Provider

    @tenpera said:
    @VirMach, is NYCB033X done? Can I use the new IP?

    https://billing.virmach.com/serverstatus.php

    Not yet.

  • VirMach Hosting Provider

    @WeiHo said:
    Tokyo maintenance looks like it begins on 09/28.
    In fact it is 09/29.

    IP changes got pushed back into it. I've still been working on them, but not at the speed initially planned. It should still hopefully remain within the 72-hour maintenance window.

  • cybertech OG, Benchmark King
    edited September 2022

    @VirMach said:

    @tenpera said:
    @VirMach, is NYCB033X done? Can I use the new IP?

    https://billing.virmach.com/serverstatus.php

    Not yet.

    The serverstatus page is so nicely populated (albeit not in a good way) that it deserves a proper domain like virmachstatus or virmachdownup or haveyoubeenvirmached, so all the real-time datacentre-grade literature can be focused on it instead of updates here that may get lost in the pages.

    And then here we can get the juicy upcoming plans, expansions, deals, and whatdoyouthinks that keep us hyped.

    I bench YABS 24/7/365 unless it's a leap year.

  • @VirMach said: If the "problem" we're speaking of is the CPU going from 15% to 100% within any given fraction of a second, which really means IO is doing that, since the IO load is what's driving the CPU, then correct, outside of maybe the ideas I have to start making a potential impact.

    Is high CPU usage for SATA I/O usual for that particular RAID card?
    Shouldn't scattered small reads just stall the requesting VM's virtio thread as it waits in the controller's I/O queue?
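
    As a rough way to feel this from inside a guest, here is a minimal sketch that issues scattered 4 KiB reads at random offsets of a large existing file (the path is hypothetical) and reports median and p99 latency; the page cache will absorb repeat hits, so treat it as a coarse probe of how long a single thread sits blocked on a random read, not a substitute for a proper fio run, and it says nothing about what the RAID card itself spends CPU on.

    ```python
    import os, random, time

    PATH = "/mnt/storage/testfile"   # hypothetical large existing file on the storage volume
    BLOCK = 4096                     # 4 KiB reads, similar to scattered small I/O
    SAMPLES = 1000

    size = os.path.getsize(PATH)
    fd = os.open(PATH, os.O_RDONLY)

    latencies = []
    for _ in range(SAMPLES):
        # pick a random block-aligned offset within the file
        offset = random.randrange(0, size - BLOCK) // BLOCK * BLOCK
        t0 = time.perf_counter()
        os.pread(fd, BLOCK, offset)
        latencies.append(time.perf_counter() - t0)
    os.close(fd)

    latencies.sort()
    print(f"median: {latencies[len(latencies) // 2] * 1000:.2f} ms, "
          f"p99: {latencies[int(len(latencies) * 0.99)] * 1000:.2f} ms")
    ```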

  • @VirMach said:

    @WeiHo said:
    Tokyo maintenance looks like it begins on 09/28.
    In fact it is 09/29.

    IP changes got pushed back into it. I've still been working on them, but not at the speed initially planned. It should still hopefully remain within the 72-hour maintenance window.

    You sound optimistic.

  • cao666
    edited September 2022

    @codelock said:
    LAXA014 has been down for a week now,

    Tokyo40 has been offline for a month.

  • @VirMach said:

    @dedicados said:
    @VirMach any update on my ticket?

    View Ticket #232870
    Subject: assigned ip not working on storage server

    If it's urgent make sure it's a priority department ticket. We're almost done catching up and completing all those by today/tomorrow.

    I have to add: if it's not urgent, also make a priority department ticket; otherwise your ticket will just be closed after a couple of weeks...

    @cao666 said:

    @codelock said:
    LAXA014 has been down for a week now,

    Tokyo40 has been offline for a month.

    Nah, that doesn't say anything. I'm on a host that isn't down, but my VM hasn't had network connectivity to the outside world for over a month now. I can ping other hosts in the subnet, but I can't ping the default gateway assigned by DHCP (even the newly assigned IP had this problem), so I'm not able to actually do anything with the VM. According to the mail sent out, it's the correct IP for the default gateway. I even tried a reinstall of the VM; nothing works.

    A ticket has been sent in with as many technical details as possible, and it's been waiting in the "priority queue" for quite some time now.

  • @cybertech said:
    The serverstatus page is so nicely populated (albeit not in a good way) that it deserves a proper domain like virmachstatus or virmachdownup or haveyoubeenvirmached, so all the real-time datacentre-grade literature can be focused on it instead of updates here that may get lost in the pages.

    Agreed. A blog roll is fine for all the gory details, but an at-a-glance dashboard with red and green lights would be great (and virmach.watch is just sitting there...)

  • @cybertech said: instead of updates here that may get lost in the pages.

    And it will, because this time I hope we reach page 1000.

  • We are going to go to page #300.

  • @VirMach I do not like to keep badgering about the status of my replacement dedicated server. It's been over 7 weeks since the shutdown and I have not heard back on getting a replacement. Do you expect to get me a replacement, or should I consider this a dud? Service was paid until December.

    Most of these were completed. I don't know what state yours is in right now; this isn't something we could help you with more quickly here, but I'm definitely still going through these daily, mostly working to fix OS and IPMI requests.

    I understand most were completed, but apparently mine was not among them. It's now 8 weeks gone and I have still not received a replacement dedicated server. If you do not know where I stand, then who within VirMach can I reach out to? Initially I was told 1 or 2 weeks after shutdown, and I put in a ticket for a replacement. If you are not able to get me a replacement server, clearly state it here; I will take my losses and move on, not bothering with or looking back at VirMach services again.

  • @Calypso said:

    @VirMach said:

    @dedicados said:
    @VirMach any update on my ticket?

    View Ticket #232870
    Subject: assigned ip not working on storage server

    If it's urgent make sure it's a priority department ticket. We're almost done catching up and completing all those by today/tomorrow.

    I have to add: if it's not urgent, also make a priority department ticket; otherwise your ticket will just be closed after a couple of weeks...

    @cao666 said:

    @codelock said:
    LAXA014 has been down for a week now,

    Tokyo40 has been offline for a month.

    Nah, that doesn't say anything. I'm on a host that isn't down, but my VM hasn't had network connectivity to the outside world for over a month now. I can ping other hosts in the subnet, but I can't ping the default gateway assigned by DHCP (even the newly assigned IP had this problem), so I'm not able to actually do anything with the VM. According to the mail sent out, it's the correct IP for the default gateway. I even tried a reinstall of the VM; nothing works.

    A ticket has been sent in with as many technical details as possible, and it's been waiting in the "priority queue" for quite some time now.

    My Tokyo40 VPS is the same as what you described. Other IPs in the same network segment are online, but only mine is offline.

  • VirMach Hosting Provider

    @bula said: I understand most were completed, but apparently mine was not among them. It's now 8 weeks gone and I have still not received a replacement dedicated server. If you do not know where I stand, then who within VirMach can I reach out to? Initially I was told 1 or 2 weeks after shutdown, and I put in a ticket for a replacement. If you are not able to get me a replacement server, clearly state it here; I will take my losses and move on, not bothering with or looking back at VirMach services again.

    I vaguely remember an initial long wait for a certain location or specification, and then you reduced the requirements in some way and it went into the queue to get processed. If you want to discuss it here, please confirm what we last said to you and when, and confirm that it's not related to a specific OS request.

    As in you requested a server, it was never delivered, and I can try to take a look today to see what happened.

  • @VirMach said:
    I vaguely remember an initial long wait for a certain location or specification, and then you reduced the requirements in some way and it went into the queue to get processed. If you want to discuss it here, please confirm what we last said to you and when, and confirm that it's not related to a specific OS request.

    As in you requested a server, it was never delivered, and I can try to take a look today to see what happened.

    VirMach, initially yes, I wanted a West Coast location with specs similar to what I had at CC: E3-1270, 2x 1TB HDD, 32GB RAM, with 13 IPs.
    According to the ticket, those hardware specs would take 1 to 2 weeks. However, that passed, and you mentioned a West Coast location may not be possible and that the offer was limited to 1x 2TB HDD, so I agreed to another US datacenter location. I really do not like badgering about this issue once a week. So if you are able to get me a replacement, that's fine; if not, I'll just move on.

  • edited September 2022

    @VirMach said:

    @codelock said:
    LAXA014 has been down for a week now,

    It's technically not down, by the way, just unusable; just mentioning that as a side note.

    Now that's got to be reassuring for your customers.

This discussion has been closed.