BuyVM downtime

hey · April 2022

Did anyone notice BuyVM Las Vegas was down for about 4 hours today?

I don't have anything important there; I'm just curious about what happened.

Edit: location

Xenos · April 2022

When did BuyVM add LA?

hey · April 2022

@Xenos said:
When did BuyVM add LA?

Oops, it's Las Vegas.

armandorg · April 2022

That is completely true, it was Las Vegas.

Neoon · April 2022

It's up; likely just your node.

lapua · April 2022

Was it ever possible to buy something from BuyVM? Every time I went to their website, it said: sold out!

Chievo · April 2022

@lapua said:
Was it ever possible to buy something from BuyVM? Every time I went to their website, it said: sold out!

Yeah, it is possible. You need to sign up to receive an email about the availability of their services; just put your email address in on their website.

deank · April 2022

Gotta get in line and be quick about it to buy anything from them.

Francisco · April 2022

Sorry about that.

We had a single storage node in Vegas burp, causing the whole block storage cluster to hang. When it hangs, users' VMs will usually "pause" and then resume once things are moving again.

Vegas runs a fairly old version of things, with NY/LU/MIA on newer builds. Our current setup doesn't give us any sort of node-level redundancy, though, so if a node locks up/reboots/whatever, it's going to crash whatever VMs are feeding from it.

In the next few days we'll begin live trials on our Ceph cluster. Our own tests look pretty solid and will give us the option to offer Object Storage (S3) if we wanted to.

It'll take quite some time to migrate users into Ceph, but I'm fairly sure I can do the entire thing while users are still running and without a single disruption or byte lost.

Francisco

Mamyyy · April 2022

Ah, that explains why I couldn't connect over SSH, but my box still kept its uptime when I came back later.

willie · April 2022

Mine doesn't seem to have been down at all. I had an ssh session open last night and it is still connected. Storage is still mounted too. I'm used to having to remount it when anything happens.

Lee · April 2022

@Francisco said: I can do the entire thing while users are still running and without a single disruption or byte lost.

Just copying for the comp claim later.

deank · April 2022

Ohh, Fran's gonna get sued.

Francisco · April 2022

@willie said:
Mine doesn't seem to have been down at all. I had an ssh session open last night and it is still connected. Storage is still mounted too. I'm used to having to remount it when anything happens.

That only happens if the storage node you're attached to reboots; that actually kills active connections, so you'll go read-only.

I like the setup we have, it's pretty easy to maintain, but the lack of wider redundancy is annoying.

@Lee said:

@Francisco said: I can do the entire thing while users are still running and without a single disruption or byte lost.

Just copying for the comp claim later.

Thankfully it just uses libvirt's live migrations. We literally rebuilt all of the LUX slabs... twice... last year due to XFS chewing its face off. Users were unaware it was happening, apart from the lack of stock.

Francisco

seanho · April 2022

@Francisco said:
In the next few days we'll begin live trials on our Ceph cluster. Our own tests look pretty solid and will give us the option to offer Object Storage (S3) if we wanted to.

It'll take quite some time to migrate users into Ceph, but I'm fairly sure I can do the entire thing while users are still running and without a single disruption or byte lost.

Francisco

That's exciting! I assume you have enough nodes that you can lose a few without going HEALTH_WARN; the stress of heavy scrubbing on a live cluster can quickly cause cascading issues. Some of us still remember ZXHost....

Francisco · April 2022

@seanho said:

That's exciting! I assume you have enough nodes that you can lose a few without going HEALTH_WARN; the stress of heavy scrubbing on a live cluster can quickly cause cascading issues. Some of us still remember ZXHost....

Shouldn't be a problem. I suspect ZX was flying by the seat of his pants and barely had enough capacity to cover what he was offering, never mind spare. There's a real chance he had 'min_size' == 1, basically RAID 0.

Francisco
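For anyone unfamiliar with the `min_size` setting mentioned above: in a replicated Ceph pool, `size` is the number of copies kept, and `min_size` is the fewest copies a placement group (PG) must have available to keep serving I/O. The sketch below is a simplified back-of-the-envelope model, assuming a healthy PG, simultaneous OSD failures, and no recovery in between; `replicated_tolerance` and `ReplicaTolerance` are hypothetical names, not part of any Ceph API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaTolerance:
    """Simplified fault tolerance of one placement group in a replicated pool."""

    io_failures: int          # copies the PG can lose and still serve I/O
    loss_failures: int        # copies the PG can lose before no copy is left
    single_copy_writes: bool  # True if writes can be acknowledged with only one copy up


def replicated_tolerance(size: int, min_size: int) -> ReplicaTolerance:
    # Hypothetical helper: assumes a healthy PG and ignores recovery timing.
    if not 1 <= min_size <= size:
        raise ValueError("expected 1 <= min_size <= size")
    return ReplicaTolerance(
        io_failures=size - min_size,
        loss_failures=size - 1,
        single_copy_writes=min_size == 1,
    )


if __name__ == "__main__":
    for size, min_size in [(3, 2), (2, 1)]:
        print(f"size={size} min_size={min_size}:", replicated_tolerance(size, min_size))
```

With `min_size` = 1, a PG keeps accepting writes after losing all but one copy, so those writes sit on a single disk until recovery completes; losing that disk as well loses them, which is why the setting gets compared to having no redundancy at all.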

AshUk · April 2022

@Francisco said:

Shouldn't be a problem. I suspect ZX was flying by the seat of his pants and barely had enough capacity to cover what he was offering, never mind spare. There's a real chance he had 'min_size' == 1, basically RAID 0.

Francisco

For what it’s worth, I/ZX was using erasure coding, 6-2 if I remember correctly. I was in the process of adding a new storage node and hit a nasty bug that caused some extents in the OSD journal to be mis-set during the rebalance.

It was fine until an OSD needed to restart and play back the journal. I worked with the Ceph devs to fix the issue at the time.

However, by that point enough OSD/PG shards were corrupt that pretty much every RBD was impacted, hence the toasted filesystems.
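As a rough illustration of what a "6-2" erasure-coded layout buys, assuming it means k=6 data chunks plus m=2 coding chunks per object and an MDS code such as Reed-Solomon: any 2 of the 8 shards can be lost and rebuilt from the remaining 6, and 75% of raw capacity holds data, versus about 33% for 3-way replication. The helpers below are hypothetical arithmetic, not a Ceph API. They also show the limit behind the story above: the code can rebuild at most m lost shards per object, so once more than m shards of a placement group are corrupt, the data cannot be reconstructed.

```python
from fractions import Fraction


def ec_profile(k: int, m: int) -> dict:
    """Capacity and fault tolerance of a k+m erasure-coded layout (hypothetical helper)."""
    if k < 1 or m < 0:
        raise ValueError("expected k >= 1 and m >= 0")
    return {
        "shards_per_object": k + m,
        "usable_fraction": Fraction(k, k + m),  # share of raw space holding data
        "raw_per_usable": Fraction(k + m, k),   # raw bytes written per data byte
        "shard_losses_survivable": m,           # assumes an MDS code: any k shards rebuild the object
    }


def replicated_profile(size: int) -> dict:
    """The same figures for plain n-way replication, for comparison."""
    if size < 1:
        raise ValueError("expected size >= 1")
    return {
        "shards_per_object": size,
        "usable_fraction": Fraction(1, size),
        "raw_per_usable": Fraction(size, 1),
        "shard_losses_survivable": size - 1,
    }


if __name__ == "__main__":
    print("EC 6+2:", ec_profile(6, 2))
    print("3x replica:", replicated_profile(3))
```

The trade-off is that EC 6+2 tolerates the same two losses as 3x replication while using far less raw disk, but every read and write touches more OSDs, and a rebalance (like the one that triggered the bug above) moves shards across many nodes at once.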

seanho · April 2022

Hey you're here, Ash! I did enjoy ZX while it lasted, and I knew you tried your best to recover.

Falzo · April 2022

@AshUk said:

I/ZX

Oh, hi there! Good to see you alive and standing... I still miss my storage boxes, though it all could have ended better, I guess ;-)

How's everything going? Any plans to come back to hosting? People will probably get their forks out quickly, so be careful... All the best!

deank · April 2022

Hosting ain't worth it. Stay away.

AshUk · April 2022

@Falzo said:

Oh, hi there! Good to see you alive and standing... I still miss my storage boxes, though it all could have ended better, I guess ;-)

How's everything going? Any plans to come back to hosting? People will probably get their forks out quickly, so be careful... All the best!

Hey!

A couple of months after, I actually had a few people reach out to me asking whether I was going to restart or offer something, as they needed XXTB and couldn’t find it anywhere else.

So for the past year or so I’ve run a small operation, word of mouth only.

Zero advertising or anything. After that, I realised that having hundreds of clients paying $.$$ is a huge headache when something goes wrong, and I don’t have much more hair to lose…

Feel free to drop me a message if you ever need any advice or anything; I don’t want to hijack this thread any further.

But yeah, I don’t think I’ll be doing anything similar again anytime soon. Maybe I’ll launch something at the mid/higher end of the price range rather than aiming for the bottom.