"CPU Abuse": Understanding & follow-ups

TigersWay · August 2021

Hi,
I've just been hit by a "CPU Abuse" warning... And I would love to understand what happened, so I can set things up to never get that message again!

One of my boxes (KVM, 2 cores, if it matters), basically idling for nearly 2 years (1 simple instance of Syncthing + no-traffic websites) and monitored with the HetrixTools agent (The Best!), has just been stopped for "constantly consuming 100-200% ( fully utilized 1-2 cores )".
1> Graphs from HT show nothing,
2> my personal checkup shows nothing either: ufw & fail2ban clean, nothing in any log...

While I can find my way around, I'm not any kind of expert with linux: Debian 10 for that one. And I would appreciate any kind of lead, track, where-to-look stuff

Thanks for any light!

cybertech · August 2021

does the provider name start with

?

TigersWay · August 2021

@cybertech said:
does the provider name start with

?

No it does not
In fact, until yesterday (and it's probably only my fault) I was very pleased with that still unnamed provider!

cybertech · August 2021

well i had a similar issue with my provider, reinstalled a couple times fresh up with fail2ban as well, monitoring htop/top with no indications whatsoever (load constant at <0.10). a couple of tickets back and forth later, no longer received any automated warnings ever since, and my use case hasn't ever changed.

hopefully it works out for you!

seriesn · August 2021

@TigersWay said:
Hi,
I've just been hit by a "CPU Abuse" warning... And I would love to understand what happened, so I can set things up to never get that message again!

One of my boxes (KVM, 2 cores, if it matters), basically idling for nearly 2 years (1 simple instance of Syncthing + no-traffic websites) and monitored with the HetrixTools agent (The Best!), has just been stopped for "constantly consuming 100-200% ( fully utilized 1-2 cores )".
1> Graphs from HT show nothing,
2> my personal checkup shows nothing either: ufw & fail2ban clean, nothing in any log...

While I can find my way around, I'm not any kind of expert with linux: Debian 10 for that one. And I would appreciate any kind of lead, track, where-to-look stuff

Thanks for any light!

I feel like the culprit is us. Idk but spit balling.

Anything within the control panel usage history or high load issue?

TigersWay · August 2021

@cybertech said:
well i had a similar issue with my provider, reinstalled a couple times......
... and my use case hasn't ever changed.

Bad feeling at first, right? Their support seems always so sure.... I should believe them!

Not_Oles · August 2021

When the CPU suddenly gets pinned and you weren't doing much, it might be a software bug. I've been involved in a couple of situations where this has happened. Once when I was testing something for a good programmer. These bugs can be hard to catch and not at all related to anything specific you are doing beyond running the buggy program.

Maybe something as simple as asking the provider to turn the VPS on again and looking at the output of top might enable you to identify the bad process.

Maybe ask the provider to take a look inside or let you take a look before turning the VPS off if it happens again?

Hope you get it solved! Best wishes from Mexico! 👍

TigersWay · August 2021

@seriesn said:
Anything within the control panel usage history or high load issue?

Have only network and HDD graphs. Not really useful. But you're right, I did not think about theirs

TigersWay · August 2021

@Not_Oles said:
Maybe something as simple as asking the provider to turn the VPS on again and looking at the output of top might enable you to identify the bad process.

Hope you get it solved! Best wishes from Mexico! 👍

You're so right. I thought they could even give a top just before they turned it off.
But I did restart, and of course nothing so far.

As many wishes from Thailand!

seriesn · August 2021

@seriesn said:

@TigersWay said:
Hi,
I've just been hit by a "CPU Abuse" warning... And I would love to understand what happened, so I can set things up to never get that message again!

One of my boxes (KVM, 2 cores, if it matters), basically idling for nearly 2 years (1 simple instance of Syncthing + no-traffic websites) and monitored with the HetrixTools agent (The Best!), has just been stopped for "constantly consuming 100-200% ( fully utilized 1-2 cores )".
1> Graphs from HT show nothing,
2> my personal checkup shows nothing either: ufw & fail2ban clean, nothing in any log...

While I can find my way around, I'm not any kind of expert with linux: Debian 10 for that one. And I would appreciate any kind of lead, track, where-to-look stuff

Thanks for any light!

I feel like the culprit is us. Idk but spit balling.

Anything within the control panel usage history or high load issue?

Actually just realized, we don't turn off. Just throttle. NVM not me.

Not_Oles · August 2021

I thought they could even give a top just before they turned it off.

Can they easily get in? Maybe you could add their ssh key?

Providers are hesitant to go inside because of respect for privacy. And if anything happens after they go inside they might be accused of causing the trouble. So we need to remember to give the providers a break, but maybe they'll help you if you ask.

TigersWay · August 2021

@seriesn said:

@seriesn said:

I feel like the culprit is us. Idk but spit balling.

Anything within the control panel usage history or high load issue?

Actually just realized, we don't turn off. Just throttle. NVM not me.

Oh, sorry, I did not realize!
Yes, I'm one of your family members, but no, I have NO trouble with you

For all readers: I usually have no problem naming and shaming, but in this case I'm trying to understand, and not reproduce, rather than to shame. So no name!

TigersWay · August 2021

@Not_Oles said:

I thought they could even give a top just before they turned it off.

Can they easily get in? Maybe you could add their ssh key?

Providers are hesitant to go inside because of respect for privacy. And if anything happens after they go inside they might be accused of causing the trouble. So we need to remember to give the providers a break, but maybe they'll help you if you ask.

I don't think I would have any trouble doing that of course..... But I have to be totally honest - even if I do know what it is like to be "online support" - their lines clearly put the guilt on my side, or at least I got that feeling

cybertech · August 2021

@TigersWay said:

@cybertech said:
well i had a similar issue with my provider, reinstalled a couple times......
... and my use case hasn't ever changed.

Bad feeling at first, right? Their support seems always so sure.... I should believe them!

well i was just as sure too. with the vps being despicably cheap i really dislike bothering and wasting support time as well as mine, since its kind of an install and forget idler. ticket ended with "answered" , calm ever since. as its not the first VM from this provider to get automated warning, will avoid them in future due to the higher than normal levels of automated communication, and a rather long turnaround time for human interaction. not worth anybody's time for that

TigersWay · August 2021

@cybertech said:
..... not worth anybody's time for that

LOL You're probably right! But I have been very happy with them for 2 years, and I love to put my finger(s) where it hurts. Somehow that's the best way for me not to reproduce it.

willie · August 2021

Use sar to look at the cpu history. You may have to enable some logging for that--see the man page.
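A minimal sketch of that on Debian, assuming the stock sysstat package (which provides sar). The setup steps are left as comments because they need root, and the paths are the ones I'd expect on Debian 10. `busy_rows` is a hypothetical helper name, not part of sysstat; it only filters `sar -u` output down to the intervals where `%idle` (the last column) falls below a threshold:

```shell
#!/bin/sh
# One-time setup, run as root (assumption: Debian's sysstat packaging,
# where collection is switched on in /etc/default/sysstat):
#   apt install sysstat
#   sed -i 's/^ENABLED="false"/ENABLED="true"/' /etc/default/sysstat
#   systemctl restart sysstat

# Hypothetical helper: read `sar -u` output on stdin and print only the
# sample rows whose %idle value (last field) is below LIMIT (default 50).
# Header lines, blank lines and the "Average:" summary are skipped.
busy_rows() {
    awk -v limit="${1:-50}" \
        '$1 != "Average:" && $NF ~ /^[0-9.]+$/ && $NF + 0 < limit'
}

# Usage, once some history has been collected:
#   LC_ALL=C sar -u | busy_rows 50                             # today
#   LC_ALL=C sar -u -f /var/log/sysstat/sa15 | busy_rows 50    # the 15th
```

`LC_ALL=C` is there so the numbers use a decimal point. The `%steal` and `%iowait` columns in the same output are worth a glance too, since neither is attributed to any single process in top.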

uptime · August 2021

The atop monitor can be useful for logging system resource usage

Pretty simple setup on debian:

apt install atop

vyas · August 2021

@uptime said:
The atop monitor can be useful for logging system resource usage

Pretty simple setup on debian:
apt install atop

That’s the command which @uptime suggested to me many, many moons ago on a different matter

chocolateshirt · August 2021

If your server is using HDD, then maybe high IO wait caused this..

Falzo · August 2021

switch providers. easy as that.

if shutting down/suspending is the only way a provider is able to deal with high CPU usage and they can't even provide more detailed information about what/when/how long, then that's simply BS.

a balanced system should always allow you to utilize 100% of the resources you booked. if it is shared resources the hostnode can throttle or limit reasonably to counter and rebalance, but there should be no reason to just shut it down. obviously only verifiable real abuse is a different scenario but this again should allow for more detailed information about the cause.

TigersWay · August 2021

Thank you for your recommendations @willie @uptime and @vyas . These are some tools I used or tried to use in the past, but the agent from HetrixTools has been a better way for me to deal with this type of trouble... So far
And you are right @Falzo that's the question now!

yoursunny · August 2021

@Falzo said:
if it is shared resources the hostnode can throttle or limit reasonably to counter and rebalance, but there should be no reason to just shut it down.

Almost all providers have the ban hammer hanging on top of you when it comes to high CPU usage on a shared core.
The few exceptions are major cloud providers, BuyVM, and Evolution Host.

I'd rather be permanently throttled to whatever percentage is acceptable than have to set a CPU limit inside the VM and still live in constant fear that some process may escape the limit.

Yesterday some bots made thousands of connections to my website and the TLS server was using 100% CPU; fair share of this server is 50% CPU.
I discovered it quickly enough and banned the bots.
If I had been asleep, the server would have been suspended.
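
For the "CPU limit inside the VM" option, one possibility is a systemd `CPUQuota=` on the service most likely to spike. This is only a sketch under assumptions: `cpu_quota_dropin` and `example.service` are hypothetical names, and the quota covers just the processes in that one unit, so anything started elsewhere can still go over, which is the "escape" worry mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: print a systemd drop-in that caps a service at
# PERCENT of one CPU core (CPUQuota=50% means half of a single core).
cpu_quota_dropin() {
    pct="$1"
    case "$pct" in
        ''|*[!0-9]*)
            echo "usage: cpu_quota_dropin PERCENT" >&2
            return 1
            ;;
    esac
    printf '[Service]\nCPUQuota=%s%%\n' "$pct"
}

# Example for a unit called example.service (placeholder name), as root:
#   mkdir -p /etc/systemd/system/example.service.d
#   cpu_quota_dropin 50 > /etc/systemd/system/example.service.d/cpu.conf
#   systemctl daemon-reload && systemctl restart example.service
```

A drop-in file survives package upgrades of the unit itself, which is why the sketch writes one instead of editing the shipped unit file.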

Falzo · August 2021

@yoursunny said: Almost all providers have the ban hammer hanging on top of you when it comes to high CPU usage on a shared core.

nah, that's too 90'ish and I like to think that nowadays not so many providers do such crap anymore. won't happen with hetzner, netcup, ultravps, hosthatch and a lot of others anymore. and for sure not with close to idle vservers that occasionally see a spike for whatever reason.

Providers moving responsibilities like that to the customers are a no-go. as you said, you can always be hit by some random thing from the outside, that you can't control instantly. if that leads to a suspension because of 'high usage' that'd be plain BS.

InceptionHosting · August 2021

check you have virtio drivers for network and disk. it could be emulation overhead, which is not seen inside the guest OS.
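
A quick way to check from inside the guest, as a rough sketch: on Linux, devices using virtio show up under `/sys/bus/virtio/devices`, each with a `driver` symlink naming the bound driver. `virtio_drivers` is a hypothetical helper name; the optional argument exists only so the function can be pointed at another directory for testing.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct drivers bound to virtio devices.
# Typical names would be virtio_net (network) and virtio_blk or
# virtio_scsi (disk). Prints nothing if no virtio device is present.
virtio_drivers() {
    root="${1:-/sys/bus/virtio/devices}"
    for dev in "$root"/*; do
        # Skip devices with no driver bound (and the unexpanded glob).
        [ -L "$dev/driver" ] || continue
        basename "$(readlink "$dev/driver")"
    done | sort -u
}

# Usage:
#   virtio_drivers
```

If the list is empty, or has no network or disk driver in it, the VM is probably running on emulated hardware (for example an IDE disk or an e1000/rtl8139 NIC), and that would be the thing to change in the control panel.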

TigersWay · August 2021

@InceptionHosting said:
check you have virtio drivers for network and disk. it could be emulation overhead, which is not seen inside the guest OS.

Of course, I do! Both of them... But I guess you were expecting that
That would maybe explain why I did not see anything.

But a professional "hoster" should know that and not blame his customers....

InceptionHosting · August 2021

@TigersWay said:

@InceptionHosting said:
check you have virtio drivers for network and disk. it could be emulation overhead, which is not seen inside the guest OS.

Of course, I do! Both of them... But I guess you were expecting that
That would maybe explain why I did not see anything.

But a professional "hoster" should know that and not blame his customers....

Well it is an unmanaged service I would guess? If so you should know enough to manage the server yourself efficiently, part of that is not using 1989 drivers with massive emulation overhead. If you choose to use them, whose fault could it be but your own?

But as long as you are using virtio that is not the issue I guess, just a suggestion.

TigersWay · August 2021

@InceptionHosting said:
....
But as long as you are using virtio that is not the issue I guess, just a suggestion.

There's always much to learn, and I am thankful I can most of the time find it here.
Now I need to make it clearer to one specific provider
