"CPU Abuse": Understanding & follow-ups
Hi,
I've just been hit by a "CPU Abuse" warning, and I would love to understand what happened so I can make sure I never see that message again!
One of my boxes (KVM, 2 cores, if it matters), basically idling for nearly 2 years (one simple Syncthing instance plus some no-traffic websites) and monitored with the HetrixTools agent (the best!), has just been stopped for "constantly consuming 100-200% (fully utilized 1-2 cores)".
1> The graphs from HT show nothing.
2> My own checks show nothing either: ufw and fail2ban are clean, and there is nothing in any log...
While I can find my way around, I'm no Linux expert (Debian 10 on that one), so I would appreciate any lead or pointer on where to look.
Thanks for any light!
does the provider name start with
?
I bench YABS 24/7/365 unless it's a leap year.
?
No, it does not.
In fact, until yesterday (and it's probably only my fault) I was very pleased with that still unnamed provider!
Well, I had a similar issue with my provider. I reinstalled fresh a couple of times, with fail2ban as well, and monitored htop/top with no indication whatsoever (load constantly below 0.10). After a couple of tickets back and forth, I haven't received any automated warnings since, and my use case never changed.
Hopefully it works out for you!
I feel like the culprit might be us. I don't know, just spitballing.
Is there anything in the control panel's usage history, or any high-load issue?
Nexus Bytes Ryzen Powered NVMe VPS | NYC|Miami|LA|London|Netherlands| Singapore|Tokyo
Storage VPS | LiteSpeed Powered Web Hosting + SSH access | Switcher Special |
Bad feeling at first, right? Their support always seems so sure... I should believe them!
When the CPU suddenly gets pinned and you weren't doing much, it might be a software bug. I've been involved in a couple of situations where this happened, once when I was testing something for a good programmer. These bugs can be hard to catch and may have nothing to do with anything you are doing beyond running the buggy program.
Maybe something as simple as asking the provider to turn the VPS on again and looking at the output of
top
might let you identify the bad process. If it happens again, maybe ask the provider to take a look inside, or to let you take a look, before turning the VPS off?
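Since the box was already stopped by the time anyone looked, a one-off top run may miss the culprit. Here is a minimal Python sketch of what top computes per process: it reads utime + stime from /proc/<pid>/stat twice and ranks the difference. The function names are hypothetical and it assumes a Linux guest with /proc mounted; it is an illustration, not a replacement for top.

```python
import os
import time


def parse_pid_stat(text):
    """Return (comm, utime + stime in clock ticks) from one /proc/<pid>/stat line."""
    # comm sits in parentheses and may contain spaces, so split after the LAST ')'.
    comm = text[text.index("(") + 1:text.rindex(")")]
    rest = text[text.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field n is rest[n - 3]:
    # utime is field 14 and stime is field 15.
    return comm, int(rest[11]) + int(rest[12])


def snapshot(proc="/proc"):
    """Map pid -> (comm, cumulative CPU ticks) for every readable process."""
    ticks = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                ticks[int(entry)] = parse_pid_stat(f.read())
        except OSError:
            continue  # the process exited between listdir() and open()/read()
    return ticks


def top_consumers(before, after, interval_s, hz, n=5):
    """Rank processes by CPU% over the interval (100% == one full core, like top)."""
    usage = []
    for pid, (comm, ticks_after) in after.items():
        if pid in before:  # skip processes started during the interval
            ticks_before = before[pid][1]
            pct = (ticks_after - ticks_before) / hz / interval_s * 100
            usage.append((pct, pid, comm))
    usage.sort(reverse=True)
    return usage[:n]


def report(interval_s=5, n=5):
    """Print the n busiest processes over the next interval_s seconds."""
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second (usually 100)
    before = snapshot()
    time.sleep(interval_s)
    after = snapshot()
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    for pct, pid, comm in top_consumers(before, after, interval_s, hz, n):
        print(f"{stamp} {pct:6.1f}% pid={pid} {comm}")
```

Calling report() from cron every few minutes and appending its output to a file would leave a record of which process was busy before any shutdown, which is exactly what was missing here.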
Hope you get it solved! Best wishes from Mexico! 👍
I hope everyone gets the servers they want!
The panel only has network and HDD graphs, which aren't really useful. But you're right, I hadn't thought about checking theirs.
You're so right. I'd have thought they could at least grab a top output just before turning it off.
But I did restart it, and of course nothing so far.
Best wishes back from Thailand!
Actually, I just realized we don't turn VPSes off; we just throttle. Never mind, not me.
Can they easily get in? Maybe you could add their SSH key?
Providers are hesitant to go inside because of respect for privacy. And if anything happens after they go inside they might be accused of causing the trouble. So we need to remember to give the providers a break, but maybe they'll help you if you ask.
Oh, sorry, I didn't realize!
Yes, I'm one of your family members, but no, I have NO trouble with you.
For all readers: I usually have no problem naming and shaming, but in this case I'm trying to understand the problem (and not repeat it) rather than shame anyone. So no names!
I don't think I would have any trouble doing that, of course... But I have to be totally honest: even though I know what it's like to be "online support", their replies clearly put the blame on my side, or at least that was my feeling.
Well, I was just as sure. With the VPS being despicably cheap, I really dislike bothering support and wasting their time as well as mine, since it's kind of an install-and-forget idler. The ticket ended as "answered", and it has been calm ever since. As it's not the first VM from this provider to get an automated warning, I will avoid them in the future because of the higher-than-normal level of automated communication and the rather long turnaround time for human interaction. Not worth anybody's time.
LOL, you're probably right! But I have been very happy with them for 2 years, and I like to put my finger(s) where it hurts; somehow that's the best way for me not to repeat a problem.
Use sar to look at the CPU history. You may have to enable its data collection first; see the man page.
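sar (from the sysstat package) is the proper tool here. Purely for illustration, this Python sketch shows where such CPU history comes from: the aggregate cpu line of /proc/stat, sampled twice and differenced. The steal column is worth a look on a VPS, since it counts time the virtual CPU wanted to run while the hypervisor was busy elsewhere. The function names and the log path are hypothetical, and it assumes Linux. The percentages are of all cores together, so on a 2-core box 50% busy corresponds to the provider's "100%" (one full core).

```python
import time

# Column order of the "cpu" line in /proc/stat (see proc(5)).
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def read_cpu_times(stat_text):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of clock ticks."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            values = [int(v) for v in line.split()[1:]]
            # guest/guest_nice (columns 9-10) are already included in
            # user/nice, so only the first eight columns are kept.
            return dict(zip(FIELDS, values[:8]))
    raise ValueError("no aggregate 'cpu' line found")


def utilisation(before, after):
    """Percent of total CPU time (all cores together) spent in each state."""
    delta = {k: after[k] - before[k] for k in after}
    total = sum(delta.values()) or 1  # avoid dividing by zero on a 0s interval
    pct = {k: v * 100 / total for k, v in delta.items()}
    # "busy" = time the guest itself spent working (excludes idle, iowait, steal).
    pct["busy"] = sum(pct[k] for k in ("user", "nice", "system", "irq", "softirq")
                      if k in pct)
    return pct


def log_cpu_history(path="/var/log/cpu-history.log", interval_s=60):
    """Append one summary line per interval; a crude stand-in for sar's collector."""
    with open("/proc/stat") as f:
        prev = read_cpu_times(f.read())
    while True:
        time.sleep(interval_s)
        with open("/proc/stat") as f:
            cur = read_cpu_times(f.read())
        p = utilisation(prev, cur)
        with open(path, "a") as log:
            log.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} "
                      f"busy={p['busy']:.1f}% iowait={p['iowait']:.1f}% "
                      f"steal={p.get('steal', 0.0):.1f}%\n")
        prev = cur
```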
The atop monitor can be useful for logging system resource usage.
Pretty simple setup on Debian:
HS4LIFE (+ (* 3 4) (* 5 6))
That’s the command @uptime suggested to me many, many moons ago for a different matter.
blog | exploring visually |
If your server is on HDD, high I/O wait might have caused this.
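One way to check that theory from inside the guest: besides the iowait column of /proc/stat, /proc/diskstats keeps a per-device counter of milliseconds spent doing I/O, which is roughly what iostat's %util is based on. A minimal Python sketch, with hypothetical helper names and assuming the kernel's documented diskstats column layout:

```python
import time


def io_ms_per_device(diskstats_text):
    """Map device name -> cumulative milliseconds spent doing I/O."""
    result = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        # fields[2] is the device name; fields[12] is the 10th stat column,
        # "time spent doing I/Os (ms)" in the kernel's iostats documentation.
        if len(fields) >= 13:
            result[fields[2]] = int(fields[12])
    return result


def disk_busy_percent(before, after, interval_s):
    """Share of wall-clock time each device had I/O in flight (cf. iostat %util)."""
    return {dev: (after[dev] - before[dev]) * 100 / (interval_s * 1000)
            for dev in after if dev in before}


def sample(interval_s=5):
    """Return busy% per device over the next interval_s seconds."""
    with open("/proc/diskstats") as f:
        before = io_ms_per_device(f.read())
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        after = io_ms_per_device(f.read())
    return disk_busy_percent(before, after, interval_s)
```

A disk sitting near 100% while the CPU busy figure stays low would point at storage rather than at a runaway process.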
⭕ A simple uptime dashboard using UptimeRobot API https://upy.duo.ovh
⭕ Currently using VPS from BuyVM, GreenCloudVPS, Gullo's, Hetzner, HostHatch, InceptionHosting, LetBox, MaxKVM, MrVM, VirMach.
Switch providers. Easy as that.
If shutting down or suspending is the only way a provider can deal with high CPU usage, and they can't even give more detailed information about what, when and for how long, then that's simply BS.
A balanced system should always allow you to use 100% of the resources you booked. If the resources are shared, the host node can reasonably throttle or limit you to rebalance, but there should be no reason to just shut the VPS down. Obviously, verifiable real abuse is a different scenario, but that again should come with more detailed information about the cause.
Thank you for your recommendations, @willie, @uptime and @vyas. These are tools I have used or tried to use in the past, but the HetrixTools agent has been a better way for me to deal with this type of trouble... so far.
And you are right, @Falzo, that's the question now!
Almost all providers keep the ban hammer hanging over you when it comes to high CPU usage on a shared core.
The few exceptions are major cloud providers, BuyVM, and Evolution Host.
I'd rather be permanently throttled to whatever percentage is acceptable than have to set a CPU limit inside the VM and still live in constant fear that some process may escape the limit.
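For anyone who does go the "CPU limit inside the VM" route, cgroups can cap a service, and the arithmetic for turning a provider's fair-share percentage into a quota is easy to get wrong on multi-core boxes. A small Python sketch, assuming the share is quoted relative to the whole VM (so 50% of 2 cores means one full core); the helper names are hypothetical. cgroup v2 takes "quota period" in microseconds in cpu.max (v1 uses cpu.cfs_quota_us and cpu.cfs_period_us instead, and as far as I know Debian 10 defaults to the v1/hybrid layout), while systemd's CPUQuota= is expressed relative to a single CPU.

```python
def quota_us(share_percent, cores, period_us=100_000):
    """CPU time (microseconds) allowed per period for a share of the whole VM."""
    if not 0 < share_percent <= 100:
        raise ValueError("share_percent must be in (0, 100]")
    # 50% of a 2-core VM is one full core, i.e. quota == period.
    return round(period_us * cores * share_percent / 100)


def cgroup_v2_cpu_max(share_percent, cores, period_us=100_000):
    """Value for a cgroup v2 'cpu.max' file: '<quota> <period>'."""
    return f"{quota_us(share_percent, cores, period_us)} {period_us}"


def systemd_cpu_quota(share_percent, cores):
    """Equivalent systemd directive; CPUQuota=100% means one full CPU."""
    return f"CPUQuota={share_percent * cores:g}%"
```

With systemd, something like `systemctl set-property some.service CPUQuota=100%` (the unit name is a placeholder) applies the result. As the post says, this only caps what runs under that limit; anything outside it can still spike.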
Yesterday some bots made thousands of connections to my website and the TLS server was using 100% CPU; the fair share on this server is 50% CPU.
I discovered it quickly enough and banned the bots.
If I had been asleep, the server would have been suspended.
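The "if I had been asleep" case is what an alert on sustained usage is for. Below is a Python sketch of the check. It assumes samples of whole-VM CPU% arrive at a fixed interval (for example computed from /proc/stat) and uses the 50% fair share mentioned above as the threshold. The class name is hypothetical, and what to do when it fires (mail, push notification, stopping the service) is left open. Requiring several consecutive samples above the line ignores short spikes and only reacts to the "constantly consuming" pattern providers complain about.

```python
from collections import deque


class SustainedLoadAlarm:
    """Fire once CPU usage stays above a threshold for N consecutive samples."""

    def __init__(self, threshold_percent, samples_required):
        self.threshold = threshold_percent
        # Only the most recent samples_required readings are kept.
        self.window = deque(maxlen=samples_required)

    def add(self, usage_percent):
        """Record one sample; True when the window is full and every sample is over."""
        self.window.append(usage_percent)
        return (len(self.window) == self.window.maxlen
                and all(u > self.threshold for u in self.window))
```

For example, with SustainedLoadAlarm(50, 3) fed once a minute, a single 99% spike stays silent, but three minutes in a row above 50% returns True.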
Accepting submissions for IPv6 less than /64 Hall of Incompetence.
Nah, that's so 90s, and I like to think that nowadays not many providers do such crap anymore. It won't happen with Hetzner, netcup, UltraVPS, HostHatch and a lot of others. And certainly not with close-to-idle vservers that occasionally see a spike for whatever reason.
Providers shifting responsibilities like that onto their customers are a no-go. As you said, you can always be hit by some random thing from the outside that you can't control instantly. If that leads to a suspension for "high usage", that's plain BS.
Check that you have virtio drivers for network and disk; it could be emulation overhead, which is not visible inside the guest OS.
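A quick way to see which drivers the guest actually uses is sysfs: each NIC and disk has a device/driver symlink naming the bound driver (virtio_net, virtio_blk, or something like e1000 for an emulated NIC). A Python sketch with hypothetical helper names, assuming a Linux guest. Note that disks on virtio-scsi also show up as sd* with the "sd" driver, so "sd" alone does not prove emulation; `lspci -k` gives similar information.

```python
import os


def driver_of(sys_entry):
    """Return the kernel driver bound to a sysfs net/block entry, or None."""
    link = os.path.join(sys_entry, "device", "driver")
    if os.path.islink(link):
        return os.path.basename(os.readlink(link))
    return None


def driver_report(sys_root="/sys"):
    """Map each network interface and block device to its bound driver."""
    report = {}
    for kind in ("class/net", "block"):
        base = os.path.join(sys_root, kind)
        if not os.path.isdir(base):
            continue
        for name in sorted(os.listdir(base)):
            drv = driver_of(os.path.join(base, name))
            if drv is not None:  # lo, loop*, etc. have no backing device
                report[name] = drv
    return report


def non_virtio(report):
    """Names whose driver is not a virtio one (worth a closer look)."""
    return sorted(name for name, drv in report.items()
                  if not drv.startswith("virtio"))
```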
https://inceptionhosting.com
Please do not use the PM system here for Inception Hosting support issues.
Of course I do! Both of them... But I guess you were expecting that.
That would maybe explain why I did not see anything.
But a professional host should know that and not blame its customers...
Well, it is an unmanaged service, I would guess? If so, you should know enough to manage the server yourself efficiently, and part of that is not using 1989 drivers with massive emulation overhead. If you choose to use them, whose fault could it be but yours?
But as long as you are using virtio, that is not the issue, I guess. Just a suggestion.
There's always much to learn, and I am thankful I can find it here most of the time.
Now I need to make things clearer with one specific provider.