How do you guys monitor (failed-) cron jobs?

How do you guys monitor cron jobs or failed cron jobs?
We're managing 100+ servers and have been writing everything to logs, then monitoring those logs to catch failing jobs.

But surely there has to be an easier way. How do you handle this?

Talistech.com — ICT Consultancy and NVMe web hosting solutions.

Comments

  • If a cron job produces non-empty output, the output is mailed to the Unix account invoking the job.
    You can script the job to only produce output when it fails, and then you can read your mail to know about failed jobs.
    The sysadmin can also configure Unix mail to forward to an external email address.
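    Here is a minimal sketch of such a wrapper, assuming bash is available. The hypothetical cron_quiet function captures the job's combined output and prints it only when the job exits non-zero, so cron only sends mail for failures.

    ```shell
    # Hypothetical helper: run a command, stay silent when it succeeds, and
    # print its combined output plus exit status when it fails. Because cron
    # mails any output a job produces, mail is only sent on failure.
    # Usage: cron_quiet <command...>
    cron_quiet() {
        local log status
        log=$(mktemp) || return 1

        # Capture stdout and stderr together so the mail shows both.
        "$@" >"$log" 2>&1
        status=$?

        if [ "$status" -ne 0 ]; then
            echo "Command failed with exit status $status: $*"
            cat "$log"
        fi
        rm -f "$log"
        return "$status"
    }
    ```

    You could keep it in a helper file (say /usr/local/lib/cron-helpers.sh, a hypothetical path) and call it from the crontab as 0 3 * * * bash -c '. /usr/local/lib/cron-helpers.sh && cron_quiet /path/to/job'.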


    ServerFactory aff best VPS; HostBrr aff best storage.

  • edited July 2023

    @yoursunny said:
    If a cron job produces non-empty output, the output is mailed to the Unix account invoking the job.
    You can script the job to only produce output when it fails, and then you can read your mail to know about failed jobs.
    The sysadmin can also configure Unix mail to forward to an external email address.

    Yeah, this is also one of the ways we're doing it (2>&1 | mail), but not all servers have Postfix or another mail server running.

    I was hoping there was some kind of tool to make everything easier with a dashboard or something. Just looking around :)


  • If you're looking for something managed, Healthchecks.io might be a good option.

  • @Talistech said:
    How do you guys monitor cron jobs or failed cron jobs?
    We're managing 100+ servers and have been writing everything to logs, then monitoring those logs to catch failing jobs.

    But surely there has to be an easier way. How do you handle this?

    It depends on what the jobs actually do. Just monitoring logs can be a bit hazardous: what if the job stops and doesn't output any logs at all? Are you monitoring for that as well?

    I have cron jobs that monitor other cron jobs. Some of my scripts do something simple when they complete, like touching a file. Other jobs then check that file's timestamp; if it hasn't changed in a while, I know the job isn't running.
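    The checking side of this heartbeat pattern can be sketched in bash. The function name, marker path and age limit are hypothetical. The job touches a marker file when it finishes, and the check reports the job as stale if the marker is missing or older than a given number of minutes.

    ```shell
    # Hypothetical heartbeat check: a job touches a marker file when it finishes;
    # this function reports the job as stale if the marker is missing or has not
    # been modified within the last max_minutes minutes.
    # Usage: heartbeat_fresh <marker-file> <max-minutes>
    heartbeat_fresh() {
        local marker=$1 max_minutes=$2
        if [ ! -e "$marker" ]; then
            echo "STALE: $marker does not exist"
            return 1
        fi
        # find prints the path only if it was modified less than max_minutes ago.
        if [ -z "$(find "$marker" -mmin "-$max_minutes" -print)" ]; then
            echo "STALE: $marker not updated in the last $max_minutes minutes"
            return 1
        fi
        return 0
    }
    ```

    For example, a daily job line could end with && touch /var/lib/heartbeats/backup.ok, and an hourly check could run heartbeat_fresh /var/lib/heartbeats/backup.ok 1500. Cron would then mail whatever STALE line it prints.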

    Actually, I have found Icinga to be a pretty good solution for things like this. It supports "passive checks": a service that expects something else to submit its result within a specified interval, and if no update arrives in time, Icinga alerts. You get all of Icinga's other advantages (automation, templates, groups, many ways of alerting, integrations and so on), and it is still just one line in your scripts to trigger all of this.
    Of course, you can also let Icinga monitor the actual results of the cron jobs. For example, I have some cron jobs that run backups. I monitor those by having Icinga check the backup archive for a recent backup.
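    Here is a rough sketch of the "one line in your scripts" part, assuming Icinga 2 with its REST API enabled. It submits the result with POST /v1/actions/process-check-result. The API URL, API user, password variable and host/service names are all hypothetical, and the passive service must already exist in Icinga.

    ```shell
    # Hypothetical settings: adjust to your Icinga 2 API endpoint and API user.
    ICINGA_URL=${ICINGA_URL:-https://icinga.example.com:5665}
    ICINGA_USER=${ICINGA_USER:-cronreporter}

    # Build the JSON body for a passive result: exit_status 0 = OK, 2 = CRITICAL.
    icinga_payload() {
        local job_status=$1 state output
        if [ "$job_status" -eq 0 ]; then
            state=0
            output="OK - job finished successfully"
        else
            state=2
            output="CRITICAL - job exited with status $job_status"
        fi
        printf '{"exit_status": %d, "plugin_output": "%s"}' "$state" "$output"
    }

    # Usage: icinga_report <host> <service> <command...>
    # Runs the command, submits the result, and returns the command's status.
    icinga_report() {
        local host=$1 service=$2 status target
        shift 2
        "$@"
        status=$?
        if [ -z "${ICINGA_API_PASS:-}" ]; then
            echo "ICINGA_API_PASS is not set; result not reported" >&2
            return "$status"
        fi
        # The service is addressed as <host>!<service>; '!' is single-quoted
        # so an interactive shell does not treat it as history expansion.
        target="$host"'!'"$service"
        # CURL can be pointed at another command for testing.
        "${CURL:-curl}" -fsS -m 10 -u "$ICINGA_USER:$ICINGA_API_PASS" \
            -H 'Accept: application/json' \
            -d "$(icinga_payload "$status")" \
            "$ICINGA_URL/v1/actions/process-check-result?service=$target" >/dev/null
        return "$status"
    }
    ```

    If the API uses Icinga's own CA, curl would typically also need a --cacert option pointing at that CA certificate.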

    You can of course replace Icinga with Nagios, CheckMK or whatever. Choose your poison. :smile:
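    For the "is there a recent backup" style of check, here is a sketch in the Nagios plugin convention (one status line, exit code 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN), which Icinga, Nagios and CheckMK can all run. It assumes backups are plain files in a single directory. The function name and thresholds are hypothetical.

    ```shell
    # Hypothetical active check: is there a sufficiently recent backup file?
    # Usage: check_backup_age <dir> <warn-hours> <crit-hours>
    check_backup_age() {
        local dir=$1 warn_h=$2 crit_h=$3
        if [ ! -d "$dir" ]; then
            echo "UNKNOWN - backup directory $dir not found"
            return 3
        fi
        # find prints matching files modified less than N minutes ago.
        if [ -n "$(find "$dir" -maxdepth 1 -type f -mmin "-$((warn_h * 60))" -print)" ]; then
            echo "OK - backup newer than ${warn_h}h found in $dir"
            return 0
        fi
        if [ -n "$(find "$dir" -maxdepth 1 -type f -mmin "-$((crit_h * 60))" -print)" ]; then
            echo "WARNING - newest backup in $dir is older than ${warn_h}h"
            return 1
        fi
        echo "CRITICAL - no backup newer than ${crit_h}h in $dir"
        return 2
    }
    ```

    To use it as a check command, you would save it in a standalone script that ends with check_backup_age "$@"; exit $?.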

  • @rcy026 said:

    I have cron jobs that monitor other cron jobs. Some of my scripts do something simple when they complete, like touching a file. Other jobs then check that file's timestamp; if it hasn't changed in a while, I know the job isn't running.

    Actually, I have found Icinga to be a pretty good solution for things like this. It supports "passive checks": a service that expects something else to submit its result within a specified interval, and if no update arrives in time, Icinga alerts. You get all of Icinga's other advantages (automation, templates, groups, many ways of alerting, integrations and so on), and it is still just one line in your scripts to trigger all of this.

    Thank you for your feedback! We're already using Zabbix for similar checks, but I don't want to touch a file every time a cron job runs successfully. Zabbix ships a zabbix_sender utility that can report to Zabbix whether a job succeeded or failed, but that forces me to edit all of my cron jobs to send that result. Still, as you said, it's a way of monitoring as well.
    Thank you for your input!
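    One way to avoid editing every job is to wrap the command on the crontab line and let the wrapper call zabbix_sender with the job's exit status. This is a sketch under assumptions: the item key cron.status[<name>] is hypothetical and would need a matching trapper item in Zabbix. The -c option makes zabbix_sender read the server and host name from the agent configuration file, whose path varies by install.

    ```shell
    # Hypothetical paths: agent config location differs between installs.
    ZABBIX_CONF=${ZABBIX_CONF:-/etc/zabbix/zabbix_agentd.conf}
    # Overridable so the sender can be replaced in tests.
    ZABBIX_SENDER=${ZABBIX_SENDER:-zabbix_sender}

    # Usage: zbx_cron <job-name> <command...>
    # Runs the command unchanged, then sends its exit status to the
    # (hypothetical) trapper item cron.status[<job-name>].
    zbx_cron() {
        local name=$1 status
        shift
        "$@"
        status=$?
        "$ZABBIX_SENDER" -c "$ZABBIX_CONF" -k "cron.status[$name]" -o "$status" >/dev/null
        return "$status"
    }
    ```

    On the Zabbix side, a trigger on a non-zero value would catch failures, and a nodata() trigger would catch jobs that stopped running altogether.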


  • @berkay said:
    If you're looking for something managed, Healthchecks.io might be a good option.

    I doubt it'll have what we need but I'll check it, thanks!


  • @Talistech said:

    @berkay said:
    If you're looking for something managed, Healthchecks.io might be a good option.

    I doubt it'll have what we need but I'll check it, thanks!

    Why do you doubt it exactly? :smile:

  • @rcy026 said:
    I have cron jobs that actually monitor other cron jobs. In some scripts I run it does something simple when completed, like touching a file.

    Ditto, but I always also try to use cron's MAILTO facility, so you find out when one of the wheels is coming loose (deprecation warnings from a package, say) even if the job is superficially doing everything you're testing for.

    Of course you then need yet another heartbeat mechanism to check the mail subsystems are working =)

  • @cochon said:

    Ditto, but I always also try to use cron's MAILTO facility, so you find out when one of the wheels is coming loose (deprecation warnings from a package, say) even if the job is superficially doing everything you're testing for.

    Of course you then need yet another heartbeat mechanism to check the mail subsystems are working =)

    That's one of the things I like about letting Icinga handle the alerting. I can get the notifications via mail, SMS, Pushover, on a website or on the TV in the NOC; it can even call the office receptionist or blink the lights in the office. Or any combination thereof.
    The possibilities are endless. :smile:

  • @Talistech said:

    @berkay said:
    If you're looking for something managed, Healthchecks.io might be a good option.

    I doubt it'll have what we need but I'll check it, thanks!

    You can do the same with the free Uptime Kuma, where it's called a "Push" monitor. You get an HTTP URL, and if that URL hasn't been called within the expected interval (for example, once an hour), an alarm is raised. You add a second part to the cron command that only runs after the first command succeeds.

    Example cron command:
    0 * * * * main_command && curl 'https://uptimekuma.local/api/push/abcdef12345?status=up&msg=OK&ping='
    This pings the Uptime Kuma URL every time main_command exits without an error.
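    Here is a variation on this, sketched in bash. Instead of pinging only on success and waiting for the missed heartbeat, the hypothetical kuma_cron wrapper also reports failures immediately, with status=down and the exit code in msg. It reuses the example push URL above and assumes the push endpoint accepts status=down.

    ```shell
    # Hypothetical wrapper: run the job, then push up or down to Uptime Kuma.
    # KUMA_PUSH_URL defaults to the example push URL from this thread.
    KUMA_PUSH_URL=${KUMA_PUSH_URL:-https://uptimekuma.local/api/push/abcdef12345}

    # Usage: kuma_cron <command...>
    kuma_cron() {
        local status state msg
        "$@"
        status=$?
        if [ "$status" -eq 0 ]; then
            state=up
            msg=OK
        else
            state=down
            msg="exit status $status"
        fi
        # -G appends the URL-encoded --data-urlencode pairs as a query string.
        # CURL can be pointed at another command for testing.
        "${CURL:-curl}" -fsS -m 10 --retry 3 -o /dev/null -G "$KUMA_PUSH_URL" \
            --data-urlencode "status=$state" \
            --data-urlencode "msg=$msg"
        return "$status"
    }
    ```

    The crontab line would then become something like 0 * * * * bash -c '. /usr/local/lib/cron-helpers.sh && kuma_cron main_command', where the helper file path is hypothetical.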

  • @Hertonis said:
    You can do the same with the free Uptime Kuma, where it's called a "Push" monitor. You get an HTTP URL, and if that URL hasn't been called within the expected interval (for example, once an hour), an alarm is raised. You add a second part to the cron command that only runs after the first command succeeds.

    Example cron command:
    0 * * * * main_command && curl 'https://uptimekuma.local/api/push/abcdef12345?status=up&msg=OK&ping='
    This pings the Uptime Kuma URL every time main_command exits without an error.

    That is actually pretty clever; I can't understand why I never thought of this. :smile:
    I use Icinga for similar functionality but I always put the call for Icinga inside the script that cron executes. It gives a little more flexibility since I can send different error codes and messages to Icinga based on what actually happened and include them in the alert, but your way is actually perfect if you just want to make sure that something has been executed.
    I will definitely use this somewhere, thanks!

  • @Talistech said:

    @berkay said:
    If you're looking for something managed, Healthchecks.io might be a good option.

    I doubt it'll have what we need but I'll check it, thanks!

    Worth checking out! Look at the wrappers: https://healthchecks.io/docs/resources/
    And it's an open source project worth supporting! :)

  • @rcy026 said:

    That is actually pretty clever; I can't understand why I never thought of this. :smile:
    I use Icinga for similar functionality but I always put the call for Icinga inside the script that cron executes. It gives a little more flexibility since I can send different error codes and messages to Icinga based on what actually happened and include them in the alert, but your way is actually perfect if you just want to make sure that something has been executed.
    I will definitely use this somewhere, thanks!

    If you'd like logs or status output from the finished command, you can use healthchecks.io; the self-hosted version is also free. Still, I think Uptime Kuma is more popular and more versatile.
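    Here is a sketch of the "logs or status" part, assuming the check's ping URL is in HC_URL (the UUID below is a placeholder). The hypothetical hc_cron function sends a /start signal and runs the job. It then pings <ping URL>/<exit status> with the job's output as the request body, so a non-zero exit marks the check as failed and the output appears in the check's event log. Sending only the last 100 lines is an arbitrary choice here.

    ```shell
    # Placeholder ping URL: copy the real one from the Healthchecks dashboard,
    # or point it at a self-hosted instance.
    HC_URL=${HC_URL:-https://hc-ping.com/00000000-0000-0000-0000-000000000000}

    # Usage: hc_cron <command...>
    hc_cron() {
        local out status
        # /start lets Healthchecks measure run time and notice hung jobs.
        # CURL can be pointed at another command for testing.
        "${CURL:-curl}" -fsS -m 10 --retry 3 -o /dev/null "$HC_URL/start"
        out=$("$@" 2>&1)
        status=$?
        # Pinging <url>/<exit status> marks the check failed when non-zero;
        # the POST body (last 100 lines of output) is stored with the ping.
        "${CURL:-curl}" -fsS -m 10 --retry 3 -o /dev/null \
            --data-raw "$(printf '%s\n' "$out" | tail -n 100)" "$HC_URL/$status"
        return "$status"
    }
    ```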

  • This requires a high level of expertise, and unfortunately, I can't be of much help in this regard.

  • Try cronitor.io.

  • Thank you for all the replies, I have a couple things to check out and experiment now!


  • @Hertonis said:

    If you'd like logs or status output from the finished command, you can use healthchecks.io; the self-hosted version is also free. Still, I think Uptime Kuma is more popular and more versatile.

    I can do all of that and more with Icinga or Nagios, but thanks.
