Systemd Service Failure Notification System

Published on 23.05.2022
Published in monitoring nixops nixos systemd

I run a small root server with my website, a Nextcloud instance and some other stuff for my personal use. This server runs NixOS and I deploy new versions using NixOps, a NixOS deployment tool. This setup runs smoothly and is very easy and time efficient to maintain.

However, sometimes things do break, and even the rollback features of NixOS do not help if you notice the error too late. Since I am mostly the only one who uses the services on this server, I am also the only one affected. I want to know as early as possible if something is broken, so that I can plan the fix instead of fixing it in a hurry when I need to use a service.

I looked into various monitoring tools. However, most of them are complex tools designed for complex systems, not for a single server with a single regular user. Therefore, I decided to build my own monitoring tool by leveraging the powers of Nix.

The idea is simple. Everything worth monitoring on my server runs as a systemd service. The only thing I need to do is to send a notification once a service fails, and I need to do this for every service.

I do this by creating a new systemd service called email@. This service does nothing but send a notification with the status of the associated service. It must not trigger itself when it fails, as this would probably lead to an endless loop of notifications.

In case you are confused by the @ in the service name: the @ makes this a template unit, which lets you pass an instance parameter into the systemd unit. The instance parameter can be referred to using %i. If you want to read more about it, take a look at the systemd.unit man page.
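To make the instance parameter more concrete, here is a minimal sketch of a template unit in NixOS. The unit name greet@ and its command are hypothetical and exist only for illustration; they are not part of the notification system.

```nix
{ pkgs, ... }:
{
  # Hypothetical template unit, for illustration only.
  systemd.services."greet@" = {
    description = "Logs its instance parameter";
    serviceConfig = {
      Type = "oneshot";
      # %i expands to the text between "@" and ".service"
      # in the name of the unit that is actually started.
      ExecStart = "${pkgs.coreutils}/bin/echo instance parameter: %i";
    };
  };
}
```

Starting it with systemctl start greet@world.service should then write "instance parameter: world" to the journal, because %i is replaced by world.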

In my configuration, this service is defined as

  config = {
    systemd.services."email@" = {
      description = "Sends a status mail via sendmail on service failures.";
      onFailure = mkForce [ ];
      serviceConfig = {
        ExecStart = "${sendmail} ${config.systemd.email-notify.mailTo} %i";
        Type = "oneshot";
      };
    };
  };

where sendmail is defined as

  sendmail = pkgs.writeScript "sendmail"
    ''
      #!/bin/sh

      ${pkgs.system-sendmail}/bin/sendmail -t <<ERRMAIL
      To: $1
      From: ${config.systemd.email-notify.mailFrom}
      Subject: Status of service $2
      Content-Transfer-Encoding: 8bit
      Content-Type: text/plain; charset=UTF-8

      $(systemctl status --full "$2")
      ERRMAIL
    '';

This should be pretty straightforward. The service is a oneshot service that just executes the sendmail script, and mkForce [ ] clears its own onFailure so that nothing is done when it fails.
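Both snippets read config.systemd.email-notify.mailTo and config.systemd.email-notify.mailFrom, so these two options have to be declared somewhere in the module. The following is only a sketch of how such declarations could look; the types and descriptions are my assumptions here, and the real module may declare them differently.

```nix
{ lib, ... }:
{
  # Assumed declarations for the two options used above.
  options.systemd.email-notify = {
    mailTo = lib.mkOption {
      type = lib.types.str;
      description = "Address that receives the status mail.";
    };
    mailFrom = lib.mkOption {
      type = lib.types.str;
      description = "Address used in the From header of the status mail.";
    };
  };
}
```

Leaving out a default means the evaluation fails if the module is used without setting both addresses, which seems preferable to silently sending mail nowhere.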

Now we need the other part as well: attaching the email@ service to every systemd service, or at least to every systemd service that is managed by NixOS.

This is done with the following code snippet

  options = {
    systemd.services = mkOption {
      type = with types; attrsOf (
        submodule {
          # %n expands to the full name of the failed unit, e.g. nginx.service
          config.onFailure = [ "email@%n.service" ];
        }
      );
    };
  };

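Here, we append email@%n.service to the onFailure attribute of every service. The %n specifier expands to the full unit name of the failing service, which then arrives in email@ as the instance parameter %i.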
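One way to try this out is a throwaway service that always fails. The sketch below makes several assumptions: the module file name ./email-notify.nix, the addresses, and the always-fail service are all hypothetical placeholders, and the mailTo and mailFrom options are the ones referenced in the sendmail script.

```nix
{ pkgs, ... }:
{
  # Hypothetical path to the module that contains the snippets above.
  imports = [ ./email-notify.nix ];

  # Placeholder addresses, replace them with real ones.
  systemd.email-notify.mailTo = "admin@example.org";
  systemd.email-notify.mailFrom = "server@example.org";

  # Throwaway unit whose only command exits with a non-zero status.
  systemd.services.always-fail = {
    description = "Test unit that always fails";
    serviceConfig = {
      Type = "oneshot";
      ExecStart = "${pkgs.coreutils}/bin/false";
    };
  };
}
```

After deploying, systemctl start always-fail.service should fail. systemd then expands %n to always-fail.service and starts email@always-fail.service.service, in which %i is always-fail.service, so the mail should contain the output of systemctl status for the failed unit.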

I have been using this setup for almost three years now, and I am really happy with it. There is potential for improvement (channels other than email, suppressing notifications after many failures of a service in a row, …), but so far this is all I need, and it has never let me down.

You can find the full source code of the notification system on GitHub. If you are interested in what my server configuration looks like, take a look at my config repository.

PS: In addition to this monitoring on the systemd service level, I also use UptimeRobot to make sure all public-facing services are reachable via the internet.
