I run NixOS on my machines whenever possible. I also monitor them with Prometheus. Currently, I have two machines running Prometheus: sambirano, a Raspberry Pi 4 in my apartment, and boraha, a free Oracle Cloud VM (see here for details on how I installed NixOS there). They also run Grafana and Alertmanager; the latter forwards alerts to Zenduty.

Each machine runs at least the node-exporter and the smartctl-exporter (although the latter doesn’t report on all my disks, something I’ll have to fix in the future). Some also run other exporters; e.g. sambirano monitors my Asus RT-AC88U router using the SNMP exporter. Prometheus needs to know about all of these exporters on all of these machines somehow, and I don’t want to keep configs in sync manually, so I decided to autogenerate the Prometheus targets with Nix.
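
For context, each exporter is enabled declaratively in the host’s NixOS config through the stock services.prometheus.exporters.* options. A minimal sketch of a per-host module (the node exporter’s default port is 9100):

  # Per-host module enabling two exporters (sketch).
  { ... }: {
    services.prometheus.exporters.node.enable = true;     # listens on :9100 by default
    services.prometheus.exporters.smartctl.enable = true;
  }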

All of my NixOS host configs are included in one top-level flake, so I don’t have to deal with extra imports anywhere: the self input to NixOS modules contains a self.nixosConfigurations attrset with the config for each host. So I can simply do:

  allHosts = self.nixosConfigurations;
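
For reference, a minimal sketch of the top-level flake this relies on, with self handed to the module system via specialArgs (one common way to wire it; the host file path here is hypothetical):

  # flake.nix (sketch)
  {
    inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    outputs = { self, nixpkgs }: {
      nixosConfigurations.sambirano = nixpkgs.lib.nixosSystem {
        system = "aarch64-linux";
        specialArgs = { inherit self; };  # makes `self` available as a module argument
        modules = [ ./hosts/sambirano.nix ];
      };
      # ... one entry per host ...
    };
  }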

Previously, before using Flakes, I had to do this:

  # This is how nixos configs are evaluated at the top-level. We
  # evaluate the config for all hosts here.
  eval = import <nixos/nixpkgs/nixos/lib/eval-config.nix>;
  allHosts = mapAttrs (k: v: eval { modules = [ v ]; }) hosts.hosts;

Another useful thing is that nearly all of my machines are configured to join a Tailscale tailnet, with MagicDNS enabled. This lets me access machines just by their hostname, without having to deal with IP addressing or network reachability. But since some machines are not in the tailnet, I filter them out:

  hostInTailnetF = k: v: v.config.my.roles.tailscale.enable;
  tailnetHosts = filterAttrs hostInTailnetF allHosts;
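
Note that my.roles.tailscale is a custom option of mine, not a stock NixOS one. A minimal sketch of what such a role module could look like:

  # Sketch of the custom role option the filter keys on.
  { lib, config, ... }: {
    options.my.roles.tailscale.enable = lib.mkEnableOption "Tailscale on this host";
    config = lib.mkIf config.my.roles.tailscale.enable {
      services.tailscale.enable = true;
    };
  }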

We then need to list, for each host, the Prometheus exporters that are enabled. So we map over all hosts (using mapAttrs since we’re dealing with attrsets) using a function that filters enabled exporters. We end up with an attrset of the form { hostname = { exportername = exporterconfig; }; }.

  enabledExportersF = name: host: filterAttrs (k: v: v.enable == true) host.config.services.prometheus.exporters;
  enabledExporters = mapAttrs enabledExportersF tailnetHosts;
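
For illustration, the result is shaped like this (the exporters and ports here are illustrative; each value is the exporter’s full option set, abbreviated):

  {
    sambirano = {
      node = { enable = true; port = 9100; /* ... */ };
      snmp = { enable = true; port = 9116; /* ... */ };
    };
    boraha = {
      node = { enable = true; port = 9100; /* ... */ };
    };
  }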

We can now trivially transform each exporter config into a Prometheus scrape config. We choose to have a separate scrape config for each (host, exporter) pair for simplicity, and we use relabeling to set the job and instance labels appropriately.

  mkScrapeConfigExporter = hostname: ename: ecfg: {
    # job_name must be unique across scrape configs, so include the host;
    # the relabeling below then rewrites job/instance to the labels we want.
    job_name = "${hostname}-${ename}";
    static_configs = [{ targets = [ "${hostname}:${toString ecfg.port}" ]; }];
    relabel_configs = [
      {
        target_label = "instance";
        replacement = "${hostname}";
      }
      {
        target_label = "job";
        replacement = "${ename}";
      }
    ];
  };
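
For example, for a hypothetical host pi running the node exporter on its default port, mkScrapeConfigExporter "pi" "node" { port = 9100; } evaluates to:

  {
    job_name = "pi-node";
    static_configs = [{ targets = [ "pi:9100" ]; }];
    relabel_configs = [
      { target_label = "instance"; replacement = "pi"; }
      { target_label = "job"; replacement = "node"; }
    ];
  }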

We can apply this transformer function to our previous attrset to obtain a similar-shaped { hostname = { exportername = scrapeconfig; }; } attrset:

  mkScrapeConfigHost = name: exporters: mapAttrs (mkScrapeConfigExporter name) exporters;
  scrapeConfigsByHost = mapAttrs mkScrapeConfigHost enabledExporters;

Finally, we can take only the scrape configs, flatten them into a list, and use this as our services.prometheus.scrapeConfigs:

  autogenScrapeConfigs = flatten (map attrValues (attrValues scrapeConfigsByHost));

I also append a few scrape configs manually to this (see the wiring sketch after the list):

  • SNMP is manually configured, since the SNMP exporter must be told, via URL parameters on each scrape, which device and module to query.
  • Prometheus instances cross-monitor each other. I hardcode these as I don’t want a bug in my Nix autogen code to break my meta-monitoring.
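
Putting it all together, the final assignment looks roughly like this, where manualScrapeConfigs is a hypothetical name for the hand-written configs listed above:

  services.prometheus = {
    enable = true;
    # Hand-written configs first, then everything generated from the other hosts.
    scrapeConfigs = manualScrapeConfigs ++ autogenScrapeConfigs;
  };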