On our devices we do a lot of active monitoring with ping probes. We have more than 3000 Linux hosts that use ping probes for:
- Measuring the availability of our hosts, as seen from our infrastructure ("sang")
- Measuring the performance of VPN tunnels ("tmon")
- Measuring the availability of Internet links ("linkmon")
- Detecting that a link is down for a link failover ("link-failover")
- Choosing a download server for signature updates ("clientha")
Ping probing is not trivial, because it usually has to be done asynchronously. Instead of reimplementing this every time, we developed a common component that executes ping probes and collects the results. We called it Pingmachine.
It is heavily inspired by Smokeping by Tobi Oetiker, but designed to be used as a monitoring component that runs in the background (whereas Smokeping is a complete application).
The basic idea is that any monitoring component that needs ping probes only has to define "ping orders" by writing YAML-formatted files, and then wait for the results of each order to appear. It's a very simple API, and it has greatly simplified this part of our monitoring applications.
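As a rough illustration of what "defining a ping order" could look like from a client's point of view, here is a minimal Python sketch. The field names (user, task, step, pings, probe, host), the orders directory, and the file name are assumptions made for this example, not a specification of Pingmachine's actual order format; check the project's documentation for the real layout.

```python
import os
import tempfile
from pathlib import Path


def render_order(user, task, host, step=10, pings=5):
    """Render a ping order as YAML text.

    All field names below are assumptions chosen for illustration.
    """
    return (
        f"user: {user}\n"
        f"task: {task}\n"
        f"step: {step}\n"
        f"pings: {pings}\n"
        "probe: fping\n"
        "fping:\n"
        f"    host: {host}\n"
    )


def submit_order(orders_dir, name, text):
    """Write the order file via a temporary file and an atomic rename,
    so that a process watching the directory never reads a partial order.
    """
    orders_dir = Path(orders_dir)
    orders_dir.mkdir(parents=True, exist_ok=True)
    # The temporary file lives in the same directory so the rename stays
    # on one file system (os.replace is only atomic in that case).
    fd, tmp_path = tempfile.mkstemp(dir=orders_dir, prefix=".tmp-")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    final_path = orders_dir / name
    os.replace(tmp_path, final_path)
    return final_path
```

A client would call something like submit_order("/path/to/orders", "linkmon-uplink1", render_order("linkmon", "uplink1", "192.0.2.1")), where both the path and the order name are hypothetical. The write-then-rename step is a general precaution for file-based interfaces rather than something the text above requires.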
Having a single component take care of the probing also gives us better control over what probing is being done, and makes probing issues easier to debug. There is a "pingmachine-status" command-line program that reports all ping orders that are currently active, along with their results.
Most of the probing is executed by fping, whose maintenance is also sponsored by Open Systems. Other probes can easily be defined for measurements that are not based on ICMP ECHO but on something else (SSH connections, for example, or HTTP requests).
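To give an idea of the kind of data an fping-based probe has to collect, the following Python sketch parses the per-target lines that "fping -C <count> -q" prints: the target, a colon, then one round-trip time in milliseconds per ping, with "-" for a lost reply. How Pingmachine itself invokes fping and stores results is not described here, so the function names and the summary fields are purely illustrative.

```python
import statistics


def parse_fping_line(line):
    """Split one 'host : rtt rtt - rtt' line into (host, list of RTTs).

    Lost pings, printed as '-', are represented as None.
    """
    # Split on ' : ' (with spaces) so IPv6 addresses stay intact.
    host, sep, times = line.partition(" : ")
    if not sep:
        raise ValueError(f"unexpected fping output: {line!r}")
    rtts = [None if t == "-" else float(t) for t in times.split()]
    return host.strip(), rtts


def summarize(rtts):
    """Compute loss ratio and median RTT for one target."""
    received = [r for r in rtts if r is not None]
    return {
        "sent": len(rtts),
        "received": len(received),
        "loss": 1 - len(received) / len(rtts) if rtts else 0.0,
        "median_ms": statistics.median(received) if received else None,
    }
```

For example, parse_fping_line("192.0.2.1 : 1.25 - 1.75") yields the host "192.0.2.1" and the list [1.25, None, 1.75], which summarize() reduces to three pings sent, two received, and a median of 1.5 ms.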
Pingmachine is self-contained and doesn't depend on any other software written by us. We have published it on GitHub: https://github.com/open-ch/pingmachine/