For my ETBE-Mon [1] monitoring system I recently added a monitor for the Linux load average. The Unix load average isn’t a very good metric for monitoring system load, but it’s well known and easy to use. I’ve previously written about the Linux load average and how it’s apparently different from other Unix-like OSs [2]. The monitor is still named loadavg, but I’ve now made it also monitor memory usage, because excessive memory use and a high load average are often correlated.
For issues that might be transient it’s good to have a monitoring system give a reasonable amount of information about the problem so it can be diagnosed later on. So when the load average monitor gives an alert I have it display a list of D state processes (if any), a list of the top 10 processes by CPU time (only those using more than 5%), and a list of the top 10 processes by RAM (only those using more than 2% of total virtual memory).
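As an illustration of that kind of report, here is a minimal Python sketch. It is not the code from the monitor itself. It assumes the procps `ps` command with the columns `pid,stat,pcpu,pmem,comm`, and the function names (`parse_ps`, `d_state`, `top_by`, `snapshot`) are made up for this example. Note that the `%MEM` column of `ps` is resident memory relative to physical RAM, so it only approximates the "percentage of total virtual memory" threshold described above, and `%CPU` from `ps` is averaged over the lifetime of the process rather than being an instantaneous figure.

```python
import subprocess


def parse_ps(text):
    """Parse the output of `ps -eo pid,stat,pcpu,pmem,comm` into dicts."""
    procs = []
    for line in text.splitlines()[1:]:  # the first line is the column header
        parts = line.split(None, 4)
        if len(parts) < 5:
            continue  # skip blank or truncated lines
        pid, stat, pcpu, pmem, comm = parts
        procs.append({
            "pid": int(pid),
            "stat": stat,
            "pcpu": float(pcpu),
            "pmem": float(pmem),
            "comm": comm.strip(),
        })
    return procs


def d_state(procs):
    """Processes in uninterruptible sleep (state code starting with D)."""
    return [p for p in procs if p["stat"].startswith("D")]


def top_by(procs, key, threshold, limit=10):
    """Up to `limit` processes whose `key` exceeds `threshold`, highest first."""
    over = [p for p in procs if p[key] > threshold]
    return sorted(over, key=lambda p: p[key], reverse=True)[:limit]


def snapshot():
    """Run ps once and return the parsed process list."""
    out = subprocess.run(
        ["ps", "-eo", "pid,stat,pcpu,pmem,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(out)
```

With this, `d_state(snapshot())` gives the D state list, and `top_by(procs, "pcpu", 5.0)` and `top_by(procs, "pmem", 2.0)` give the two top-10 lists using the thresholds mentioned above.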
For documentation of the output of the free(1) command (or /proc/meminfo when writing a program to do it) the best page I found was this StackExchange page [3]. Based on that, I compare MemAvailable+SwapFree to MemTotal+SwapTotal to determine the percentage of virtual memory used.
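The calculation can be sketched in a few lines of Python. This is an illustrative sketch rather than the code from the monitor; the function names are invented for the example, and it assumes a kernel recent enough to report the MemAvailable field in /proc/meminfo.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def virtual_memory_used_percent(fields):
    """Percentage of RAM plus swap that is in use.

    Uses MemAvailable + SwapFree as the unused part and
    MemTotal + SwapTotal as the total.
    """
    total = fields["MemTotal"] + fields["SwapTotal"]
    unused = fields["MemAvailable"] + fields["SwapFree"]
    return 100.0 * (total - unused) / total


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live /proc/meminfo (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

For example, a system with 8000 kB MemTotal, 2000 kB MemAvailable, 2000 kB SwapTotal and 1000 kB SwapFree has (10000 - 3000) / 10000 = 70% of virtual memory used. On a live system the value comes from `virtual_memory_used_percent(read_meminfo())`.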
Any suggestions on how I could improve this?
The code is in the recent releases of etbemon, which are available in Debian/Unstable and on the project page on my site. Here’s a link to the loadave.monitor script in the Debian Salsa Git repository [4].
Why not use PSI? https://facebookmicrosites.github.io/psi/docs/overview.html
How some others do it: https://github.com/shirou/gopsutil/blob/33820ab93048c0de8edf2a8d72c102badc508424/mem/mem_linux.go#L240
Thanks for the comments. PSI is interesting, and it’s enabled by default in Debian kernels in Unstable and Testing but not in Buster. Most of my development is aimed at Debian, so PSI is something I need to target. The GitHub code is interesting; I’ll read that too.
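For reference, on kernels with PSI enabled the pressure data is exposed as small text files under /proc/pressure (cpu, memory and io), each with a "some" line and usually a "full" line. A minimal Python sketch of reading one of them might look like the following. The function names are invented for this example and it is not part of the monitor.

```python
def parse_pressure(text):
    """Parse the contents of a /proc/pressure/* file.

    Each line looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    Returns e.g. {"some": {"avg10": 0.0, ...}, "full": {...}}.
    """
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        values = {}
        for pair in parts[1:]:
            key, _, value = pair.partition("=")
            values[key] = float(value)
        result[parts[0]] = values
    return result


def read_pressure(resource="memory"):
    """Read /proc/pressure/<resource>; fails if the kernel has no PSI support."""
    with open("/proc/pressure/" + resource) as f:
        return parse_pressure(f.read())
```

The avg10, avg60 and avg300 values are percentages of time over the last 10, 60 and 300 seconds in which tasks were stalled on the resource, and total is the cumulative stall time in microseconds.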
https://raymii.org/s/
Remy posted a link to this post on lobste.rs and many people read it from there. Thanks Remy!