Mardi
Latest release: 0.1-beta1 Download the latest release.
Warning: I am no longer using or maintaining this software. It may not work on current operating systems. If you are using it or would like to use it, please contact me.
Overview
Mardi is a tool for tracking the values of system variables, logging them such that they can be graphed with Gnuplot, and sending alerts when any variable exceeds an upper or lower limit. Any variable that can be accessed by a shell command can be monitored, and any action that can be expressed as a shell command can be taken as an alarm.
Mardi 0.1 has only been tested on Linux (specifically, Fedora Core 3), but should work with little to no modification on other *nix systems. It can only monitor positive integer variables; real number (floating point) support may be added in a later version. For the moment, real numbers can be monitored by converting them to integers (by multiplying by a constant and truncating least significant digits).
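The integer conversion described above can be sketched in a few lines of shell. The helper name scale_to_int and the scale factors are assumptions made for illustration; they are not part of Mardi.

```shell
#!/bin/sh
# scale_to_int VALUE FACTOR
# Multiply a real number by a constant and truncate the fractional part,
# giving an integer that an integer-only tracker can record.
# NOTE: scale_to_int is a hypothetical helper, not something Mardi ships.
scale_to_int() {
    awk -v v="$1" -v f="$2" 'BEGIN { printf "%d\n", v * f }'
}

# Example: a reading of 1.25 tracked in hundredths
scale_to_int 1.25 100    # prints 125
```

Because the result is truncated rather than rounded, values that are not exactly representable in binary floating point may come out one lower than expected. If a pipeline like this is placed inside a Mardi configuration file, any literal $ must be written as $$.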
Command-line usage
$ mardi [-c <config file>] [-D] [-h] [-v]
where
- -c <config file> - use the specified configuration file instead of the default
- -D - run the program as a daemon
- -h - print this message
- -v - print version information
The default configuration file is /etc/mardi.conf.
Configuration file format
Configuration is stored in an XML document with doctype "mardi". The configuration file must have a root element of type mardi, which may have the following attributes:
- interval - the time in seconds between updating trackers, as a decimal integer. Default is 60.
- logfile - the name of the file to log tracker data to. Default is /var/log/mardi/mardi.log.
- over - the default command to execute when a value exceeds its upper limit. Required.
- under - the default command to execute when a value falls below its lower limit. Required.
- user - the name of the default user to run commands. Default is nobody.
- group - the name of the default group to run commands. Default is nobody.
The root element must contain one or more elements of type tracker, which represent values being tracked. tracker elements have the following attributes:
- name - a string identifying the value being tracked. Required.
- min - the integer lower bound on the value. Default is 0 (disabled).
- max - the integer upper bound on the value. Default is 0 (disabled).
- diff - a boolean, true to track differences between measurements, or false to track the measurements themselves. Default is "false".
- over - the command to execute when the value exceeds its upper limit. Default is over from the root element.
- under - the command to execute when the value falls below its lower limit. Default is under from the root element.
- user - the name of the user to run commands for this tracker. Default is user from the root element.
- group - the name of the group to run commands for this tracker. Default is group from the root element.
As content, each tracker element must contain a command to execute to retrieve the value being tracked.
If min or max is not set on a tracker, or is set to zero, then no checking will be done against that bound. If neither is set (or both are set to zero), then the value will be logged, but no alerts will be sent.
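The rule for disabled bounds can be illustrated with a small shell sketch. This is not Mardi's own code: the function name check_limits is hypothetical, and the use of strict comparisons (a value equal to a bound raises no alert) is an assumption.

```shell
#!/bin/sh
# check_limits VAL MIN MAX
# Prints "over", "under", or "ok". A bound of 0 disables that check.
# ASSUMPTION: comparisons are strict, so a value equal to a bound is "ok".
check_limits() {
    val=$1 min=$2 max=$3
    if [ "$max" -ne 0 ] && [ "$val" -gt "$max" ]; then
        echo over
    elif [ "$min" -ne 0 ] && [ "$val" -lt "$min" ]; then
        echo under
    else
        echo ok
    fi
}

check_limits 15 0 10    # prints over  (only the upper bound is active)
check_limits 5 10 0     # prints under (only the lower bound is active)
check_limits 5 0 0      # prints ok    (both bounds disabled: log only)
```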
Example configuration file
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sample configuration file for Mardi -->
<mardi
    interval="60"
    logfile="track.dat"
    over="echo $name over limit $max: $val"
    under="echo $name under limit $min: $val"
>
    <!-- Track the number of bytes received on eth0 -->
    <!-- when more than 10 MB are received in one minute, send mail to root -->
    <tracker
        name="Byte count"
        diff="true"
        max="10000000"
        over="echo -e &quot;Current value: $val\nLimit: $max\n&quot; | mail -s &quot;$name exceeded&quot; root"
    >
        awk 'BEGIN {FS="[ \t:]*"} {if ($$2 == "eth0") {printf("%s", $$3)}}' /proc/net/dev
    </tracker>

    <!-- Track user degraaf's disk usage -->
    <!-- if disk usage is over 10 GB, send a message to syslog -->
    <tracker
        name="deGraaf's disk usage"
        max="10000000"
        user="degraaf"
        over="logger &quot;degraaf's home directory is getting full: $val kB currently used&quot;"
    >
        du -sk ~ | cut -f 1
    </tracker>

    <!-- Track memory usage -->
    <!-- if there is less than 10 MB of free memory, kill the process with the highest memory usage -->
    <tracker
        name="Memory usage"
        min="10000"
        max="0"
        user="root"
        under="kill -9 `ps -e -o &quot;pid vsz&quot; --sort=-vsz --no-headers | head -n 1 | cut -f 2 -d &quot; &quot;`"
    >
        grep "^MemFree" /proc/meminfo | awk '{print $$2}'
    </tracker>
</mardi>
In this configuration, all trackers are updated every 60 seconds, data is logged to track.dat in the current directory, and the default actions to take when a value exceeds its bounds are to print messages to the screen. Note that printing to the screen will not work in daemon mode. By default, all commands are run as user and group "nobody".
The first tracker, "Byte count", logs the number of bytes received on the eth0 network interface; this data is read from /proc/net/dev. If more than 10 MB are received in any one-minute interval, then email is sent to root.
The second tracker, "deGraaf's disk usage", tracks the size of user "degraaf"'s home directory. If it exceeds 10 GB, then a warning message is written to syslog. All commands are run as the user "degraaf".
The third tracker, "Memory usage", monitors the amount of free memory, as reported in /proc/meminfo. If the total free memory ever drops below 10 MB, then the process with the highest memory usage is killed.
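The "Byte count" tracker sets diff="true" because /proc/net/dev reports a cumulative counter, while the limit applies to the amount received per interval. The sketch below shows the idea of tracking differences between successive measurements. The function name diff_values is hypothetical, and how Mardi itself treats the very first measurement is not documented here; this sketch simply emits nothing for it.

```shell
#!/bin/sh
# diff_values: read successive cumulative readings on stdin, one per line,
# and print the change between each reading and the one before it.
# ASSUMPTION: the first reading has no predecessor, so nothing is printed for it.
diff_values() {
    awk 'NR > 1 { print $1 - prev } { prev = $1 }'
}

# Three readings of a byte counter taken one interval apart
printf '%s\n' 1000 1500 4000 | diff_values
# prints:
# 500
# 2500
```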
You can download the sample configuration here.
Commands
All commands must be able to run in a restricted Bourne shell (/bin/sh -r). Be careful not to use commands which can block for any significant amount of time, especially if a short update interval is used.
The following macros are expanded at runtime in all commands:
- $name - the name of the variable
- $min - the lower bound on the variable
- $max - the upper bound on the variable
- $val - the current value of the variable (not used when retrieving variable values)
- $time - the current time and date
- $$ - the character '$'
Thus, the command
echo "current value: $val" | mail -s "$name is over limit" root@mydomail.zz
might expand to
echo "current value: 100" | mail -s "Memory usage is over limit" root@mydomail.zz
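The expansion shown above can be imitated with a short shell function, which may help when checking a command before putting it in a configuration file. This is only a sketch of the substitution rules listed above, not Mardi's implementation: expand_macros is a hypothetical name, $time is left out to keep the output deterministic, and leaving an unrecognised $ untouched is an assumption.

```shell
#!/bin/sh
# expand_macros TEMPLATE NAME MIN MAX VAL
# Scans TEMPLATE left to right, replacing $name, $min, $max and $val with
# the given values and "$$" with a literal "$".
# ASSUMPTION: a "$" followed by anything else is copied through unchanged.
expand_macros() {
    T="$1" M_name="$2" M_min="$3" M_max="$4" M_val="$5" awk '
    BEGIN {
        s = ENVIRON["T"]
        out = ""
        while ((i = index(s, "$")) > 0) {
            out = out substr(s, 1, i - 1)      # text before the "$"
            rest = substr(s, i + 1)            # text after the "$"
            if (substr(rest, 1, 1) == "$") {   # "$$" becomes "$"
                out = out "$"
                s = substr(rest, 2)
            } else if (match(rest, /^(name|min|max|val)/)) {
                out = out ENVIRON["M_" substr(rest, 1, RLENGTH)]
                s = substr(rest, RLENGTH + 1)
            } else {                           # not a macro: keep the "$"
                out = out "$"
                s = rest
            }
        }
        print out s
    }'
}

expand_macros 'echo "$name is at $val (limit $max)"' 'Memory usage' 0 200 100
# prints: echo "Memory usage is at 100 (limit 200)"
```

Scanning from left to right, rather than running one global substitution per macro, keeps $$ from being misread as the start of a macro name.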
If Mardi is run with superuser privileges, then by default, all commands are run as user and group nobody. Otherwise, they are run as the current user. If a user and group are specified for a tracker, then this identity is used instead of nobody. To run a command requiring superuser privileges, set the user to root. Note that it may be a security risk to do so. Do not set the default user to root.
Log file
The log file consists of a set of records in tab-separated columns. Each record contains a timestamp and the values of all variables being tracked at that time. The first column contains the timestamp, as the number of seconds since the epoch (00:00:00 UTC, Jan 1st, 1970). Subsequent columns contain the values of each variable being monitored, in the order that the variables are given in the configuration file.
For instance, a portion of a log file generated using the sample configuration above might be:
1122412557	1019	15388353	32024
1122412557	67884	15388353	29916
1122412557	208354	15388353	27916
1122415970	212621	15388373	46284
1122415970	159082	15388373	46284
1122415970	269273	15388373	46284
1122415970	196967	15388373	46268
1122415970	299984	15388373	46284
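Since each record is a line of tab-separated integers, the log can be processed with standard text tools. As a minimal sketch, the function below reports the largest value seen in one column; column_max is a hypothetical helper, and column 1 is the timestamp, so the first tracker is column 2.

```shell
#!/bin/sh
# column_max COLUMN
# Reads a tab-separated Mardi log on stdin and prints the largest value
# found in the given column (1 = timestamp, 2 = first tracker, ...).
# NOTE: column_max is a hypothetical helper, not part of Mardi.
column_max() {
    awk -F '\t' -v col="$1" '
        NR == 1 || $col + 0 > max { max = $col + 0 }
        END { if (NR > 0) print max }
    '
}

# Usage, assuming the log was written to track.dat:
#   column_max 2 < track.dat
```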
The results of the first tracker (bytes received on eth0) could be graphed using a command such as the following:
echo "set term postscript color
set output \"graph.ps\"
set ylabel \"Number of bytes received\"
set xlabel \"Time of day (seconds since the Epoch)\"
set title \"Number of bytes received on eth0 per minute over a 30 minute period\"
plot \"track.dat\" using 1:2 title \"Bytes received on eth0\" with linespoints" | gnuplot
to produce a PostScript graph in the file graph.ps.
Mardi does not necessarily write all logged data out to disk as soon as it is recorded; it may be cached in memory for some time. To force Mardi to flush all cached data out to disk without stopping it, send it a SIGHUP signal. This can be accomplished with a command such as killall -HUP mardi.
Error reporting
In normal (console) mode, Mardi prints all run-time error messages to the console. In daemon mode, run-time errors are reported to syslog, under the LOG_USER facility. If a command fails (exits with a status other than 0), then its exit status and any output produced on stdout will be reported.
Stopping Mardi
To stop Mardi, send it a SIGINT signal. In console mode, this can be done by pressing Ctrl-C. If Mardi is running in daemon mode, then this can be accomplished with a command such as killall -INT mardi.
Compilation and installation
See the file INSTALL in the top-level directory of the source tarball.
Name
In case you're curious about the name, "Mardi": I started with the name "Monitoring, Alerting, and Recording Daemon". This is a little unwieldy, so I abbreviated it to the acronym "MARD". If I left the name as that, people would most likely pronounce it to rhyme with "lard", which would be unfortunate. The intended pronunciation is "mar-dee", so I added the 'i' to the name to encourage the correct pronunciation.
Also, the initial public release (version 0.1) was made on a Tuesday.
License
Mardi is distributed under the terms of the GNU General Public License (GPL).
Contact information
Should you have questions, comments, or concerns about Mardi, contact me by email. Please put the word "Mardi" in the subject of your message.
Contributing
If you wish to contribute to Mardi, please contact me and let me know what you wish to do. Or if you want to contribute but don't have anything in particular in mind, contact me and we'll find something suitable to your abilities. Please don't send me patches against anything older than my latest release.
Sorry, but my version control system is not publicly accessible at the moment. I use Subversion, which SourceForge doesn't yet support. I'll set up public SVN access whenever SourceForge supports it or I get my own server. If you need a copy of my latest development sources, please contact me.