Modules

Note: the following documentation is for Resmon 1. See Modules for the Resmon 2 module documentation.

There are a number of modules included with resmon that will cover most things you need to monitor. A list of the modules is below, along with a sample configuration. You can also create your own modules.

Generic configuration options

The following options are applicable to any module:

  • interval : cache the result for n seconds. Useful for long-running modules. Default: do not cache.
  • check_timeout : a per-check timeout value. The check goes bad with a timeout if it takes longer than this to run. Overrides the global timeout value.
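
The effect of interval can be pictured with a short sketch. This is not resmon's own code (resmon runs under perl); the CachedCheck class, its run method, and the injectable clock are hypothetical names used only to illustrate "reuse the last result until it is n seconds old".

```python
import time


class CachedCheck:
    """Wrap a check function and reuse its result for `interval` seconds."""

    def __init__(self, check, interval=0, clock=time.monotonic):
        self.check = check        # callable returning (status, message)
        self.interval = interval  # 0 mirrors the default: do not cache
        self.clock = clock        # injectable so the sketch can be tested
        self._result = None
        self._stamp = None

    def run(self):
        now = self.clock()
        fresh = (
            self.interval > 0
            and self._stamp is not None
            and now - self._stamp < self.interval
        )
        if not fresh:
            # Cache is empty or too old: run the real check again.
            self._result = self.check()
            self._stamp = now
        return self._result
```

With interval set to 60, a check that takes several seconds to run would be executed at most once a minute, and every request in between would get the stored result.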

A1000

This module monitors the health of an A1000 StorEdge disk array.

Sample Configuration

A1000 {
    fa000_001 : status => Optimal
}

Arguments

  • Object : the unit you wish to monitor
  • status : the status that you consider to be OK

ADAPTEC

This module monitors the health of an Adaptec RAID controller. It requires the arcconf command line utility that comes with Adaptec Storage Manager.

Sample Configuration

ADAPTEC {
    1 : noop
}

Arguments

  • Object : the controller you wish to monitor
  • arcconf : (optional) the path to the arcconf command line utility. Defaults to /usr/StorMan/arcconf

DATE

A simple module that just prints the current unix timestamp. This can be useful when using the status.txt file to ensure that you have up to date information. However, when using the XML checks, this module is no longer necessary as each check includes information on when it was last updated.

Sample Configuration

DATE {
    date : noop
}

DHCPLEASES

This module checks the number of active DHCP leases for a network and warns if that number grows close to the maximum number of addresses available in the DHCP pool.

Sample Configuration

DHCPLEASES {
    10.0.0                         : warn => 15, crit => 25
    192.168.0                      : warn => 25, crit => 45
}

Arguments

  • Object : the network you wish to check the leases for
  • warn : the number of leases above which you want to warn
  • crit : the number of leases above which you want to go critical
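
As an illustration of how the two thresholds interact, here is a minimal sketch. It makes several assumptions: the list of leased addresses is a hypothetical input (this page does not say how the module obtains active leases), the prefix match on the network name is a guess based on the sample objects, and the status strings are placeholders.

```python
def count_leases(addresses, network):
    """Count addresses under a dotted prefix such as '10.0.0' (assumed matching rule)."""
    prefix = network + "."
    return sum(1 for addr in addresses if addr.startswith(prefix))


def lease_status(active, warn, crit):
    """Classify a lease count: above crit is critical, above warn is a warning."""
    if active > crit:
        return "BAD"
    if active > warn:
        return "WARNING"
    return "OK"


# Hypothetical data: twenty leases on 10.0.0, one on 192.168.0.
leases = ["10.0.0.%d" % i for i in range(1, 21)] + ["192.168.0.5"]
print(lease_status(count_leases(leases, "10.0.0"), warn=15, crit=25))
```

Both comparisons are strict, so a count exactly equal to warn or crit stays in the lower state; that follows the word "above" in the argument descriptions.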

DISK

This module checks the amount of free disk space using df.

Sample Configuration

DISK {
    /data1 : limit => 95%, warnat => 70%
    /data2 : limit => 95%, warnat => 70%
    /data3 : limit => 95%
    /data4 : minkbfree => 1048576
}

Arguments

While all arguments are optional, you should have at least one of limit or minkbfree. Including checks for both percentage used and KB free may have undesirable effects, so you should only include one of these methods.

  • Object : the mount point or device for which you want to check free space
  • limit : (optional) the percentage used above which you want to go critical
  • warnat : (optional) the percentage used above which you want to warn
  • minkbfree : (optional) the minimum amount of free space in KB before going critical
  • warnkbfree : (optional) the amount of free disk space in KB below which you want to warn
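
The sketch below shows one way the four thresholds could be evaluated once the percentage used and the free space in KB are known. It is an illustration only: the function names and status strings are hypothetical, the order in which conditions are tested is an assumption, and gathering the figures from df is left out.

```python
def parse_percent(value):
    """Turn a config value such as '95%' into the number 95.0."""
    return float(str(value).rstrip("%"))


def disk_status(used_percent, kb_free, limit=None, warnat=None,
                minkbfree=None, warnkbfree=None):
    """Evaluate the optional DISK thresholds, critical conditions first."""
    if limit is not None and used_percent > parse_percent(limit):
        return "BAD"
    if minkbfree is not None and kb_free < minkbfree:
        return "BAD"
    if warnat is not None and used_percent > parse_percent(warnat):
        return "WARNING"
    if warnkbfree is not None and kb_free < warnkbfree:
        return "WARNING"
    return "OK"


# 80% used with limit => 95%, warnat => 70% lands in the warning band.
print(disk_status(80, 10_000_000, limit="95%", warnat="70%"))
```

When percentage and KB thresholds are both configured, either one can trip independently, which is the kind of surprising result the advice above about choosing a single method is meant to avoid.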

DNS

This module checks the status of the BIND DNS server.

Sample Configuration

DNS {
    dns : key => /dns/etc/rndc.key
}

Arguments

  • Object : this is just a label used to identify the check
  • key : the path to the dns key used by rndc

ECCMGR

This module connects with the Ecelerity eccmgr and ensures that it is running.

Sample Configuration

ECCMGR {
    eccmgr : socket => /tmp/2026
}

Arguments

  • Object : this is just a label used to identify the check
  • socket : the path to the socket to connect to eccmgr

Notes

This is one of the checks that requires a special module to connect. This is best achieved by running resmon using the version of perl that comes with ecelerity.

FAULTS

This module checks for any hardware faults using the fmadm command.

Sample Configuration

FAULTS {
    hardware : noop
}

FILEAGE

This module monitors the age of a specific file, going bad if the file is too old or too new. If the file is missing, the behavior is configurable.

Sample Configuration

FILEAGE {
    /path/to/file  : minimum => 30, maximum => 3600
    /path/to/file2 : maximum => 7200, allowmissing => yes
}

Arguments

  • Object : the path to the file you wish to monitor
  • minimum : (optional) the minimum age of the file in seconds you consider to be OK
  • maximum : (optional) the maximum age of the file in seconds you consider to be OK
  • allowmissing : (optional) what to do if the file is missing. If this is yes, then the status is OK; otherwise, the status is bad for a missing file.
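
The argument semantics above can be sketched as follows. This is an illustration rather than the module's code: it assumes that "age" means time since the file's last modification, and the function name, the now parameter, and the status strings are hypothetical.

```python
import os
import time


def fileage_status(path, minimum=None, maximum=None, allowmissing="no", now=None):
    """Return 'OK' or 'BAD' for a file based on its age in seconds."""
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        # A missing file is only acceptable when allowmissing is 'yes'.
        return "OK" if allowmissing == "yes" else "BAD"
    age = (time.time() if now is None else now) - mtime
    if minimum is not None and age < minimum:
        return "BAD"  # file is too new
    if maximum is not None and age > maximum:
        return "BAD"  # file is too old
    return "OK"
```

For the second line of the sample configuration, a call such as fileage_status("/path/to/file2", maximum=7200, allowmissing="yes") would report OK when the file does not exist.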

FILECOUNT

This module monitors the number of files in a directory, going bad when the file count goes over a threshold.

Sample Configuration

FILECOUNT {
    /path/to/dir : slimit => 10, hlimit => 20
}

Arguments

  • Object : the path to the directory
  • slimit : the 'soft' threshold, above which the module will warn
  • hlimit : the 'hard' threshold, above which the module will go critical

FILESIZE

This module monitors the size of a specific file, going bad if it is too big or too small.

Sample Configuration

FILESIZE {
    /path/to/file : minimum => 1, maximum => 16384
}

Arguments

  • Object : the path to the file you want to monitor
  • minimum : the minimum file size, in bytes
  • maximum : the maximum file size, in bytes

FREEMEM

This module monitors the amount of free memory on the system. It is platform specific and currently works with Linux and Solaris. On Solaris, it makes use of the Sun::Solaris::Kstat module if available in order to obtain the ZFS ARC size. If the kstat module is not available, then an alternate method is used where cache values cannot be obtained. In this case, includecache must be set to 0.

Sample Configuration

FREEMEM {
    memory : limit => 512, includecache => 1
}

Arguments

  • Object : this is just a label used to identify the check
  • limit : (optional, default 512) the minimum amount of free memory in MB below which we go critical
  • includecache : (optional, default 0) include cache in the amount of free memory

FRESHSVN

This module checks a subversion checkout to make sure it is up to date and pointing to the correct url. See also the SIMPLESVN module, which doesn't perform as thorough a check, but has fewer requirements and works with older versions of subversion.

Sample Configuration

FRESHSVN {
    /opt/resmon : URL => https://labs.omniti.com/resmon/trunk
}

Arguments

  • Object : Path to the working copy
  • URL : the url that the working copy should be checked out from
  • maxlag : (optional, default 330 seconds) the amount of time you allow for the repository to update before the repository should be considered out of date. It's a good idea to set this to the interval at which your update cron job runs + a few seconds.

INODES

This module monitors the number of free inodes on a filesystem.

Sample Configuration

INODES {
    /     : limit => 90%
    /data : limit => 90%
}

Arguments

  • Object : The filesystem you wish to monitor
  • limit : the percentage of inodes used after which you want to alarm

LARGEFILES

This module looks for 'large' files in a directory.

Sample Configuration

LARGEFILES {
    /path/to/dir : limit => 16384
}

Arguments

  • Object : the directory you wish to monitor
  • limit : the maximum file size in bytes

LOGFILE

This module monitors a log file, looking for errors. What the module considers an error is configurable.

Sample Configuration

LOGFILE {
    /var/log/mylogfile : max => 4, match => ^ERROR:
}

Arguments

  • Object : path to the log file
  • match : regex that defines what an error is
  • max : (optional, default 8) the maximum number of errors you will allow before going critical
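
To make the roles of match and max concrete, here is a minimal sketch that counts matching lines and compares the count with the limit. It is an assumption-laden illustration: the function name is hypothetical, the status strings are placeholders, the comparison is taken to be "more than max", and it scans whatever lines it is given without modelling how the module tracks its position in the log between runs.

```python
import re


def logfile_status(lines, match, max_errors=8):
    """Count lines matching the error regex and compare against max_errors."""
    pattern = re.compile(match)
    errors = sum(1 for line in lines if pattern.search(line))
    status = "BAD" if errors > max_errors else "OK"
    return status, errors


sample = ["ERROR: disk full", "INFO: started", "ERROR: retrying", "note ERROR: inline"]
print(logfile_status(sample, r"^ERROR:", max_errors=4))
```

Because the sample pattern is anchored with ^, the last line in the example is not counted as an error even though it contains the text ERROR:.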

MDSTAT

This module monitors the status of Linux Software RAID devices.

Sample Configuration

MDSTAT {
  raid : noop
}

Arguments

  • Object : this is just a label used to identify the check. All MD devices are detected automatically.

NETBACKUPTAPE

This module checks the status of tape drives in netbackup, and will go critical if any are down, or there are no drives up.

Sample Configuration

NETBACKUPTAPE {
    tapes : noop
}

NETSTAT

This module checks the output of netstat, as its name suggests. It can be used to ensure that a server is listening on a specified port, or that a certain connection is currently open.

Sample Configuration

NETSTAT {
    ssh : state => LISTEN, localport => 22
    sshtunnel : state => ESTABLISHED, remoteip => 10.0.0.1, remoteport => 22
}

Arguments

  • Object : this is just a label used to identify the check
  • state : the connection state (e.g. LISTEN, ESTABLISHED)
  • localport : (optional) the local port of the connection
  • localip : (optional) the local ip of the connection
  • remoteport : (optional) the remote port of the connection
  • remoteip : (optional) the remote ip of the connection

NEWFILES

This module ensures a directory has files modified later than a certain time (for example, checking that new log files are being generated).

Sample Configuration

NEWFILES {
  /test/dir : minutes => 5, filecount => 2
  /other/dir : minutes => 60
}

Arguments

  • Object : the directory to monitor
  • minutes : how old a file can be, in minutes, before it is no longer considered 'new'
  • filecount : (optional, default 1) how many new files are required
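
A rough sketch of the check described above: count the files in a directory whose modification time falls within the last minutes minutes, and require at least filecount of them. The function name, the now parameter, and the status strings are hypothetical, and looking only at regular files directly inside the directory is an assumption.

```python
import os
import time


def newfiles_status(directory, minutes, filecount=1, now=None):
    """Return (status, number_of_new_files) for a directory."""
    now = time.time() if now is None else now
    cutoff = now - minutes * 60
    new = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            # Only regular files at the top level are considered here.
            if entry.is_file() and entry.stat().st_mtime >= cutoff:
                new += 1
    return ("OK" if new >= filecount else "BAD"), new
```

For the first line of the sample configuration, newfiles_status("/test/dir", minutes=5, filecount=2) would go bad unless at least two files had been modified in the last five minutes.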

OLDFILES

This module checks for files in a directory that are older than a certain time.

Sample Configuration

OLDFILES {
  /test/dir : minutes => 5, filecount => 2, checkmount => 1
  /other/dir : minutes => 60
}

Arguments

  • Object : the directory to monitor
  • minutes : how old can the files be before we alarm
  • checkmount : check to make sure the directory is mounted first (only enable if the dir you are checking is the mountpoint of a filesystem)
  • filecount : how many old files will we allow before alarming. If this is not set, then we will alarm if any files are old.

PGREP

This module checks for running processes using the pgrep command.

Sample Configuration

PGREP {
    dhcpd : arg0 => em0, arg1 => em1
}

Arguments

  • Object : the process name you are looking for
  • arg1-arg3: (optional) the arguments that must have been passed to the command

QUEUESIZE

This module checks the size of ecelerity mail queues (delayed or active) and alerts if any are over a certain limit.

Sample Configuration

QUEUESIZE {
  aol.com                               : queue => delayed, count => 3000
  yahoo.com                             : queue => delayed, count => 3000
  msn.com                               : queue => delayed, count => 3000
  hotmail.com                           : queue => delayed, count => 3000
  common                                : queue => delayed, count => 3000
}

Arguments

  • Object : the domain name of the queue you wish to monitor, or 'common'. Picking common will monitor all queues except for major ISPs.
  • queue : which queue to monitor (active or delayed)
  • count : the number of messages allowed in the queue before we alarm

REMOTEFILESIZE

This module checks the size of a remote file using ssh. This requires that passwordless ssh be set up for root from one machine to the other.

Sample Configuration

REMOTEFILESIZE {
    /path/to/file : host => other.example.com, minimum => 1, maximum => 131072
}

Arguments

  • Object : the path to the file to be monitored
  • host : the hostname of the server
  • minimum : the minimum file size in bytes
  • maximum : the maximum file size in bytes

RESMON

This module monitors resmon itself and reports if there is a problem with the config file or if there are any failed modules. This is most useful in conjunction with auto updating, when modules are reloaded without restarting resmon.

Note: at some point, this module may be added by default, but at the moment it needs to be included in the config file.

This check will also report the subversion revision number if resmon is running from a checkout.

Sample Configuration

RESMON {
    resmon : noop
}

SCRIPT

This module runs a perl script and expects some output from the script in the form of "STATUS(message)". This allows resmon to run helper scripts without needing to write a complete module.

Sample Configuration

SCRIPT {
    myscript : script => /path/to/myscript.pl, timeout => 300
}

Arguments

  • Object : This is just a label used to identify the check
  • script : the path to the perl script
  • timeout : (optional, default 30) how long to cache the result of the command, in seconds
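
The "STATUS(message)" output format can be split into its two parts with a small parser. The sketch below is illustrative only: the regular expression, the function name, and the fallback to a bad status for unparseable output are assumptions, not behavior documented for the module.

```python
import re

# A word, then the message in parentheses, optionally followed by whitespace.
STATUS_LINE = re.compile(r"^\s*(\w+)\((.*)\)\s*$", re.DOTALL)


def parse_script_output(output):
    """Split 'STATUS(message)' into a (status, message) tuple."""
    found = STATUS_LINE.match(output)
    if found is None:
        # Hypothetical fallback: treat anything unrecognised as a failure.
        return "BAD", "unparseable output: %r" % output
    return found.group(1), found.group(2)


print(parse_script_output("OK(all good)\n"))
```

A helper script therefore only has to print a single string such as OK(all good) for its result to be picked up.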

SIMPLESVN

This module, like the FRESHSVN module, checks for the health of a subversion checkout, making sure it is up to date and that there are no problems. It does not check that the working copy is checked out from a specific repository, nor does it have any grace period. However, it will work with older versions of subversion and may be preferable to the FRESHSVN module in some circumstances.

Sample Configuration

SIMPLESVN {
    /path/to/working/copy : noop
}

Arguments

  • Object : the path to the working copy

SMFMAINTENANCE

This module checks for any Solaris services in maintenance mode.

Sample Configuration

SMFMAINTENANCE {
    services : noop
}

SWAPSIZE

This module monitors the memory used on Solaris by inspecting the usage of the /tmp directory.

Sample Configuration

SWAPSIZE {
    swap : limit => 262144
}

Arguments

  • Object : this is just a label used to identify the check
  • limit : the minimum amount of free memory below which we go critical

TCPSERVICE

This module connects to a tcp service at regular intervals, going critical if the connection fails.

Sample Configuration

TCPSERVICE {
    ssh : host => 127.0.0.1, port => 22, timeout => 2
}

Arguments

  • Object : this is just a label used to identify the check
  • host : the host to connect to
  • port : the port to connect to
  • timeout : how long to wait for a connection before going critical in seconds
  • prepost : (optional) a string to send on connection. Useful if the service you are checking requires something to be entered before showing a banner.
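
The connection test can be sketched with a plain socket. This is an assumption-based illustration, not the module's implementation: the function name and status strings are hypothetical, and reading up to 1024 bytes of banner (and tolerating a silent service once connected) is a design choice made for this example.

```python
import socket


def tcpservice_status(host, port, timeout=2, prepost=None):
    """Connect to host:port; return ('OK', banner) or ('BAD', reason)."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        # Covers refused connections, unreachable hosts, and timeouts.
        return "BAD", str(exc)
    with sock:
        try:
            if prepost is not None:
                # Send the configured string before expecting a banner.
                sock.sendall(prepost.encode())
            banner = sock.recv(1024).decode(errors="replace").strip()
        except OSError:
            banner = ""  # connected, but nothing could be read in time
    return "OK", banner
```

For the sample configuration this corresponds to tcpservice_status("127.0.0.1", 22, timeout=2), which goes bad if nothing accepts the connection within two seconds.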

TWRAID

This module monitors the status of a 3ware RAID controller unit. It requires that you have the tw_cli command installed, which is available from http://www.3ware.com/ .

Sample Configuration

TWRAID {
    /c0/u1 : tw_cli => /path/to/tw_cli
}

Arguments

  • Object : the unit you wish to monitor, this should be in the form /cx/ux (/c0/u1 is likely to be correct if only one unit is present)
  • tw_cli : (optional) the path to the tw_cli command. This defaults to /usr/local/bin/tw_cli if not present.

WALCHECK

This module monitors the PostgreSQL log file replay from a master to a slave.

Sample Configuration

WALCHECK {
    check_pg_replay_mode : logdir => /data/postgres/82/pg_log
}

Arguments

  • Object : this is just a label used to identify the check
  • logdir : the location of the logs

ZIMBRA

This module checks zimbra's service status and goes critical if any services are down.

Sample Configuration

ZIMBRA {
    services : noop
}

ZPOOLERRS

This module checks for zpool read/write errors by using zpool status -x. It also reports whether a zpool is degraded, similar to the basic ZPOOL check.

This check can be used either in combination with the ZPOOL check or instead of it. If used in combination, it is probably a good idea to warn or email when the ZPOOLERRS check goes bad, and page when the ZPOOL check goes bad. If you wish to page on read/write errors as well as degraded arrays, then only the ZPOOLERRS check is required.

Sample Configuration

ZPOOLERRS {
    zpools : noop
}
ZPOOLERRS {
    zpools : warn_on_upgrade => yes
}

Arguments

  • Object : this is just a label used to identify the check
  • warn_on_upgrade : when a zpool needs upgrading to a new zfs version, do we warn or stay OK?

ZPOOLFREE

This module monitors the free space in a zfs pool using the zfs list command (the zpool list command can give misleading results when the zpool is almost full).

Often, it is more informative to use this module rather than the DISK module if your filesystems are all part of a zpool. Otherwise, what happens is that when the disk is full, every filesystem based check goes to 100% full and it isn't obvious what the cause is.

Sample Configuration

ZPOOLFREE {
    pool1 : limit => 90%
    pool2 : limit => 90%
}

Arguments

  • Object : the name of the zpool you wish to monitor
  • limit : how full to get before going critical

ZPOOL

This module looks for degraded zpools, but does not go critical if there are any recoverable errors that do not cause the array to be degraded.

Sample Configuration

ZPOOL {
    zpools : noop
}