Ticket #14 (closed defect: fixed)

Opened 4 years ago

Last modified 4 years ago

Errors caused by running external commands are not properly caught

Reported by: mark
Assigned to: mark
Priority: major
Milestone: 2.0_release
Component: resmon
Version: 2.0
Keywords:
Cc:

Description

Any module error should be caught by an eval, and the value of the exception added to the 'error' metric. This happens if you die() within a module, but not when a command run with run_command fails.
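
A minimal sketch of the expected behaviour, for illustration only. The collect_metrics wrapper and the metric names are hypothetical and are not taken from the resmon source; the sketch only shows an exception raised with die() ending up in an 'error' metric.

```perl
use strict;
use warnings;

# Hypothetical wrapper (not resmon's actual code): run a check's handler
# inside eval and turn any exception into an 'error' metric instead of
# letting it escape.
sub collect_metrics {
    my ($handler) = @_;
    my $metrics = eval { $handler->() };
    if (!defined $metrics) {
        my $err = $@ || 'unknown error';
        chomp $err;
        $metrics = { error => $err };
    }
    return $metrics;
}

# A handler that succeeds and one that dies, as a module might.
my $ok  = collect_metrics(sub { return { free_mb => 1024 } });
my $bad = collect_metrics(sub { die "zpool command failed\n" });

print "ok: free_mb=$ok->{free_mb}\n";
print "bad: error=$bad->{error}\n";
```

The defect described here is that a failed run_command never reached a wrapper like this as an exception, so no 'error' metric was produced.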

Example:

Run the Core::ZpoolFree module on a non-Solaris system. The following is printed to STDERR:

Can't exec "/sbin/zpool": No such file or directory at lib/Resmon/ExtComm.pm line 53.

and it is as if the ZpoolFree check does not exist. No metrics are shown for the ZpoolFree check.

Change History

06/04/10 18:29:27 changed by mark

  • status changed from new to assigned.

06/04/10 20:25:54 changed by mark

(In [414]) Greatly simplify how run_command works. (refs #14)

The original intent was to fix bad behavior when a command doesn't exist. Upon looking further into the implementation of the run_command function, I realized that using open achieves the same effect (it behaves like backticks, but we also get a process ID out of it) without a lot of code that didn't quite work when the exec failed.
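
A rough sketch of the idea, not the code from [414]. It assumes the list form of a piped open on a Unix-like system; run_command_sketch and the /nonexistent/zpool path are made up for the example. When the exec fails, open returns false and sets $!, so the failure can be raised with die() and caught by the caller's eval.

```perl
use strict;
use warnings;

# Hypothetical stand-in for run_command: a list-form piped open, which
# reads the command's output like backticks but also returns the child's
# process ID.
sub run_command_sketch {
    my @cmd = @_;
    # The parent reports the failure itself, so silence the child's own
    # "Can't exec" warning.
    no warnings 'exec';
    my $pid = open(my $reader, '-|', @cmd)
        or die "Unable to run '$cmd[0]': $!\n";
    my $output = do { local $/; <$reader> };   # slurp all output
    close($reader);
    return ($pid, $output);
}

# A command that exists: $^X is the path of the running perl.
my ($pid, $out) = run_command_sketch($^X, '-e', 'print "hello\n"');
print "child $pid said: $out";

# A command that does not exist: the failure is now a catchable exception.
my $ran = eval { run_command_sketch('/nonexistent/zpool', 'list'); 1 };
my $failure = $@;
print "caught: $failure" unless $ran;
```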

06/04/10 21:32:01 changed by mark

  • status changed from assigned to closed.
  • resolution set to fixed.

(In [416]) Store the file handle as well as the pid in children. (fixes #14)

It is used in the clean_up function. Incidentally, this fixes a really annoying issue where the check timeout functionality was broken. The cause was a scoping issue: when the alarm fires and the handler dies, the reader file handle falls out of scope, and closing it waits until the process finishes before passing control back to resmon. Storing the file handle prevents it from being cleaned up until after we kill the process in the clean_up function later on.
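
A sketch of the scoping fix, under stated assumptions. The %children hash, run_command_sketch and clean_up_sketch are hypothetical names and not the code from [416], and the one-second timeout and the sleeping child are only there to trigger the alarm. Because the reader handle is also held in %children, it is not implicitly closed when die() unwinds the check, so control returns as soon as the alarm fires.

```perl
use strict;
use warnings;

# Hypothetical registry: pid => reader handle. Holding the handle here
# keeps it open after the check's own scope is unwound by die().
my %children;

sub run_command_sketch {
    my @cmd = @_;
    my $pid = open(my $reader, '-|', @cmd)
        or die "Unable to run '$cmd[0]': $!\n";
    $children{$pid} = $reader;
    my $output = do { local $/; <$reader> };
    delete $children{$pid};
    close($reader);
    return $output;
}

sub clean_up_sketch {
    for my $pid (keys %children) {
        kill 'KILL', $pid;         # stop the child first...
        close($children{$pid});    # ...so this close returns promptly
        delete $children{$pid};
    }
}

my $start = time;
eval {
    local $SIG{ALRM} = sub { die "check timed out\n" };
    alarm 1;
    run_command_sketch($^X, '-e', 'sleep 30');   # child outlives the timeout
    alarm 0;
};
chomp(my $err = $@ || '');
alarm 0;
clean_up_sketch();
my $elapsed = time - $start;
print "error=$err elapsed=${elapsed}s\n";
```

If the handle were kept only in the lexical $reader, the implicit close during unwinding would wait for the child, and the timeout would take as long as the command itself.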