Introduction

Reconnoiter is a monitoring and trend analysis system designed to cope with large architectures (thousands of machines and hundreds of thousands of metrics).

Heavy focus is placed on decoupling the various components of the system to allow for disjoint evolution of each component as issues arise or new requirements are identified. Resource monitoring, metric aggregation, metric analysis and visualization are all cleanly separated.

The monitor, noitd, is written in C and designed to support rapid, highly concurrent checks, with an expected capability of monitoring 100,000 services per minute (6 million checks per hour). While it is hard to make writing checks "easy" in this high-performance environment, efforts have been made to ensure that custom check scripting does not require expertise in writing highly concurrent, event-driven C code. Instead, glue is provided via scripting languages such as Lua that attempt to handle aspects of this high-concurrency environment transparently. As with any high-performance system, it remains easy to introduce poorly performing code and jeopardize performance system-wide.
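The idea of hiding concurrency behind sequential-looking check code can be illustrated with a small sketch. This is not noitd's API or its Lua glue; it uses Python's asyncio as a stand-in, and the names check_service, run_checks and required_rate_per_second are hypothetical. The sleep call merely simulates waiting on a remote service.

```python
import asyncio


async def check_service(name, delay):
    # Hypothetical check: the sleep stands in for waiting on a network reply.
    # The author writes straight-line code; the event loop handles the waiting.
    await asyncio.sleep(delay)
    return name, "ok"


async def run_checks(targets):
    # Start every check at once; the event loop interleaves them while each waits.
    results = await asyncio.gather(
        *(check_service(name, delay) for name, delay in targets.items())
    )
    return dict(results)


def required_rate_per_second(checks_per_minute):
    # Sustained rate needed to meet a per-minute target.
    return checks_per_minute / 60


if __name__ == "__main__":
    targets = {"web-{}".format(i): 0.05 for i in range(200)}
    statuses = asyncio.run(run_checks(targets))
    print(len(statuses), "checks completed")
    print(round(required_rate_per_second(100_000)), "checks per second needed")
```

Because the simulated checks overlap their waiting time, 200 of them finish in roughly the time of one. A single check that blocks instead of yielding would stall all the others, which is the system-wide risk described above.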

The aggregator, stratcond, is also written in C and is responsible for the simple task of securely gathering data from all of the distributed noitd instances and loading it into the data storage facility (currently PostgreSQL).

The data storage facility (PostgreSQL) holds all information about individual checks, their statuses and the individual metrics associated with them. Automatic processes are in place that summarize the numeric metrics into windowed averages for expedient graphing at a variety of time window resolutions (hour, day, month, year, etc.).
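As a rough illustration of what summarizing into windowed averages means, the sketch below buckets (timestamp, value) samples into fixed-size windows and averages each bucket. It is written in Python for brevity and is an assumption about the general technique only; the actual rollup processes run against PostgreSQL and are not shown here. The function name rollup_averages and the sample data are hypothetical.

```python
from collections import defaultdict


def rollup_averages(samples, window_seconds):
    """Average (timestamp, value) samples into fixed-size time windows.

    Returns a dict mapping each window's start timestamp to the mean of
    the values that fall inside that window.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in samples:
        # Align the timestamp down to the start of its window.
        window_start = ts - (ts % window_seconds)
        sums[window_start] += value
        counts[window_start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


# Hypothetical samples: timestamps in seconds, one reading every 30 minutes.
samples = [(0, 10.0), (1800, 20.0), (3600, 30.0), (5400, 50.0)]
hourly = rollup_averages(samples, 3600)  # {0: 15.0, 3600: 40.0}
```

Precomputing such averages at several resolutions lets a graph spanning a year read a few hundred summary rows instead of every raw sample.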

The visualization interface (reconnoiter) draws from the data store to visualize collected metrics and assist with monitoring, trending and other visual analyses. The visualization system is written in PHP.