Nagios monitoring with NRPE allows better tracking of remote systems


Nagios is a really great host and service monitoring system with a lot of flexibility and power. It is not the easiest system to set up, but with a little patience, determination, and past tips on TechRepublic, the task is much less daunting.

One addition to Nagios that was never covered previously is using NRPE to monitor services on remote systems that are not typically exposed on the network. For instance, it is easy enough to monitor whether HTTP or SMTP services are available by checking on them remotely, but how do you determine whether you are running out of disk space, or whether the load average has spiked? These things cannot easily be determined without local access to the system. One way to accomplish this is with the check_by_ssh plugin that has been looked at previously, but an even better way is the Nagios Remote Plugin Executor (NRPE) daemon.

NRPE runs checks on a remote system on behalf of the central Nagios server, which can then treat the results as if it had run the checks locally. In essence, Nagios talks to NRPE, asks it to run a specific check, waits for the response, and logs the result along with everything else it watches. These are checks that can only be run locally: the number of logged-in users, load average, disk space usage, available memory, whether the local system can query DNS, and so on. NRPE does much the same job as the check_by_ssh plugin, but because it does not have to set up an SSH session for every check, its overhead is much smaller, making it faster and more efficient.

To begin, you will need the NRPE daemon and the local Nagios plugins installed on the remote server. On Red Hat Enterprise Linux 5, NRPE and the Nagios plugins are available from the EPEL or RPMForge repositories. With EPEL, you would install NRPE and a few plugins using:

# yum install nrpe nagios-common nagios-plugins nagios-plugins-{disk,dns,users,load,procs}

This installs NRPE and enough Nagios plugins to get started. The main NRPE configuration file is /etc/nagios/nrpe.cfg; this is where you determine which checks NRPE will execute and which hosts are permitted to request them. Also make sure these checks run as a dedicated user, such as ‘nrpe’ or ‘nagios’. The EPEL packages configure NRPE to run as the user ‘nrpe’ and create that user when the package is installed.

One way to restrict which hosts can access NRPE is the allowed_hosts option. NRPE does perform this access control itself, and it is a valid way of specifying allowed hosts, but perhaps a better way is to configure the firewall to allow only a specific IP address (the Nagios server's) to connect to the port NRPE listens on.
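Pulling these settings together, the relevant part of /etc/nagios/nrpe.cfg might look like the excerpt below. The user and port values reflect what is described above for the EPEL packages, but check them against your own file; 192.168.100.1 is a placeholder for your Nagios server's address, not a value from this setup.

```text
# /etc/nagios/nrpe.cfg (excerpt)
# Port the daemon listens on (5666 is the default)
server_port=5666
# Unprivileged account the checks run as
nrpe_user=nrpe
nrpe_group=nrpe
# Comma-separated list of addresses allowed to query this daemon;
# 192.168.100.1 is a placeholder for the Nagios server
allowed_hosts=127.0.0.1,192.168.100.1
```

allowed_hosts only applies when NRPE runs as a standalone daemon, as it does here; if NRPE is started from inetd or xinetd, the option is ignored and the superserver's own access control (such as xinetd's only_from) applies instead.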

At the end of the file are the configured checks. These are the only checks NRPE will perform; if the central Nagios server requests a check that is not listed here, NRPE will not execute it. For instance:

command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20

This will create two commands that NRPE will respond to: check_users and check_load. The name noted in square brackets is the name of the command that Nagios must call via the check_nrpe plugin.
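The same pattern covers the other plugins installed earlier. The entries below are a sketch: the bracketed names are arbitrary choices, the thresholds are only examples, and www.example.com is a placeholder for a name the host should be able to resolve. The paths follow the /usr/lib/nagios/plugins lines above; on 64-bit systems the EPEL packages install the plugins under /usr/lib64/nagios/plugins instead.

```text
# Additional entries for /etc/nagios/nrpe.cfg (thresholds are examples)
# Warn when less than 20% of / is free, critical below 10%
command[check_disk_root]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /
# Total process count, and zombie processes only (-s Z)
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
# Can this host resolve a name? www.example.com is a placeholder
command[check_dns]=/usr/lib/nagios/plugins/check_dns -H www.example.com
```

NRPE reads nrpe.cfg when it starts, so restart it (service nrpe restart) after adding commands.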

Once NRPE is configured, enable the service at boot and start it so that it begins listening for requests:

# chkconfig nrpe on; service nrpe start

Once it is started, run a test from the Nagios server to make sure it can talk to the remote NRPE daemon:

$ /usr/lib64/nagios/plugins/check_nrpe -H 192.168.100.12 -c check_load
OK - load average: 0.00, 0.00, 0.00|load1=0.000;15.000;30.000;0; load5=0.000;10.000;25.000;0; load15=0.000;5.000;20.000;0;

This calls the check_nrpe plugin, telling it to connect to the host 192.168.100.12 and run the check_load command defined in the remote NRPE configuration file. If check_nrpe returns a string like the one above, NRPE is running and you can integrate check_nrpe into your existing Nagios configuration to start examining local services on the remote servers. If not, double-check that the firewall on the remote system allows access to port 5666 (the default), that the Nagios server's address is listed in allowed_hosts, and that the requested command is defined in nrpe.cfg and the plugin it calls is installed.
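To exercise several commands in one go rather than one check_nrpe call at a time, a small shell function can loop over them. This is a hypothetical helper (nrpe_smoke_test is not part of Nagios or NRPE); it assumes check_nrpe lives at the 64-bit path used in the test above, maps the standard plugin exit codes to state names, and returns the number of checks that did not come back OK.

```shell
#!/bin/bash
# nrpe_smoke_test: hypothetical helper, not part of Nagios or NRPE.
# Runs several NRPE commands against one host and prints each result.
# CHECK_NRPE defaults to the 64-bit EPEL plugin path; override as needed.
CHECK_NRPE=${CHECK_NRPE:-/usr/lib64/nagios/plugins/check_nrpe}

nrpe_smoke_test() {
    local host=$1
    shift
    local cmd out rc state failures=0
    for cmd in "$@"; do
        # Assign separately from 'local' so rc holds check_nrpe's status.
        out=$("$CHECK_NRPE" -H "$host" -c "$cmd" 2>&1)
        rc=$?
        # Plugin exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 (or other) UNKNOWN.
        case $rc in
            0) state=OK ;;
            1) state=WARNING ;;
            2) state=CRITICAL ;;
            *) state=UNKNOWN ;;
        esac
        echo "$cmd: $state ($out)"
        # Count every command that did not return OK.
        if [ "$rc" -ne 0 ]; then
            failures=$((failures + 1))
        fi
    done
    # Return status is the number of non-OK results (0 means all passed).
    return "$failures"
}
```

After sourcing the file on the Nagios server, nrpe_smoke_test 192.168.100.12 check_users check_load prints one line per command; a nonzero return status means at least one check needs attention.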

As an example, you might define a “check_nrpe_load” command in the Nagios server’s commands.cfg, which will be used to check the load on remote NRPE daemons:

# 'check_nrpe_load' command definition
define command {
        command_name    check_nrpe_load
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c "check_load"
}

Then define a corresponding service in services.cfg:

define service {
        use                             nrpe-service
        hostgroup_name                  nrpe-services
        service_description             Current Load
        check_command                   check_nrpe_load
}
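The service above inherits from a template named nrpe-service, which has to exist somewhere in your Nagios configuration. A minimal sketch, assuming the generic-service template from Nagios's sample templates.cfg is present, could be:

```text
# templates.cfg: hypothetical nrpe-service template
# register 0 marks this as a template rather than a real service
define service {
        name                    nrpe-service
        use                     generic-service
        register                0
}
```

Any settings shared by all NRPE-backed services (check intervals, notification options) can then be set once in this template.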

Finally, define the hostgroup “nrpe-services” in hostgroups.cfg, listing the hosts that have NRPE installed (the servers “server1”, “dns”, and “server2” in the following example):

define hostgroup {
        hostgroup_name  nrpe-services
        alias           Nagios via NRPE
        members         server1,dns,server2
}
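Defining a separate command for every NRPE check gets repetitive. A common alternative, sketched here as an assumption rather than something the packages set up for you, is a single command that receives the NRPE command name as $ARG1$. The name check_nrpe_cmd is hypothetical; pick one that does not clash with any check_nrpe definition your packages may already ship.

```text
# commands.cfg: one command for any NRPE check; $ARG1$ is the name
# in square brackets in the remote nrpe.cfg
define command {
        command_name    check_nrpe_cmd
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

# services.cfg: arguments follow the command name, separated by "!"
define service {
        use                             nrpe-service
        hostgroup_name                  nrpe-services
        service_description             Current Users
        check_command                   check_nrpe_cmd!check_users
}
```

Adding another check to every host in the group is then a matter of adding one service definition with a different argument after the “!”.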

NRPE is a great way to get information that is normally hidden from outside view, in a form that is easy for Nagios to consume and track. It does require a few extra steps, and it is a good idea to use iptables to restrict access to the NRPE port (5666 by default) to authorized IP addresses only.
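On a stock RHEL 5 install, the firewall rules live in /etc/sysconfig/iptables and the RH-Firewall-1-INPUT chain ends with a REJECT rule. Assuming that layout, one way to open the NRPE port to the Nagios server alone is to add an ACCEPT rule just above the final REJECT line. 192.168.100.1 is again a placeholder for your Nagios server's address.

```text
# /etc/sysconfig/iptables (excerpt): allow NRPE only from the Nagios server
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp -s 192.168.100.1 --dport 5666 -j ACCEPT
# Existing final rule: everything not accepted above is rejected
-A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited
```

Apply the change with service iptables restart; connections to port 5666 from any other address then fall through to the REJECT rule.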

With NRPE running, you can easily watch for disk usage, CPU spiking, memory issues, and other things that you would not be able to see without it. This gives you a means of tracking exactly what is going on with remote servers, both from an external view and an internal view. And because Nagios is so versatile and you can easily write your own plugins, there really is very little you can’t monitor with Nagios, even on remote servers.
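As an illustration of how small a custom plugin can be, here is a sketch of a memory check, since the EPEL plugin set installed above has no memory plugin. check_mem is a hypothetical name, and the MEMINFO override exists only to make the script testable. A plugin just prints one status line (optionally followed by “|” and performance data) and exits with 0 for OK, 1 for WARNING, 2 for CRITICAL, or 3 for UNKNOWN.

```shell
#!/bin/bash
# check_mem: hypothetical example plugin, not part of nagios-plugins.
# Reports the percentage of memory still available and alerts when it
# falls to or below the -w (warning) or -c (critical) percentage.
# Exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.

check_mem() {
    local warn=20 crit=10 opt OPTIND=1
    while getopts "w:c:" opt; do
        case "$opt" in
            w) warn=$OPTARG ;;
            c) crit=$OPTARG ;;
            *) echo "UNKNOWN - usage: check_mem -w PCT -c PCT"; return 3 ;;
        esac
    done

    # MEMINFO may point at another file (handy for testing).
    local meminfo=${MEMINFO:-/proc/meminfo}
    if [ ! -r "$meminfo" ]; then
        echo "UNKNOWN - cannot read $meminfo"
        return 3
    fi

    # /proc/meminfo reports kB. Older kernels such as RHEL 5's have no
    # MemAvailable line, so fall back to MemFree + Buffers + Cached.
    local total avail
    total=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
    avail=$(awk '/^MemAvailable:/ {print $2}' "$meminfo")
    if [ -z "$avail" ]; then
        avail=$(awk '/^(MemFree|Buffers|Cached):/ {sum += $2} END {print sum + 0}' "$meminfo")
    fi
    if [ -z "$total" ] || [ "$total" -eq 0 ]; then
        echo "UNKNOWN - no MemTotal in $meminfo"
        return 3
    fi

    local pct=$((avail * 100 / total))
    # Performance data: label=value;warn;crit;min;max
    local perf="mem_avail=${pct}%;${warn};${crit};0;100"
    if [ "$pct" -le "$crit" ]; then
        echo "CRITICAL - ${pct}% memory available|$perf"
        return 2
    elif [ "$pct" -le "$warn" ]; then
        echo "WARNING - ${pct}% memory available|$perf"
        return 1
    fi
    echo "OK - ${pct}% memory available|$perf"
    return 0
}

# Run the check only when invoked under the name check_mem, so the file
# can also be sourced for testing without triggering a check.
if [ "${0##*/}" = "check_mem" ]; then
    check_mem "$@"
fi
```

Saved as /usr/lib/nagios/plugins/check_mem and made executable, it could be exposed through NRPE with a line such as command[check_mem]=/usr/lib/nagios/plugins/check_mem -w 20 -c 10. Note that “alert when the available percentage is at or below the threshold” is this sketch's own convention, not the standard Nagios threshold-range syntax used by the stock plugins.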