Using Cfengine and Nagios passive checks to ensure software updates on CentOS/Red Hat

Prerequisites

Since cfengine and Nagios are quite complex, I will not describe the details of getting a basic cfengine installation or a basic Nagios setup running. Everything below assumes both tools are already working and only adds new functionality to them.
If you need to install the software, you can find very decent RPMs on the rpmforge page. They provide an rpmforge-release package that contains a yum repo file and the repository's public key. Details on how to use rpmforge can be found in their FAQ: http://rpmforge.net/user/faq/

Setting up nagios

Enable passive service checks and external commands

To accept passive checks through nsca, set the following options in your main Nagios configuration file (nagios.cfg):
accept_passive_service_checks=1
check_external_commands=1
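To confirm that an existing installation already has these settings, a small script can scan the main configuration file. The sketch below assumes the simple key=value format of nagios.cfg with '#' comment lines. The function names are hypothetical. The default path /etc/nagios/nagios.cfg is an assumption about your installation.

```python
from typing import Dict, List

# Options this article needs so that nsca-submitted passive results are accepted.
REQUIRED_OPTIONS: Dict[str, str] = {
    "accept_passive_service_checks": "1",
    "check_external_commands": "1",
}


def parse_nagios_cfg(text: str) -> Dict[str, str]:
    """Parse key=value lines of nagios.cfg; later lines override earlier ones."""
    options: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and anything without '='
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def missing_passive_options(text: str) -> List[str]:
    """Return required options that are absent or not set to the expected value."""
    options = parse_nagios_cfg(text)
    return [key for key, wanted in REQUIRED_OPTIONS.items() if options.get(key) != wanted]


if __name__ == "__main__":
    import sys

    # Hypothetical default path; pass your own nagios.cfg as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/etc/nagios/nagios.cfg"
    with open(path) as handle:
        missing = missing_passive_options(handle.read())
    print("OK" if not missing else "missing or disabled: " + ", ".join(missing))
```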

Create a host group for your CentOS machines

To set up a check on all CentOS machines, it is useful to create a hostgroup containing all CentOS hosts. That way you specify the list of hosts only once, and every check that should run on all CentOS machines can simply refer to this hostgroup. The following Nagios configuration snippet shows how to set up the group:
define hostgroup{
        hostgroup_name  CentOS
        alias           Machines running CentOS
        members         host-1, host-2
}
The names host-1 and host-2 refer to hosts that must already be defined in their own host definitions. Please check the Nagios documentation on how to define hosts.
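If your list of CentOS hosts is kept elsewhere, for example in an inventory file, the hostgroup definition can be generated instead of being maintained by hand. The following is a minimal sketch. It produces only the three directives used above, and the function name is hypothetical.

```python
from typing import Iterable


def render_hostgroup(name: str, alias: str, members: Iterable[str]) -> str:
    """Render a Nagios hostgroup definition in the layout used in this article."""
    member_list = ", ".join(members)
    if not member_list:
        # This sketch refuses to render a group without members.
        raise ValueError("a hostgroup needs at least one member")
    return (
        "define hostgroup{\n"
        f"        hostgroup_name  {name}\n"
        f"        alias           {alias}\n"
        f"        members         {member_list}\n"
        "}\n"
    )


if __name__ == "__main__":
    # The member names must match host_name values from existing host definitions.
    print(render_hostgroup("CentOS", "Machines running CentOS", ["host-1", "host-2"]))
```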

Define a service check

Each service in Nagios needs a service definition in the config file. Here is an example of how to set up a passive check for yum:
define service{
        use                             generic-service		; template to use (this one comes with the demo config)

        service_description             yum			; name of the check
        check_period                    24x7			; time period in which this service is checked
        max_check_attempts              1
        normal_check_interval           1
        retry_check_interval            1
        contact_groups                  contacts		; contact group for this service
        notification_period             24x7			; time period in which notifications will be sent
        notification_options            w,u,c,r
        check_command                   check_ping		; predefined ping check, irrelevant since active checks are disabled
        active_checks_enabled           0			; disable active checks
        passive_checks_enabled          1			; enable passive checks
        notifications_enabled           0			; disable notifications. IMPORTANT: if you enable notifications you will get one mail per machine once updates are available, which can get very annoying.

        hostgroup_name                  CentOS			; machines which will use this service
        servicegroups                   yum			; optional: put this service in a service group (the service group must be defined separately)
}
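Because this service is attached to a hostgroup and not to individual hosts, Nagios creates a separate yum service for every member host. Each passive result has to address one of those host/service pairs, and this is also why enabled notifications would produce one mail per machine. The following sketch models only that expansion and does not read real Nagios configuration. The sample data mirrors the hostgroup above, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

# Sample data mirroring the CentOS hostgroup defined earlier (hypothetical hosts).
HOSTGROUPS: Dict[str, List[str]] = {"CentOS": ["host-1", "host-2"]}


def expand_service(hostgroups: Dict[str, List[str]], hostgroup_name: str,
                   service_description: str) -> List[Tuple[str, str]]:
    """Return one (host, service) pair per hostgroup member."""
    return [(host, service_description) for host in hostgroups[hostgroup_name]]


if __name__ == "__main__":
    for host, service in expand_service(HOSTGROUPS, "CentOS", "yum"):
        # Each line is a separate service instance that Nagios tracks on its own.
        print(f"{host};{service}")
```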

Set up nsca

Nsca is the Nagios Service Check Acceptor. It is a daemon that accepts specially formatted network packets sent by the client tool send_nsca and writes their payload to the Nagios external command file. It is quite easy to set up and well documented. Nsca is also available from rpmforge, and the package name is "nagios-nsca".
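For each accepted packet, nsca writes a PROCESS_SERVICE_CHECK_RESULT line into the external command file, and Nagios then reads that file. The sketch below formats such a line so you can see what reaches Nagios. It can also append the line directly on the Nagios host, which is useful when testing without the network hop. The command file location depends on your installation (it is set by command_file in nagios.cfg), so it is passed in as a parameter. The function names are hypothetical.

```python
import time
from typing import Optional

# Nagios service return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def format_passive_result(host: str, service: str, code: int, output: str,
                          timestamp: Optional[int] = None) -> str:
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    if code not in (OK, WARNING, CRITICAL, UNKNOWN):
        raise ValueError(f"invalid service return code: {code}")
    if timestamp is None:
        timestamp = int(time.time())
    # A newline would end the command early, so keep the output on one line.
    output = output.replace("\n", " ")
    return f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{output}\n"


def submit_locally(command_file: str, line: str) -> None:
    """Write the command to the Nagios command file (a named pipe on the Nagios host)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)


if __name__ == "__main__":
    print(format_passive_result("host-1", "yum", WARNING, "please check for updates!"), end="")
```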

Distributing software and configs

Define rpm installation commands in cfengine

Create a new cfengine file for the rpm actions and import it in cfagent.conf (see the cfengine reference on the import section). I call mine cf.rpms. In its control section, add the following:
control:

        centos::

            DefaultPkgMgr = ( rpm )
            RPMInstallCommand = ( "/usr/bin/yum -y install %s" )
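RPMInstallCommand is a template. cfengine puts the package name in place of %s and runs the result, for example "yum -y install packagename". The following sketch shows that expansion for a single package, assuming one package per invocation. How cfengine batches several packages is not covered here, and the function name is hypothetical.

```python
import shlex
from typing import List


def expand_install_command(template: str, package: str) -> List[str]:
    """Substitute the package name for %s and split the result into an argv list."""
    if "%s" not in template:
        raise ValueError("install template must contain %s")
    # Quoting keeps a package name with unusual characters as a single argument.
    return shlex.split(template.replace("%s", shlex.quote(package)))


if __name__ == "__main__":
    print(expand_install_command("/usr/bin/yum -y install %s", "nagios-nsca"))
```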

Distribute config files

Make a "master copy" of your yum repo configuration and put it in a directory on your cfengine master host, e.g. /var/cfengine/masterfiles/yourdomain.com. Then add the following to the copy section in cf.rpms:
copy:
        centos::
        
            /var/cfengine/masterfiles/yourdomain.com/CentOS-Base.repo-master-copy
                dest=/etc/yum.repos.d/CentOS-Base.repo
                server=$(policyhost)
                owner=root group=root mode=644
                type=checksum
This copies the master copy over the local file on each cfagent client whenever their checksums differ. Put a comment at the top of the master copy warning that the file is copied automatically from the server, because any local edit will be lost the next time cfagent runs on the client. The "centos::" line at the beginning restricts the section to machines on which cfengine defines the class centos. cfagent should define this class automatically on CentOS machines.
Do the same for the send_nsca configuration. Build a working "master config", put it on the cfengine server and add the following lines to the copy section of cf.rpms:
copy:

        centos::

            /var/cfengine/masterfiles/yourdomain.com/send_nsca.cfg-master-copy
                dest=/etc/nagios/send_nsca.cfg
                server=$(policyhost)
                owner=root group=root mode=644
                type=checksum
The variable $(policyhost) should be globally set in the main config (cfagent.conf) and point to your cfengine master host.
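With type=checksum, cfagent compares file contents by checksum and replaces a client file only when it differs from the master copy. The sketch below imitates that decision locally. It uses MD5 purely for illustration, does not contact $(policyhost), and sets only the 644 mode because changing ownership to root requires root. The function names are hypothetical.

```python
import hashlib
import os
import shutil
import tempfile


def file_digest(path: str) -> str:
    """Return the MD5 hex digest of a file (MD5 is used here for illustration)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_if_changed(master: str, dest: str, mode: int = 0o644) -> bool:
    """Copy master over dest only if their contents differ; return True if copied.

    Ownership (owner=root group=root) is not handled, since chown needs root.
    """
    if os.path.exists(dest) and file_digest(master) == file_digest(dest):
        return False
    tmp = dest + ".tmp"
    shutil.copyfile(master, tmp)
    os.chmod(tmp, mode)
    os.replace(tmp, dest)  # rename into place so readers never see a partial file
    return True


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    master = os.path.join(workdir, "send_nsca.cfg-master-copy")
    dest = os.path.join(workdir, "send_nsca.cfg")
    with open(master, "w") as handle:
        handle.write("# Managed by cfengine: local edits will be overwritten!\n")
    print(copy_if_changed(master, dest))  # first run copies
    print(copy_if_changed(master, dest))  # unchanged, nothing to do
```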

Install nsca on the clients

In the packages section of cf.rpms, add:
packages:

        centos::

            # needed for passive checks
            nagios-nsca
                define=NSCA
                action=install
This installs the nagios-nsca package on the cfengine client, provided that it is available in one of the yum repositories defined in your repository config. The define=NSCA line gives cfengine the class NSCA, which guards the send_nsca commands further down. To avoid generating too much outbound traffic, I keep my own repository on my in-house CentOS mirror and put the additional rpmforge packages there. To still get notified about rpmforge updates, I have one system that uses the rpmforge repository directly.
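cfengine runs a section only when its class expression holds, so a guard such as centos.NSCA.UP2DATE:: requires all three classes to be defined. In cfengine 2 class expressions, '.' (or '&') means AND, '|' means OR and '!' negates a class, and AND binds tighter than OR. The following sketch evaluates such expressions against a set of defined classes. It deliberately ignores parentheses, and the function names are hypothetical.

```python
from typing import Set


def _term_true(term: str, defined: Set[str]) -> bool:
    """Evaluate a single class name, optionally negated with a leading '!'."""
    term = term.strip()
    if term.startswith("!"):
        return term[1:] not in defined
    return term in defined


def class_expr_true(expr: str, defined: Set[str]) -> bool:
    """Evaluate a parenthesis-free cfengine class expression."""
    expr = expr.strip().rstrip(":")  # accept guards written as "centos.NSCA::"
    for alternative in expr.split("|"):  # OR has the lowest precedence
        terms = alternative.replace("&", ".").split(".")
        if all(_term_true(term, defined) for term in terms):
            return True
    return False


if __name__ == "__main__":
    classes = {"centos", "NSCA", "UP2DATE"}
    for guard in ("centos.NSCA.UP2DATE::", "centos.NSCA.WARN_ABOUT_UPDATES::"):
        print(guard, class_expr_true(guard, classes))
```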

Run yum update-check from inside cfengine

In the shellcommands section of cf.rpms add:
shellcommands:

        centos::

            "/usr/bin/yum check-update >/dev/null"
                useshell=true
                define=UP2DATE
                elsedefine=WARN_ABOUT_UPDATES
This runs yum check-update and, depending on its exit code, defines either the class UP2DATE (the command succeeded) or the class WARN_ABOUT_UPDATES (it did not). The commands guarded by these classes also go into the shellcommands section:
shellcommands:

        centos.NSCA.UP2DATE::

            "/bin/echo '$(target_host);yum;0;up2date' | /usr/sbin/send_nsca -H $(nagioshost) -c /etc/nagios/send_nsca.cfg -d ';' >/dev/null"
                useshell=true
                elsedefine=send_nsca_timeout

        centos.NSCA.WARN_ABOUT_UPDATES::

            "/bin/echo '$(target_host);yum;1;please check for updates!' | /usr/sbin/send_nsca -H $(nagioshost) -c /etc/nagios/send_nsca.cfg -d ';' >/dev/null"
                useshell=true
                elsedefine=send_nsca_timeout
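The shellcommands above treat every unsuccessful run of yum check-update as "updates available". yum actually distinguishes three cases. Exit status 0 means no updates are available, 100 means updates are available, and 1 means an error occurred, for example an unreachable repository. If you would rather see such errors as UNKNOWN than as WARNING, a wrapper along the following lines could compute the state to report. This is a sketch of a possible refinement, not what cf.rpms does, and the function names are hypothetical.

```python
import subprocess
from typing import Tuple

OK, WARNING, UNKNOWN = 0, 1, 3  # Nagios service states


def state_for_yum_exit(returncode: int) -> Tuple[int, str]:
    """Map the exit status of `yum check-update` to a Nagios state and message."""
    if returncode == 0:
        return OK, "up2date"
    if returncode == 100:
        return WARNING, "please check for updates!"
    return UNKNOWN, f"yum check-update failed with exit status {returncode}"


def check_updates() -> Tuple[int, str]:
    """Run yum check-update with its output discarded and translate the exit status."""
    result = subprocess.run(["/usr/bin/yum", "check-update"],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return state_for_yum_exit(result.returncode)


if __name__ == "__main__":
    state, message = check_updates()
    print(state, message)
```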
The variable $(nagioshost) has to point to the right Nagios server, and different groups of machines can report to different servers. $(target_host) is defined as $(ipaddress) and has to match the IP address given in your Nagios host definition. If your host names exactly match the names in your Nagios host definitions, you can set target_host to $(host) instead. $(ipaddress) and $(host) are cfengine built-in variables. Our definition in the control section looks like this:
control:

        nagioshost = ( nagiosserver.example.com )
        target_host = ( $(ipaddress) )
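The text piped into send_nsca consists of four fields separated by the delimiter given with -d: host, service, return code and plugin output. In cf.rpms the host field is $(target_host). The sketch below builds the same line and calls send_nsca from Python. It assumes the /usr/sbin/send_nsca binary and the /etc/nagios/send_nsca.cfg path used in the commands above, and the function names are hypothetical.

```python
import subprocess
from typing import List


def build_nsca_line(host: str, service: str, code: int, output: str,
                    delimiter: str = ";") -> str:
    """Build one send_nsca input line: host, service, return code, plugin output."""
    fields = [host, service, str(code)]
    for field in fields:
        # A delimiter inside these fields would shift the remaining fields.
        if delimiter in field:
            raise ValueError(f"field {field!r} contains the delimiter {delimiter!r}")
    return delimiter.join(fields + [output]) + "\n"


def send_nsca_command(nagios_host: str, delimiter: str = ";",
                      config: str = "/etc/nagios/send_nsca.cfg") -> List[str]:
    """Return the send_nsca argv used in cf.rpms."""
    return ["/usr/sbin/send_nsca", "-H", nagios_host, "-c", config, "-d", delimiter]


def submit(nagios_host: str, line: str) -> bool:
    """Pipe one line into send_nsca; False corresponds to the send_nsca_timeout case."""
    result = subprocess.run(send_nsca_command(nagios_host), input=line,
                            text=True, stdout=subprocess.DEVNULL)
    return result.returncode == 0


if __name__ == "__main__":
    print(build_nsca_line("10.0.0.1", "yum", 0, "up2date"), end="")
```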
 

Further actions

Automatic upgrades

Once this system is running, you will sooner or later see warnings like the following in the Nagios web interface:
host-1	WARNING	please check for updates!
If you identify a package that can be updated on all machines without side effects, you can simply let cfengine do the update. In the packages section of cf.rpms, add another entry like this:
packages:

        centos::

            packagename	# name of the rpm
                cmp=eq	# compare mode version equal
                version=1.2.3 # desired version (the new version of the updated package)
                action=install
On the next run, cfagent will try to run "yum -y install packagename" if the installed version does not match the given version, for example because it is older. CAUTION: this definition applies to all CentOS machines, so you will probably want to restrict it with additional classes (e.g. mygroup.centos::).
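With cmp=eq, the package entry tests whether the installed version equals the given version, and action=install triggers the install command when that test fails. The sketch below models the decision with a simplified rpm-style comparison that works segment by segment. It ignores epochs, release numbers and the special characters newer rpm versions handle. The operator names other than eq are an assumption about cfengine's cmp= values, and all function names are hypothetical.

```python
import re
from typing import Callable, Dict, List, Optional


def _segments(version: str) -> List[str]:
    """Split a version into numeric and alphabetic runs; separators are dropped."""
    return re.findall(r"[0-9]+|[A-Za-z]+", version)


def vercmp(a: str, b: str) -> int:
    """Simplified rpm-style comparison: -1 if a < b, 0 if equal, 1 if a > b."""
    seg_a, seg_b = _segments(a), _segments(b)
    for x, y in zip(seg_a, seg_b):
        if x.isdigit() and y.isdigit():
            if int(x) != int(y):
                return -1 if int(x) < int(y) else 1
        elif x.isdigit() != y.isdigit():
            # A numeric segment is treated as newer than an alphabetic one.
            return 1 if x.isdigit() else -1
        elif x != y:
            return -1 if x < y else 1
    # Equal so far: the version with segments left over is the newer one.
    return (len(seg_a) > len(seg_b)) - (len(seg_a) < len(seg_b))


# Assumed to mirror cfengine's cmp= operators; the article only uses eq.
COMPARATORS: Dict[str, Callable[[int], bool]] = {
    "eq": lambda r: r == 0,
    "ne": lambda r: r != 0,
    "lt": lambda r: r < 0,
    "le": lambda r: r <= 0,
    "gt": lambda r: r > 0,
    "ge": lambda r: r >= 0,
}


def should_install(installed: Optional[str], desired: str, cmp: str = "eq") -> bool:
    """True when the package is missing or its installed version fails the cmp test."""
    if installed is None:
        return True
    return not COMPARATORS[cmp](vercmp(installed, desired))


if __name__ == "__main__":
    print(should_install("1.2.2", "1.2.3"))  # older version installed
```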

CVS integration

You should consider putting your whole cfengine configuration into CVS to track changes. The cfengine servers can then check out specific versions from CVS. I will write more about this soon.

Thanks

A big thanks to Mark Burgess for developing cfengine, a great tool.
The same goes for Ethan Galstad and all the contributors who made Nagios what it is today.
More thanks to the CentOS team for providing the best free enterprise Linux distribution.
Finally, I have to thank my colleague Sergey Alifanov for developing this nagios-yum-cfengine integration. It is based on ideas from both of us, but he did all of the implementation.

Feedback

Feedback is welcome. You can mail me at cmr at financial dot com, or try to catch me on freenode IRC (nick: Meier).

Appendix A - cf.rpms

control:

    centos::

        DefaultPkgMgr = ( rpm )
        RPMInstallCommand = ( "/usr/bin/yum -y install %s" )
        target_host = ( $(ipaddress) )
        nagioshost = ( nagiosserver.example.com )


copy:

    centos::

        /var/cfengine/masterfiles/yourdomain.com/CentOS-Base.repo-master-copy
            dest=/etc/yum.repos.d/CentOS-Base.repo
            server=$(policyhost)
            owner=root group=root mode=644
            type=checksum

        /var/cfengine/masterfiles/yourdomain.com/send_nsca.cfg-master-copy
            dest=/etc/nagios/send_nsca.cfg
            server=$(policyhost)
            owner=root group=root mode=644
            type=checksum
            

packages:

    centos::

        # needed for passive checks
        nagios-nsca
            define=NSCA
            action=install


shellcommands:

    centos::

        # Scan repository for updates
        "/usr/bin/yum check-update >/dev/null"
            useshell=true
            define=UP2DATE
            elsedefine=WARN_ABOUT_UPDATES

    centos.NSCA.UP2DATE::

        "/bin/echo '$(target_host);yum;0;up2date' | /usr/sbin/send_nsca -H $(nagioshost) -c /etc/nagios/send_nsca.cfg -d ';' >/dev/null"
            useshell=true
            elsedefine=send_nsca_timeout

    centos.NSCA.WARN_ABOUT_UPDATES::

        "/bin/echo '$(target_host);yum;1;please check for updates!' | /usr/sbin/send_nsca -H $(nagioshost) -c /etc/nagios/send_nsca.cfg -d ';' >/dev/null"
            useshell=true
            elsedefine=send_nsca_timeout


alerts:

    send_nsca_timeout::

        "send_nsca: Unable to communicate with nsca daemon on $(nagioshost) !"