How to receive SNMP traps in Nagios

Here is a short tutorial on how to receive SNMP traps in Nagios, in the simplest possible way.

You can find many such tutorials on the Web, notably on the official Nagios site, on forums [fr] or elsewhere [fr].

This tutorial is based on the SNMPTT documentation. It aims at assembling only the minimum components required to make things work. The complete original tutorial is part of that documentation.

Prerequisites

I found net-snmp and net-snmp-perlmods RPMs for Fedora 5 on SourceForge.

SNMPTT has to be installed manually, because the distribution contains only executable files and no set-up script. Copy at least the commands used in this tutorial: snmptt (to /usr/sbin, the path referenced in snmptrapd.conf) and snmpttconvertmib.

Overview

  1. A remote host on your network (to be accurate, a piece of software on that host) sends an SNMP trap to the Nagios host, where it is handled by the snmptrapd service (listening on UDP port 162).
  2. snmptrapd passes it to SNMPTT on the local host, which tries to make the trap understandable. To do so, it uses the sender's MIB, previously "compiled" to SNMPTT format and added to the SNMPTT configuration.
  3. Finally, SNMPTT sends the translated trap to Nagios on the local host, using the submit_check_result command to write it to the Nagios external command file.

The Nagios configuration defines only one service per host to receive SNMP traps. This implies that if several traps are received from a host, only the last one is displayed in the Nagios web page. Each trap, however, is still notified to contacts.

To acknowledge a trap in Nagios, simply force an immediate check of the trap service (it just pings the host, so that the service state returns to OK). You can also manually submit a passive check.
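To make the "manually submit a passive check" option more concrete, here is a minimal Python sketch of the line that ends up in the Nagios external command file. The PROCESS_SERVICE_CHECK_RESULT format is the standard Nagios one; the function names and the command file path (the usual default for an installation under /usr/local/nagios) are assumptions, so check command_file in your nagios.cfg.

```python
import time


def passive_check_line(host, service, state, output, now=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line."""
    timestamp = int(time.time() if now is None else now)
    # The command is a single line, so fold any newlines in the output.
    text = " ".join(str(output).splitlines())
    return f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{state};{text}\n"


def submit(line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write the line to the command file (path is an assumption)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)


if __name__ == "__main__":
    # State 0 (OK) resets the TRAP service of "somehost" by hand.
    print(passive_check_line("somehost", "TRAP", 0, "reset by hand"), end="")
```

Calling submit() with such a line should have the same effect as the "submit passive check result" form of the web interface, provided the user running it may write to the command file.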

Configuring remote SNMP agent

Any piece of software that is to send SNMP traps to Nagios must be configured to do so; refer to the software's own documentation. Most of the time, you will be asked for a destination IP address (that of the Nagios host) and a community. The community is not used here, but you can set it to "public", for example.

The community string can be used as a password to filter which traps snmptrapd accepts. I didn't use it.

Configuring SNMP trap handler

Perform the following steps on the Nagios host, as root.

Configuring snmptrapd

Configuration file /etc/snmp/snmptrapd.conf must contain:

traphandle default /usr/sbin/snmptt
disableAuthorization yes
donotlogtraps  yes

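Which means: every trap, whatever its OID (default), is passed to /usr/sbin/snmptt; traps are accepted without any access control, so the community string is not checked; and snmptrapd does not log the traps itself.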

For your information, snmptrapd passes the trap information to SNMPTT on its standard input.
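As a rough illustration of what a traphandle program such as SNMPTT has to deal with: Net-SNMP documents that the handler reads the trap on its standard input, as one line with the sender's host name, one line with its address, then one "OID value" line per variable. The sketch below parses that layout. The function name and the sample trap are made up for the example, and the exact text of the address line varies between Net-SNMP versions, so it is kept as an opaque string.

```python
def parse_traphandle_input(text):
    """Split the text a traphandle program reads on stdin into its parts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected a host name line and an address line")
    hostname, address = lines[0].strip(), lines[1].strip()
    varbinds = []
    for line in lines[2:]:
        # Each remaining line is "<OID> <value>"; the value may contain spaces.
        oid, _, value = line.strip().partition(" ")
        varbinds.append((oid, value.strip()))
    return {"hostname": hostname, "address": address, "varbinds": varbinds}


# Made-up sample, with numeric OIDs as produced when snmptrapd runs with -On.
SAMPLE = """\
somehost.your.domain
UDP: [192.0.2.10]:32768->[192.0.2.1]:162
.1.3.6.1.2.1.1.3.0 0:0:05:12.00
.1.3.6.1.6.3.1.1.4.1.0 .1.3.6.1.4.1.6876.0.1
"""

if __name__ == "__main__":
    trap = parse_traphandle_input(SAMPLE)
    print(trap["hostname"], trap["varbinds"])
```

SNMPTT does much more than this (MIB lookup, formatting, EXEC), but the input it starts from has this shape.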

The snmptrapd init script must be told not to translate OIDs, but to leave them in numeric format instead (SNMPTT will handle the translation). On Fedora, edit /etc/rc.d/init.d/snmptrapd:

OPTIONS="-On -Lsd -p /var/run/snmptrapd.pid"

The additional option is -On. Restart snmptrapd after the change:

service snmptrapd restart

Note: some trap strings may be received in hexadecimal format. You can add the -Oa option to the snmptrapd options to force them to be printed as ASCII.

Configuring SNMPTT

SNMPTT will be run in stand-alone mode, as opposed to daemon mode: snmptrapd starts it each time a trap is received. Handling a trap takes slightly longer, but the configuration is simpler.

The configuration file /etc/snmp/snmptt.ini must contain:

[General]
mode = standalone
multiple_event = 1
dns_enable = 1
strip_domain = 1
strip_domain_list = <<END
your.domain
END

resolve_value_ip_addresses = 0
net_snmp_perl_enable = 1
net_snmp_perl_best_guess = 0
translate_log_trap_oid = 0
translate_value_oids = 1
translate_enterprise_oid_format = 1
translate_trap_oid_format = 1
translate_varname_oid_format = 1
translate_integers = 1
wildcard_expansion_separator = " "
allow_unsafe_regex = 0
remove_backslash_from_quotes = 0
dynamic_nodes = 0
description_mode = 0
description_clean = 1

[Logging]
stdout_enable = 0
log_enable = 1
log_file = /var/log/snmptt.log
unknown_trap_log_enable = 1
unknown_trap_log_file = /var/log/snmpttunknown.log
statistics_interval = 0
syslog_enable = 1
syslog_facility = local0
syslog_level_debug = <<END
END
syslog_level_info = <<END
END
syslog_level_notice = <<END
END
syslog_level_warning = <<END
END
syslog_level_err = <<END
END
syslog_level_crit = <<END
END
syslog_level_alert = <<END
END
syslog_level = info
syslog_system_enable = 1
syslog_system_facility = local0
syslog_system_level = warning

[Exec]
exec_enable = 1
pre_exec_enable = 0
unknown_trap_exec =

[Debugging]
DEBUGGING = 0
DEBUGGING_FILE =
DEBUGGING_FILE_HANDLER =

[TrapFiles]
snmptt_conf_files = <<END
/etc/snmp/snmptt.conf
END

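The settings that matter most for this tutorial: mode = standalone, because snmptrapd starts SNMPTT for each trap; dns_enable = 1 and strip_domain = 1, so that the sender's address is resolved by DNS and reduced to a short host name (replace your.domain with your own domain); the two log file paths; exec_enable = 1, which allows the EXEC directives to run; and snmptt_conf_files, the list of trap definition files.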

"Compiling" MIBs

You must gather all the MIBs of the monitored software so that you can feed them to SNMPTT. Compiling consists in extracting each OID of type "trap" together with its associated comments, and generating a configuration file in SNMPTT format from this information.

Run the following command on each of your MIB files:

snmpttconvertmib --in=<MIB file> --out=/etc/snmp/snmptt.conf.<equipment> \
--exec='/usr/local/nagios/libexec/eventhandlers/submit_check_result $r TRAP 1'

The resulting SNMPTT configuration file will contain blocks (one per selected OID), looking like this:

EVENT someEvent .1.3.6.1.4.1.6876.0.1 "Status Events" Normal
FORMAT Some full-text description $*
EXEC /usr/local/nagios/libexec/eventhandlers/submit_check_result $R TRAP 1 "$*"
SDESC
Some full-text description
possibly on many lines
EDESC

The FORMAT directive changes the value of $* before the EXEC directive uses it. Initially, it concatenates the first line of the comments (the SDESC block) with $*, where $* is a single string containing the values of all variables sent by the trap (individual variables are available as $1, $2, and so on; see the SNMPTT documentation for details).

Because only the first line of the comments is used, the resulting string sent to Nagios may be meaningless. In that case, you have to complete the FORMAT directive by hand.

Also note that by default all traps are sent to Nagios as WARNING check results (the 1 in submit_check_result $R TRAP 1 "$*"). You may want to change it to 0 (OK) or 2 (CRITICAL), depending on the trap's description.
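If a generated file contains many events, editing the state code by hand gets tedious. The following Python sketch rewrites the state in each EXEC line according to the severity given at the end of the preceding EVENT line. The severity-to-state mapping and the function name are purely hypothetical, and converted MIBs often leave every severity at "Normal", so review the result event by event rather than trusting it blindly.

```python
import re

# Hypothetical mapping from the EVENT severity field to a Nagios state code.
SEVERITY_TO_STATE = {"Normal": 0, "Minor": 1, "Warning": 1, "Major": 2, "Critical": 2}

EVENT_RE = re.compile(r'^EVENT\s+\S+\s+\S+\s+"[^"]*"\s+(\S+)\s*$')
# Groups: everything up to the state code, then everything after it.
EXEC_RE = re.compile(r'^(EXEC\s+\S*submit_check_result\s+\S+\s+\S+\s+)\d+(\b.*)$')


def set_states(conf_text, mapping=SEVERITY_TO_STATE, default=1):
    """Return conf_text with each EXEC state set from its EVENT severity."""
    state = default
    out = []
    for line in conf_text.splitlines():
        event = EVENT_RE.match(line)
        if event:
            # Unknown severities keep the WARNING default.
            state = mapping.get(event.group(1), default)
        else:
            line = EXEC_RE.sub(lambda m: f"{m.group(1)}{state}{m.group(2)}", line)
        out.append(line)
    return "\n".join(out) + "\n"


SAMPLE = (
    'EVENT someEvent .1.3.6.1.4.1.6876.0.1 "Status Events" Normal\n'
    "FORMAT Some full-text description $*\n"
    'EXEC /usr/local/nagios/libexec/eventhandlers/submit_check_result $R TRAP 1 "$*"\n'
)

if __name__ == "__main__":
    print(set_states(SAMPLE), end="")
```

Lines other than EXEC are passed through unchanged, so the output can be written back over a copy of the original file.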

When done, add the paths of the compiled configuration files to the SNMPTT configuration file /etc/snmp/snmptt.ini:

[...]
[TrapFiles]
snmptt_conf_files = <<END
/etc/snmp/snmptt.conf.<equipment1>
/etc/snmp/snmptt.conf.<equipment2>
END

Log rotation

snmptrapd was told not to generate its own log, but SNMPTT creates two logs: snmptt.log for handled traps, and snmpttunknown.log for traps whose OID isn't listed in the SNMPTT configuration files. You can limit log growth by appending the following to /etc/logrotate.conf:

[...]
# system-specific logs may be also be configured here.

/var/log/snmptt.log /var/log/snmpttunknown.log {
   missingok
}

The global rotation settings apply; on Fedora, logs are rotated weekly and four old logs are kept.

DNS server configuration

Traps originate from an IP address, which SNMPTT needs to translate into a host name. It does so by sending a reverse DNS query (a lookup of the PTR record) to the DNS server defined in the OS configuration. This name resolution is mandatory for SNMPTT to operate properly.

In my case, the DNS server is integrated with Active Directory. Windows machines that join AD get their PTR record created automatically, but Linux (or other OS) machines don't self-register. I have to create their DNS record manually, checking the "create associated PTR record" option. If the PTR record isn't created for some reason, I create it manually (note: the host FQDN must end with ".", e.g. "computer.your.domain.").

You can check that reverse DNS lookup works on Nagios host for a given IP address using the command: dig -x <IP address>
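The same check can be scripted with the Python standard library, for instance to test a whole list of monitored addresses. This is only a sketch: the function names are mine, and socket.gethostbyaddr goes through the system resolver, so unlike dig it may also answer from /etc/hosts.

```python
import ipaddress
import socket


def ptr_name(ip):
    """Return the DNS name whose PTR record a reverse lookup queries."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip):
    """Return the host name the system resolver gives for ip, or None."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return None


if __name__ == "__main__":
    for ip in ["192.0.2.10"]:  # replace with the addresses of your trap senders
        print(ip, ptr_name(ip), reverse_lookup(ip))
```

Any address for which reverse_lookup returns None is one whose traps SNMPTT will not be able to attach to a host name.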

Warning! When SNMPTT resolves the sender's host name via DNS, the result comes back in lowercase. If the host name in Nagios is uppercase, you have to change it accordingly in SNMPTT, using the REGEX directive in the file snmptt.conf.<equipment>. For example:

EVENT someEvent .1.3.6.1.4.1.6876.0.1 "Status Events" Normal
FORMAT $*
EXEC /usr/local/nagios/libexec/eventhandlers/submit_check_result $R TRAP 1 "$*"
REGEX (myhostname)(MYHOSTNAME)
SDESC
[...]

This implies adding one REGEX per host that may send the SNMP trap, which can be tedious. Using an "unsafe regex" (see allow_unsafe_regex in snmptt.ini) may help; see the SNMPTT documentation on REGEX.
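One way to take some of the tedium out of this is to generate the REGEX lines from the list of host names as they are written in Nagios. The sketch below does only that; the function name is hypothetical, and it assumes host names made of letters, digits and hyphens, so nothing in them needs escaping in the search pattern.

```python
def regex_lines(nagios_host_names):
    """Build one REGEX directive per host whose Nagios name is not lowercase."""
    lines = []
    for name in nagios_host_names:
        lowered = name.lower()
        if lowered != name:
            # DNS returns the lowercase form; Nagios expects the original case.
            lines.append(f"REGEX ({lowered})({name})")
    return lines


if __name__ == "__main__":
    for line in regex_lines(["MYHOSTNAME", "OtherHost", "lowercase"]):
        print(line)
```

The printed lines still have to be pasted into each EVENT block that needs them, after the EXEC line as in the example above.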

Nagios configuration

You will use passive checks to receive SNMP traps (which sounds quite logical), but the service must also be volatile. If two traps are received from the same host, the second one arriving before the first has been reset to OK, we want to be notified twice even though the state has not changed. That is what a volatile service does.

You might define (for example) a service template for SNMP traps, inheriting from a generic service template:

define service{
   name                         generic-service
   register                     0
   check_period                 24x7
   max_check_attempts           3
   normal_check_interval        15
   retry_check_interval         5
   active_checks_enabled        1
   passive_checks_enabled       0
   parallelize_check            1
   obsess_over_service          0
   check_freshness              0
   event_handler_enabled        0
   flap_detection_enabled       0
   process_perf_data            1
   retain_status_information    1
   retain_nonstatus_information 1
   notification_interval        60
   notification_period          24x7
   notification_options         w,u,c,r
   notifications_enabled        1
}

define service{
   name                    snmptrap-service
   use                     generic-service
   register                0
   service_description     TRAP
   is_volatile             1
   check_command           check-host-alive
   max_check_attempts      1
   normal_check_interval   1
   retry_check_interval    1
   passive_checks_enabled  1
   check_period            none
   notification_interval   31536000
   contact_groups          somegroup
}

Use the template for each SNMP-monitored host:

define service{
   host_name               somehost
   use                     snmptrap-service
   contact_groups          somegroup
}

Notice the check-host-alive command (a plain ping), which resets the service state to OK when you force an active check. Also notice that the notification interval is strangely long: 31536000 is the number of seconds in a year, and since Nagios counts intervals in units of interval_length (60 seconds by default), the effective delay is longer still. This avoids receiving regular notifications for the same trap (as long as the service hasn't been reset to OK), which would make you think that a new trap, similar to the previous one, has been received. Don't forget to restart Nagios once a year to reset that counter.

Restart Nagios and the job is done. Check that it works by generating a trap on a monitored host. The "TRAP" service associated with that host should change to WARNING state, the check output being the string sent by SNMPTT.
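If the monitored software cannot easily be made to emit a trap, one can be sent by hand with the snmptrap command from Net-SNMP, run on the monitored host. The sketch below only builds and runs that command line. The helper names, the "nagioshost" name and the "public" community are assumptions, and the OID is the one from the EVENT example above; use an OID present in your own snmptt.conf files.

```python
import subprocess


def snmptrap_command(nagios_host, trap_oid, community="public"):
    """Build the argument list for an SNMPv2c test trap."""
    # The empty string stands for the uptime; snmptrap then uses the current value.
    return ["snmptrap", "-v", "2c", "-c", community, nagios_host, "", trap_oid]


def send_test_trap(nagios_host, trap_oid, community="public"):
    """Run snmptrap; raises CalledProcessError if the command fails."""
    subprocess.run(snmptrap_command(nagios_host, trap_oid, community), check=True)


if __name__ == "__main__":
    print(" ".join(repr(arg) for arg in snmptrap_command("nagioshost", ".1.3.6.1.4.1.6876.0.1")))
```

After the trap is sent, /var/log/snmptt.log (or snmpttunknown.log if the OID is not configured) shows whether SNMPTT received and recognised it.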