Simple error reporting

In a production environment there should be no silent failures. AIX has an excellent centralized error reporting facility whose messages are viewed using the errpt command. Compared to other logging sources like syslog, the messages in errpt are of much higher quality and much lower volume. They are worth reviewing!

Logs always suffer from inattention if they must be checked manually, so here's a simple way to email errpt entries to yourself in real time, as soon as they happen. This method has two components: forwarding root's email, and adding errnotify entries to the ODM so that errdemon (the AIX error logging daemon) emails each new log entry when it is created.

We use root email forwarding so that the ODM doesn't have to be changed each time the email address changes. Most administrators are more comfortable editing a text file than adding and removing ODM entries, so this makes maintenance easier.

In the event this begins spamming your inbox, you can remove the forward file temporarily until the issue is resolved. If you are receiving too many messages across systems during normal operation, then you'll need to upgrade to a real monitoring solution. You've outgrown the cheap solution!

Large environments may find that email delivery doesn't scale and should consider using centralized syslog and the syslog ODM entry instead. This is easier than making an errpt reporting script for the monitoring system.

To forward root's email, create a new text file /.forward (or ~root/.forward if you changed root's home directory). Add a comma-delimited list of the email addresses that root's email should be forwarded to. The file must not be accessible to anyone but root (the example below sets mode 400), or forwarding will fail.

Be aware this will also send you output from cron jobs and other system functions, but you were checking root's mail before now, weren't you? You may have to fix some cron jobs so that they redirect unwanted output to /dev/null after this.

echo "russell@mydomain.com" > /.forward
chmod 400 /.forward
mail -s "Testing forwarding" root < /dev/null
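
The first two steps can be wrapped in a small helper when several addresses are involved. This is only a sketch: the function name write_forward and the second address in the usage comment are made up for illustration, and mode 400 simply mirrors the chmod above.

```shell
#!/bin/sh
# write_forward FILE ADDRESS...
# Hypothetical helper (not an AIX command): writes the addresses to FILE as
# one comma-delimited line and makes the file read-only for its owner.
write_forward() {
    forward_file=$1
    shift
    # "$*" joins the remaining arguments with the first character of IFS.
    list=$(IFS=,; printf '%s' "$*")
    # A new file is created under a restrictive umask; the chmod afterwards
    # also covers the case where the file already existed.
    ( umask 077; printf '%s\n' "$list" > "$forward_file" ) || return 1
    chmod 400 "$forward_file"
}

# Example (run as root; the second address is a placeholder):
#   write_forward /.forward russell@mydomain.com oncall@mydomain.com
```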

The mail command above sends root a test message; confirm that it arrives at the forwarded address before going further.
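
Since forwarded root mail now includes anything cron jobs print, noisy jobs need their output redirected. The bluntest fix is to append > /dev/null 2>&1 to the crontab entry, but that also hides failures. As one possible alternative, the sketch below defines a hypothetical wrapper, quiet_unless_fail (the name and the script path in the comments are illustrative, not part of AIX), that stays silent on success and prints the captured output only when the command fails, so cron mails root only about problems.

```shell
#!/bin/sh
# quiet_unless_fail COMMAND [ARG...]
# Hypothetical wrapper: runs the command, discards its output on success,
# and prints the captured output only when the command exits non-zero.
quiet_unless_fail() {
    out=$("$@" 2>&1)
    status=$?
    if [ "$status" -ne 0 ]; then
        printf '%s\n' "$out"
    fi
    return "$status"
}

# Blunt crontab form (placeholder path), which hides failures too:
#   0 2 * * * /usr/local/bin/nightly.sh > /dev/null 2>&1
# With the wrapper saved in a script of your own, the job would instead be
# invoked as:  quiet_unless_fail /usr/local/bin/nightly.sh
```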

Next up we will add some new entries to the errnotify object class in the ODM, which errdemon consults each time an error is logged.

errnotify:
        en_pid = 0
        en_name = "mail_err"
        en_persistenceflg = 1
        en_method = "/usr/bin/errpt -a -l $1 | /usr/bin/mail -s \"Errpt $1 $4 $3 $9\" root"

errnotify:
        en_pid = 0
        en_name = "syslog_err"
        en_persistenceflg = 1
        en_method = "/usr/bin/errpt -a -l $1 | /usr/bin/tr '\n' '~' | /usr/bin/logger -p warn"

These are two new records to add to the ODM. The first record emails the errpt entry currently being processed. The second record forwards the same entry to syslog as a single line, with each newline replaced by a ~ character. This may work well in environments which centrally monitor syslog. Depending on your needs you can choose either or both.
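
To make the two methods easier to read, here is a sketch of the same logic as shell functions. It assumes the argument order that errdemon passes to a notify method, in which $1 is the sequence number, $3 the class, $4 the type, and $9 the error label; the function names and the x placeholders for the unused arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the two en_method strings.

# build_subject: takes the same nine positional arguments a notify method
# receives and uses $1 (sequence number), $4 (type), $3 (class) and
# $9 (error label), in that order.
build_subject() {
    printf 'Errpt %s %s %s %s\n' "$1" "$4" "$3" "$9"
}

# flatten_entry: reads a multi-line report on stdin and replaces every
# newline with "~", so the whole entry fits on one syslog line.
flatten_entry() {
    tr '\n' '~'
}

build_subject 135 AA8AB241 O TEMP x x x x OPMSG
# prints: Errpt 135 TEMP O OPMSG

printf 'LABEL: OPMSG\nClass: O\n' | flatten_entry
# prints: LABEL: OPMSG~Class: O~
```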

Copy and paste the entries into a new file /tmp/custom_odm_additions.

Then add them using this command.

odmadd /tmp/custom_odm_additions
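
Instead of pasting by hand, the stanza file can be generated from a script. The sketch below is one way to do that; write_stanzas is a hypothetical helper name. The quoted here-document delimiter keeps $1, $4 and the backslashes literal, so the file contains the stanzas exactly as shown above. As far as I know odmadd does not check for existing entries, so running it twice would leave two copies of each record.

```shell
#!/bin/sh
# write_stanzas FILE
# Hypothetical helper: writes both errnotify stanzas to FILE. The quoted
# 'EOF' delimiter stops the shell from expanding $1, $4 and so on.
write_stanzas() {
    cat > "$1" <<'EOF'
errnotify:
        en_pid = 0
        en_name = "mail_err"
        en_persistenceflg = 1
        en_method = "/usr/bin/errpt -a -l $1 | /usr/bin/mail -s \"Errpt $1 $4 $3 $9\" root"

errnotify:
        en_pid = 0
        en_name = "syslog_err"
        en_persistenceflg = 1
        en_method = "/usr/bin/errpt -a -l $1 | /usr/bin/tr '\n' '~' | /usr/bin/logger -p warn"
EOF
}

# Example (AIX, as root):
#   write_stanzas /tmp/custom_odm_additions
#   odmadd /tmp/custom_odm_additions
```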

Test that your new reporting method is working by using the errlogger command.

errlogger Testing errpt email to root

Check your email; you should receive something like this.

Date: Tue, 10 Apr 2007 02:28:13 +0200
From: root@nim
To: root@nim
Subject: Errpt 135 TEMP O OPMSG

---------------------------------------------------------------------------
LABEL:          OPMSG
IDENTIFIER:     AA8AB241

Date/Time:       Mon Apr  9 19:28:13 CDT 2007
Sequence Number: 135
Machine Id:      000866264600
Node Id:         nim
Class:           O
Type:            TEMP
Resource Name:   OPERATOR

Description
OPERATOR NOTIFICATION

User Causes
ERRLOGGER COMMAND

        Recommended Actions
        REVIEW DETAILED DATA

Detail Data
MESSAGE FROM ERRLOGGER COMMAND
Testing errpt email to root

Congratulations, you now have real-time reporting of errpt via email.

Here are some other useful commands for working with these ODM entries.

# To check the ODM for error reporting records:

odmget -q"en_name LIKE '*_err'" errnotify

# To back up the current contents of the errnotify ODM:

odmget errnotify > /tmp/errnotify.backup

# To remove these customizations (mail_err and syslog_err):

odmdelete -o errnotify -q"en_name LIKE '*_err'"
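
Because the LIKE pattern above removes any errnotify entry whose name ends in _err, including ones you did not add, a more conservative cleanup deletes by exact name. The following is a sketch with hypothetical function names; it prints the odmdelete commands rather than running them, so they can be reviewed first.

```shell
#!/bin/sh
# errnotify_query NAME
# Prints an ODM criteria string that matches one exact en_name.
errnotify_query() {
    printf "en_name = '%s'" "$1"
}

# print_delete_commands NAME...
# Prints one odmdelete command per name; nothing is executed here.
print_delete_commands() {
    for name in "$@"; do
        printf 'odmdelete -o errnotify -q "%s"\n' "$(errnotify_query "$name")"
    done
}

print_delete_commands mail_err syslog_err
# prints:
#   odmdelete -o errnotify -q "en_name = 'mail_err'"
#   odmdelete -o errnotify -q "en_name = 'syslog_err'"
```

Take the odmget backup first, then run the printed commands as root once they look right.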