Simple error reporting

In a production environment there should be no silent failures. AIX has an excellent centralized error reporting facility whose messages are viewed with the errpt command. Compared with other log sources such as syslog, errpt messages are of much higher quality and much lower volume. They are worth reviewing!

Logs that must be checked manually tend to be ignored, so here is a simple way to email errpt entries to yourself in real time, as soon as they are logged. The method has two parts: forwarding root's email, and adding entries to the errnotify object class in the ODM so that errdaemon mails each new log entry as it is created.

We forward root's email so that the ODM doesn't have to be changed every time the destination address changes. Most administrators are more comfortable editing a text file than adding and removing ODM entries, so this makes maintenance easier.

If this starts flooding your inbox, you can temporarily remove the .forward file until the issue is resolved. If you receive too many messages across your systems during normal operation, you need a real monitoring solution. You've outgrown the cheap one!

Large environments may find that email doesn't scale; they should consider centralized syslog together with the syslog_err ODM entry described below. That is easier than writing a custom errpt reporting script for the monitoring system.

To forward root's email, create the text file /.forward (or ~root/.forward if you have changed root's home directory) containing a comma-delimited list of the addresses root's mail should be forwarded to. The file must be accessible by root only (for example mode 400), or forwarding will fail.

Be aware that this will also send you output from cron jobs and other system functions, but you were already checking root's mail, weren't you? Afterwards you may need to fix some cron jobs so that they redirect their output to /dev/null properly.

echo "russell@mydomain.com" > /.forward
chmod 400 /.forward
mail -s "Testing forwarding" root < /dev/null
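To double-check the permissions, a small check like the one below can help. It is only a sketch: check_forward_mode is a hypothetical helper, and it reads the permission string from ls -ld because a stat command may not be available on every system. It accepts only mode 400 or 600, matching the root-only rule above.

```shell
# Hypothetical helper: report whether a .forward file is restricted to its owner.
# Accepts mode 400 (-r--------) or 600 (-rw-------); anything else is "too open".
check_forward_mode() {
  f=$1
  if [ ! -f "$f" ]; then
    echo "missing: $f"
    return 1
  fi
  # The first 10 characters of ls -ld are the file type and permission bits
  mode=$(ls -ld "$f" | cut -c1-10)
  case $mode in
    -r--------|-rw-------) echo "ok: $mode" ;;
    *) echo "too open: $mode"; return 1 ;;
  esac
}
```

Run it as check_forward_mode /.forward (or against ~root/.forward if root's home directory has moved).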

You'll want to send root a test message to confirm email delivery is working.
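Regarding the cron jobs mentioned earlier: cron mails any output a job produces to the crontab's owner, so output from root's cron jobs will now reach your inbox. The sketch below uses a hypothetical noisy_job function to show why the order of redirections matters: > /dev/null 2>&1 silences both streams, while 2>&1 > /dev/null still lets the error text through.

```shell
# Hypothetical stand-in for a chatty cron job: writes to stdout and stderr.
noisy_job() {
  echo "normal output"
  echo "error output" >&2
}

# Correct order: send stdout to /dev/null, then point stderr at the same place.
quiet=$(noisy_job > /dev/null 2>&1)

# Wrong order: stderr is duplicated onto the original stdout *before* stdout
# is redirected, so the error text still escapes (and cron would mail it).
leaky=$(noisy_job 2>&1 > /dev/null)

echo "quiet=[$quiet] leaky=[$leaky]"
# prints: quiet=[] leaky=[error output]

# A matching crontab entry (the script path is hypothetical):
# 0 2 * * * /usr/local/bin/nightly_report.sh > /dev/null 2>&1
```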

Next, we add some new entries to the errnotify object class in the ODM, which errdaemon consults each time a new entry is written to the error log.

errnotify:
        en_pid = 0
        en_name = "mail_err"
        en_persistenceflg = 1
        en_method = "/usr/bin/errpt -a -l $1 | /usr/bin/mail -s \"Errpt $1 $4 $3 $9\" root"

errnotify:
        en_pid = 0
        en_name = "syslog_err"
        en_persistenceflg = 1
        en_method = "/usr/bin/errpt -a -l $1 | /usr/bin/tr '\n' '~' | /usr/bin/logger -p warn"

These are two new records for the ODM. The first (mail_err) emails the full errpt report for the entry being processed. The second (syslog_err) sends the same report to syslog with every newline replaced by "~", which may suit environments that monitor syslog centrally. Depending on your needs, add either or both.
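To see what the syslog_err method does to a multi-line report, the sketch below applies the same tr substitution to two sample lines borrowed from the errlogger report shown later, then reverses it the way a central syslog reader might. The reverse step is my suggestion, not part of the ODM method, and it assumes the report contains no "~" characters of its own; any that did would also be turned into line breaks.

```shell
# Two lines standing in for errpt -a output (sample values from the test report)
report='LABEL: OPMSG
IDENTIFIER: AA8AB241'

# Same transformation as the syslog_err method: every newline becomes "~"
flat=$(printf '%s\n' "$report" | tr '\n' '~')
echo "$flat"

# On the receiving side, reversing the substitution restores the original layout
restored=$(printf '%s' "$flat" | tr '~' '\n')
echo "$restored"
```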

Copy and paste the entries into a new file /tmp/custom_odm_additions.
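If you would rather create the file from the shell than paste it into an editor, a here-document works. This is a sketch; the important detail is the quoted delimiter ('EOF'), which stops the shell from expanding $1, $4, $3 and $9 while writing the file. With an unquoted delimiter those would be replaced by the current shell's (probably empty) positional parameters, and the stanzas would be silently broken.

```shell
# Write both errnotify stanzas to the file odmadd will read.
# The quoted 'EOF' keeps $1, $4, $3 and $9 literal so errdaemon can fill them in later.
cat > /tmp/custom_odm_additions <<'EOF'
errnotify:
        en_pid = 0
        en_name = "mail_err"
        en_persistenceflg = 1
        en_method = "/usr/bin/errpt -a -l $1 | /usr/bin/mail -s \"Errpt $1 $4 $3 $9\" root"

errnotify:
        en_pid = 0
        en_name = "syslog_err"
        en_persistenceflg = 1
        en_method = "/usr/bin/errpt -a -l $1 | /usr/bin/tr '\n' '~' | /usr/bin/logger -p warn"
EOF
```

Review the result with cat /tmp/custom_odm_additions before loading it.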

Then add them using this command.

odmadd /tmp/custom_odm_additions

Test that the new reporting method works by logging an operator message with the errlogger command.

errlogger Testing errpt email to root

Check your email; you should receive something like this:

Date: Tue, 10 Apr 2007 02:28:13 +0200
From: root@nim
To: root@nim
Subject: Errpt 135 TEMP O OPMSG

---------------------------------------------------------------------------
LABEL:          OPMSG
IDENTIFIER:     AA8AB241

Date/Time:       Mon Apr  9 19:28:13 CDT 2007
Sequence Number: 135
Machine Id:      000866264600
Node Id:         nim
Class:           O
Type:            TEMP
Resource Name:   OPERATOR

Description
OPERATOR NOTIFICATION

User Causes
ERRLOGGER COMMAND

        Recommended Actions
        REVIEW DETAILED DATA

Detail Data
MESSAGE FROM ERRLOGGER COMMAND
Testing errpt email to root
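The subject line is built from the positional parameters that errdaemon passes to en_method. As I understand IBM's errnotify documentation (worth checking for your AIX level), $1 is the sequence number, $2 the error ID, $3 the class, $4 the type, $5 the alert flags, $6 the resource name, $7 the resource type, $8 the resource class and $9 the error label. The hypothetical build_subject function below reproduces only the subject-building part of the mail_err method, fed with the values from the report above, to show where "Errpt 135 TEMP O OPMSG" comes from.

```shell
# Hypothetical stand-in for the subject part of the mail_err method:
# it uses $1 (sequence), $4 (type), $3 (class) and $9 (label), in that order.
build_subject() {
  printf 'Errpt %s %s %s %s\n' "$1" "$4" "$3" "$9"
}

# Values from the errlogger test report; positions 5, 7 and 8 are
# placeholders ("-") because the subject does not use them.
build_subject 135 AA8AB241 O TEMP - OPERATOR - - OPMSG
# prints: Errpt 135 TEMP O OPMSG
```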

Congratulations, you now have real-time errpt reporting by email.

Some other useful commands for working with these ODM entries:

# To check the ODM for error reporting records:

odmget -q"en_name LIKE '*_err'" errnotify

# To backup the current contents of the errnotify ODM:

odmget errnotify > /tmp/errnotify.backup

# To remove these customizations (mail_err & syslog_err); the pattern matches any en_name ending in _err, so check the odmget output first:

odmdelete -o errnotify -q"en_name LIKE '*_err'"

Working with Snap files

I frequently work on customer systems where I need a system inventory, whether for troubleshooting or simply to record the final state of a system for later reference.

I have worked with many consultants who have an inventory script they give to customers, but I prefer to use the tools native to the platform when they are available. On AIX I use IBM's native snap command. If you've ever been on the phone with IBM support, you know they barely ask your name before asking you to upload a snap.

IBM's snap utility, distributed as part of the OS, is an excellent tool for troubleshooting AIX. In my experience it captures about 90% of what I need to know about a system, including:

  • Installed software
  • Devices and attributes
  • LVM details and disk layout
  • Network statistics and configuration data

Rather than ask a customer to run commands for me and capture the output, or ask them to run a script from an untrusted source as root on their production server, I always ask for a snap.


Welcome to the new ASC site!

I'm proud to announce that Adams Systems Consultancy is now established in The Netherlands and open for business!

This new company allows me to provide services to customers throughout Europe from a central location.

With this site I'm testing a new blog format to allow me to post technical articles and tips. Stay tuned for posts as I migrate content from my notes into useful articles!