Simple error reporting
In a production environment there should be no silent failures. AIX has an excellent centralized error reporting facility whose messages are viewed using the errpt command. Compared to other logging sources like syslog, the messages in errpt are of much higher quality and much lower volume. They are worth reviewing!
Logs that must be checked manually always suffer from inattention, so here's a simple way to email errpt entries to yourself in real time, as soon as they happen. The method has two components: forwarding root's email, and adding entries to the errnotify ODM class so that the error daemon (errdemon) emails each new log entry as it is created.
We use root email forwarding so that the ODM doesn't have to be changed each time the email address changes. Most administrators are more comfortable editing a text file than adding and removing ODM entries, so this makes maintenance easier.
In the event this begins spamming your inbox, you can remove the forward file temporarily until the issue is resolved. If you are receiving too many messages across systems during normal operation, then you'll need to upgrade to a real monitoring solution. You've outgrown the cheap solution!
Large environments may find that email delivery doesn't scale and should consider using centralized syslog and the syslog ODM entry instead. This is easier than making an errpt reporting script for the monitoring system.
To forward root's email, create a new text file /.forward (or ~root/.forward if you changed root's home directory). Add a comma-delimited list of the email addresses that root's email should be forwarded to. Then make the file accessible to root only, or forwarding will fail.
Be aware that this will also send you output from cron jobs and other system functions, but you were checking root's mail before now, weren't you? After this you may have to fix some cron jobs so that they properly redirect their output to /dev/null.
echo "russell@mydomain.com" > /.forward
chmod 400 /.forward
mail -s "Testing forwarding" root < /dev/null
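When there are several recipients, the same steps can be wrapped in a small helper. This is only a sketch: write_forward is a hypothetical function name, oncall@example.com is a placeholder address, and the demo writes to a scratch directory rather than to root's real home directory.

```shell
#!/bin/sh
# write_forward FILE ADDRESS...   (hypothetical helper)
# Writes the addresses to FILE as one comma-delimited line and leaves
# the file readable and writable by its owner only.
write_forward() {
    forward_file=$1
    shift
    # "$*" joins the remaining arguments with the first character of IFS.
    addresses=$(IFS=,; printf '%s' "$*")
    rm -f "$forward_file"
    # umask 077 in a subshell, so the file is never group/world readable.
    (umask 077; printf '%s\n' "$addresses" > "$forward_file")
    chmod 600 "$forward_file"
}

# Demo against a scratch directory (assumption: the temp dir is writable).
demo_dir=${TMPDIR:-/tmp}/forward_demo.$$
mkdir -p "$demo_dir"
write_forward "$demo_dir/.forward" russell@mydomain.com oncall@example.com
cat "$demo_dir/.forward"
rm -rf "$demo_dir"
```

The demo prints russell@mydomain.com,oncall@example.com. On a real system the first argument would be /.forward (or ~root/.forward).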
You'll want to send root a test message to confirm email delivery is working.
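Once root's mail is forwarded, any cron job that prints to stdout or stderr will land in your inbox. As a minimal sketch of the redirection mentioned above (noisy_job is a hypothetical stand-in for a real cron command, and the crontab line in the comment is illustrative only):

```shell
#!/bin/sh
# noisy_job is a hypothetical stand-in for a cron command that prints
# routine output on both stdout and stderr.
noisy_job() {
    echo "routine output"
    echo "routine warning" >&2
}

# Unredirected: cron would mail both lines to root.
noisy_job

# Redirected: stdout goes to /dev/null first, then stderr follows it,
# so cron has nothing to mail. The order matters: "2>&1 >/dev/null"
# would leave stderr pointing at the original stdout.
noisy_job >/dev/null 2>&1

# An illustrative crontab entry using the same redirection:
#   0 2 * * * /usr/local/bin/noisy_job >/dev/null 2>&1
```

Discarding everything also hides real failures, so a job that should still report errors can redirect only stdout (>/dev/null) and let stderr be mailed.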
Next up, we will add some new entries to the errnotify class in the ODM, which the error daemon consults whenever an entry is logged.
errnotify:
        en_pid = 0
        en_name = "mail_err"
        en_persistenceflg = 1
        en_method = "/usr/bin/errpt -a -l $1 | /usr/bin/mail -s \"Errpt $1 $4 $3 $9\" root"

errnotify:
        en_pid = 0
        en_name = "syslog_err"
        en_persistenceflg = 1
        en_method = "/usr/bin/errpt -a -l $1 | /usr/bin/tr '\n' '~' | /usr/bin/logger -p warn"
There are two new records to add to the ODM here. The first emails the errpt entry currently being processed. The second forwards the same entry to syslog as a single line, with each newline replaced by a ~ character. This may work well in environments that centrally monitor syslog. Depending on your needs, you can use either or both.
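To make the placeholders in en_method concrete: the error daemon passes details of the new entry to the notify method as positional parameters, of which these methods use $1 (sequence number), $3 (class), $4 (type), and $9 (error label). The sketch below mimics that locally so the behavior can be tried without touching the ODM. build_subject and flatten_entry are hypothetical helper names, and the values supplied for $5, $7, and $8 are placeholders rather than real errpt data.

```shell
#!/bin/sh
# build_subject mimics the subject line built by the mail_err method.
# Positional parameters as used here:
#   $1 sequence number, $3 class, $4 type, $9 error label
build_subject() {
    printf 'Errpt %s %s %s %s\n' "$1" "$4" "$3" "$9"
}

# flatten_entry mimics the tr step of the syslog_err method: every
# newline becomes "~", so a multi-line entry fits on one syslog line.
flatten_entry() {
    tr '\n' '~'
}

# Arguments 5, 7 and 8 are placeholders for this demo.
build_subject 135 AA8AB241 O TEMP FALSE OPERATOR NONE NONE OPMSG

printf 'LABEL: OPMSG\nIDENTIFIER: AA8AB241\n' | flatten_entry
echo
```

The first command prints Errpt 135 TEMP O OPMSG, and the second prints LABEL: OPMSG~IDENTIFIER: AA8AB241~ on a single line.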
Copy and paste the entries into a new file, /tmp/custom_odm_additions, then add them using this command.
odmadd /tmp/custom_odm_additions
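If you would rather script this step than paste by hand, the stanza file can be generated with a here-document. The sketch below is a convenience under stated assumptions, not part of the procedure above: write_mail_stanza is a hypothetical helper, only the mail_err record is shown, and the odmget check that skips an existing record is an added precaution. The here-document delimiter is quoted ('EOF') so that the shell leaves $1, $4, $3, and $9 untouched for the error daemon to fill in later.

```shell
#!/bin/sh
# write_mail_stanza FILE   (hypothetical helper)
# Writes the mail_err stanza to FILE. Because the delimiter is quoted,
# the $N placeholders and the \" escapes are written out literally.
write_mail_stanza() {
    cat > "$1" <<'EOF'
errnotify:
        en_pid = 0
        en_name = "mail_err"
        en_persistenceflg = 1
        en_method = "/usr/bin/errpt -a -l $1 | /usr/bin/mail -s \"Errpt $1 $4 $3 $9\" root"
EOF
}

stanza_file=${STANZA_FILE:-/tmp/custom_odm_additions}
write_mail_stanza "$stanza_file"

# Load the record only where the ODM tools exist (AIX), and only if no
# mail_err record is present yet, so that a second run should not add
# a duplicate.
if command -v odmadd >/dev/null 2>&1; then
    if [ -z "$(odmget -q "en_name = 'mail_err'" errnotify)" ]; then
        odmadd "$stanza_file"
    fi
fi
```

An unquoted delimiter would let the shell expand $1 and friends to empty strings while writing the file, which is an easy mistake to miss until the first blank-subject email arrives.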
Test that your new reporting method is working by using the errlogger command.
errlogger Testing errpt email to root
Check your email; you should receive something like this.
Date: Tue, 10 Apr 2007 02:28:13 +0200
From: root@nim
To: root@nim
Subject: Errpt 135 TEMP O OPMSG

---------------------------------------------------------------------------
LABEL:           OPMSG
IDENTIFIER:      AA8AB241

Date/Time:       Mon Apr 9 19:28:13 CDT 2007
Sequence Number: 135
Machine Id:      000866264600
Node Id:         nim
Class:           O
Type:            TEMP
Resource Name:   OPERATOR

Description
OPERATOR NOTIFICATION

User Causes
ERRLOGGER COMMAND

Recommended Actions
REVIEW DETAILED DATA

Detail Data
MESSAGE FROM ERRLOGGER COMMAND
Testing errpt email to root
Congratulations, you now have real-time reporting of errpt via email.
Here are some other useful commands for working with the ODM in this case.
# To check the ODM for error reporting records:
odmget -q"en_name LIKE '*_err'" errnotify

# To backup the current contents of the errnotify ODM:
odmget errnotify > /tmp/errnotify.backup

# To remove these customizations (mail_err & syslog_err)
odmdelete -o errnotify -q"en_name LIKE '*_err'"