VIOS v3.1 upgrades failing from 2.2.6.51

Recently one of my customers had difficulty upgrading to PowerVM v3.1 using the alt_disk method. IBM's instructions are to update your v2 VIOS to the latest level to ensure a smooth transition to v3, and the alt_disk upgrade method was added in 2.2.6.30.

Unfortunately in this case, there's a poorly documented bug in the installer. I decided to document it here to help others who may encounter it in the future.

Source versions:
  • PowerVM 2.2.5.20
Target versions:
  • Update to PowerVM 2.2.6.51
  • Upgrade to PowerVM 3.1.0.21

The root cause of the problem is that the bos.alt_disk_install.rte and bos.alt_disk_install.boot_images filesets in 2.2.6.51 are newer than the versions on the installation media for 3.1.0.0 and 3.1.0.10.
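To catch this before running viosupgrade, compare the installed fileset level (lslpp -L shows it) with the level on your target media. AIX levels are V.R.M.F numbers, so a plain string comparison gets them wrong: it would put 6.1.9.50 after 6.1.9.406. Here is a minimal sketch of a field-by-field numeric comparison. The function name vrmf_newer is my own, and you still have to look up the media's fileset level yourself.

```shell
# Hypothetical helper: succeed (exit 0) if AIX level $1 is strictly newer
# than level $2, comparing each dot-separated field as a number.
vrmf_newer() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, ".")
    nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      if (x[i] + 0 > y[i] + 0) exit 0   # first differing field decides
      if (x[i] + 0 < y[i] + 0) exit 1
    }
    exit 1                              # equal levels are not "newer"
  }'
}
```

For example, vrmf_newer 6.1.9.406 6.1.9.202 succeeds, while swapping the two arguments fails.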

Running viosupgrade fails with a terse "Fileset installation failed" message followed by Perl warnings, instead of an error that explains the version conflict:

VIO2# viosupgrade -l -i /mnt/PowerVM31010.mksysb -a hdisk1
Welcome to viosupgrade tool.
Operation triggered for given node(s).
Broadcast message from root@VIO2 (vty0) at 14:34:26 ...
WARNING!!! VIOS Upgrade operation is in progress. Kindly Refrain from making any configuration changes...
Please wait for completion..
Upgrading from ioslevel '2.2.6.51' to '3.1.0.10'.
Verifying whether the MPIO software(s) is installed on the VIOS.
Following list of fileset(s) required for the VIOS meta data restore
seems to be not present in the provided installation image.
Continuing with the upgrade process may result in restore failure post installation:
1: devices.sddpcm.61.rte => IBM SDD PCM for AIX V61
2: devices.fcp.disk.ibm.mpio.rte => IBM MPIO FCP Disk Device
Choice[Y/N]:
y
Verification of the MPIO software(s) is successful.
Initiating VIOS configuration backup..
VIOS configuration backup successful.
Fileset installation failed: 'bos.alt_disk_install'.
Use of uninitialized value in numeric eq (==) at /usr/ios/sbin/viosupg.pl line 2566.
Use of uninitialized value in numeric eq (==) at /usr/ios/sbin/viosupg.pl line 2566.
Use of uninitialized value in numeric eq (==) at /usr/ios/sbin/viosupg.pl line 2572.
Use of uninitialized value in numeric eq (==) at /usr/ios/sbin/viosupg.pl line 2572.

IBM support suggested either restoring from backup and updating to 2.2.6.31 instead, or upgrading to 3.1.1.x. Repeating the update wasn't appealing, and 3.1.1.x was too new and had some HIPER APARs regarding HBA issues.

Once IBM confirmed that the problem was the too-new bos.alt_disk_install filesets, I suggested we simply reject them. I confirmed that the old versions were still installed: we had not performed a commit operation after the update to 2.2.6.51, so the new levels were only applied.

# lslpp -hac | grep alt_disk_install
/usr/lib/objrepos:bos.alt_disk_install.boot_images:6.1.9.200::COMMIT:COMPLETE:10/11/16:20;51;35
/usr/lib/objrepos:bos.alt_disk_install.boot_images:6.1.9.200::APPLY:COMPLETE:10/11/16:20;51;35
/usr/lib/objrepos:bos.alt_disk_install.boot_images:6.1.9.202::COMMIT:COMPLETE:02/11/20:11;58;24
/usr/lib/objrepos:bos.alt_disk_install.boot_images:6.1.9.202::APPLY:COMPLETE:04/17/17:11;34;13 <<< last committed
/usr/lib/objrepos:bos.alt_disk_install.boot_images:6.1.9.406::APPLY:COMPLETE:02/11/20:12;11;44 <<< applied level
/usr/lib/objrepos:bos.alt_disk_install.rte:6.1.9.200::COMMIT:COMPLETE:10/11/16:20;51;28
/usr/lib/objrepos:bos.alt_disk_install.rte:6.1.9.200::APPLY:COMPLETE:10/11/16:20;51;28
/usr/lib/objrepos:bos.alt_disk_install.rte:6.1.9.201::COMMIT:COMPLETE:02/11/20:11;58;24
/usr/lib/objrepos:bos.alt_disk_install.rte:6.1.9.201::APPLY:COMPLETE:04/17/17:11;41;20 <<< last committed
/usr/lib/objrepos:bos.alt_disk_install.rte:6.1.9.400::APPLY:COMPLETE:02/11/20:12;16;27 <<< applied level
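When lslpp -hac output gets long, it helps to list only the levels that were applied but never committed, since those are the ones installp -r can still reject. A minimal sketch, assuming the colon-separated layout shown above (fileset in field 2, level in field 3, action in field 5); uncommitted_levels is a hypothetical helper name.

```shell
# Hypothetical helper: read "lslpp -hac" output on stdin and print each
# "fileset level" pair that has an APPLY record but no COMMIT record.
uncommitted_levels() {
  awk -F: '
    $5 == "APPLY"  { applied[$2 ":" $3] = 1 }
    $5 == "COMMIT" { committed[$2 ":" $3] = 1 }
    END {
      for (k in applied)
        if (!(k in committed)) {
          split(k, a, ":")
          print a[1], a[2]
        }
    }' | sort
}
```

Usage: lslpp -hac | grep alt_disk_install | uncommitted_levels. Against the listing above it should report boot_images 6.1.9.406 and rte 6.1.9.400.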

IBM agreed that we could reject the new versions and proceed.

To work around the viosupgrade failure caused by the too-new bos.alt_disk_install filesets, we rejected those filesets using the installp command in oem_setup_env:

  • installp -r bos.alt_disk_install.boot_images

  • installp -r bos.alt_disk_install.rte

    # installp -r bos.alt_disk_install.boot_images
    +-----------------------------------------------------------------------------+
                            Pre-reject Verification...
    +-----------------------------------------------------------------------------+
    Verifying selections...done
    Verifying requisites...done
    Results...
    SUCCESSES
    ---------
      Filesets listed in this section passed pre-reject verification
      and will be rejected.
      Selected Filesets
      -----------------
      bos.alt_disk_install.boot_images 6.1.9.406  # Alternate Disk Installation ...
      << End of Success Section >>
    FILESET STATISTICS
    ------------------
        1  Selected to be rejected, of which:
            1  Passed pre-reject verification
      ----
        1  Total to be rejected
    +-----------------------------------------------------------------------------+
                              Rejecting Software...
    +-----------------------------------------------------------------------------+
    installp:  REJECTING software for:
            bos.alt_disk_install.boot_images 6.1.9.406
    Finished processing all filesets.  (Total time:  3 secs).
    +-----------------------------------------------------------------------------+
                                    Summaries:
    +-----------------------------------------------------------------------------+
    Installation Summary
    --------------------
    Name                        Level           Part        Event       Result
    -------------------------------------------------------------------------------
    bos.alt_disk_install.boot_i 6.1.9.406       USR         REJECT      SUCCESS
    
    # installp -r bos.alt_disk_install.rte
    +-----------------------------------------------------------------------------+
                            Pre-reject Verification...
    +-----------------------------------------------------------------------------+
    Verifying selections...done
    Verifying requisites...done
    Results...
    SUCCESSES
    ---------
      Filesets listed in this section passed pre-reject verification
      and will be rejected.
      Selected Filesets
      -----------------
      bos.alt_disk_install.rte 6.1.9.400          # Alternate Disk Installation ...
      << End of Success Section >>
    FILESET STATISTICS
    ------------------
        1  Selected to be rejected, of which:
            1  Passed pre-reject verification
      ----
        1  Total to be rejected
    +-----------------------------------------------------------------------------+
                              Rejecting Software...
    +-----------------------------------------------------------------------------+
    installp:  REJECTING software for:
            bos.alt_disk_install.rte 6.1.9.400
    Successfully updated the Kernel Authorization Table.
    Successfully updated the Kernel Role Table.
    Successfully updated the Kernel Command Table.
    Successfully updated the Kernel Device Table.
    Successfully updated the Kernel Object Domain Table.
    Successfully updated the Kernel Domains Table.
    Successfully updated the Kernel Authorization Table.
    Successfully updated the Kernel Role Table.
    Successfully updated the Kernel Command Table.
    Successfully updated the Kernel Device Table.
    Successfully updated the Kernel Object Domain Table.
    Successfully updated the Kernel Domains Table.
    Successfully updated the Kernel Authorization Table.
    Successfully updated the Kernel Role Table.
    Successfully updated the Kernel Command Table.
    Successfully updated the Kernel Device Table.
    Successfully updated the Kernel Object Domain Table.
    Successfully updated the Kernel Domains Table.
    Successfully updated the Kernel Authorization Table.
    Successfully updated the Kernel Role Table.
    Successfully updated the Kernel Command Table.
    Successfully updated the Kernel Device Table.
    Successfully updated the Kernel Object Domain Table.
    Successfully updated the Kernel Domains Table.
    Finished processing all filesets.  (Total time:  6 secs).
    +-----------------------------------------------------------------------------+
                                    Summaries:
    +-----------------------------------------------------------------------------+
    Installation Summary
    --------------------
    Name                        Level           Part        Event       Result
    -------------------------------------------------------------------------------
    bos.alt_disk_install.rte    6.1.9.400       ROOT        REJECT      SUCCESS
    bos.alt_disk_install.rte    6.1.9.400       USR         REJECT      SUCCESS
    

After reverting to the older versions, viosupgrade ran normally.

$ viosupgrade -l -i /mnt/PowerVM31010.mksysb -a hdisk1
Welcome to viosupgrade tool.
Operation triggered for given node(s).
Broadcast message from root@VIO2 (vty0) at 15:13:07 ...
WARNING!!! VIOS Upgrade operation is in progress. Kindly Refrain from making any configuration changes...
Please wait for completion..
Upgrading from ioslevel '2.2.6.51' to '3.1.0.10'.
Verifying whether the MPIO software(s) is installed on the VIOS.
Following list of fileset(s) required for the VIOS meta data restore
seems to be not present in the provided installation image.
Continuing with the upgrade process may result in restore failure post installation:
1: devices.fcp.disk.ibm.mpio.rte => IBM MPIO FCP Disk Device
2: devices.sddpcm.61.rte => IBM SDD PCM for AIX V61
Choice[Y/N]:
y
Verification of the MPIO software(s) is successful.
Initiating VIOS configuration backup..
VIOS configuration backup successful.
Initiating installation on alternate disk(s)..
Installation on alternate disk(s) successful.
Copying files to altinst_rootvg.
Waking up altinst_rootvg successful.
VIOS will be rebooted after '60' seconds to boot from the newly installed disk.
Press contrl+c to terminate.
VIOS metadata restore (viosbr -restore) will be automatically resumed
after the reboot.
VIOS may be rebooted once during this restore process. Refrain from making
any changes to the VIOS virtual configurations during the restore process.
You can verify the restore status using 'viosupgrade -l -q' command and
resume your operation after the completion of the restore process.
portmir: Cannot unload mirror module
portmir: Mirroring is stopped.
Rebooting . . .

Merging NMON files and the new nmonchart

I frequently have to review performance data for customers while doing performance troubleshooting and capacity planning. Kudos to Nigel Griffiths for his excellent NMON tool and associated programs. NMON makes collecting performance data on AIX and Linux a breeze.

Analyzing that data is fairly simple. There are a variety of tools, and I have typically used the Excel-based NMON Analyser [1]. The graphs are quite good, and because it is in Excel it's easy for me to annotate them and share them with customers. It has an option to merge data files, but due to Excel's constraints it is limited to about five days of data.

Recently I was learning about Nigel's new efforts with JSON and web-based graphing, and came across his nmonchart [2] tool. It produces dynamically zooming JavaScript graphs directly in your browser, all from a single file! I had to try it, and I'm very impressed.

Running it against a single file was trivial, and the resulting HTML loaded into a browser without issues. However, when I wanted to view several days of data stored in separate files, there wasn't an option for it.

A few minutes and some awk magic later, I had a script to combine data files for reading into nmonchart.

nmonmerge.sh (Source)

#!/bin/sh

################################################################################
# Copyright 2019, Russell Adams <Russell.Adams@AdamsSystems.nl>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

######################################################################################
# Merges multiple NMON files into a single NMON file for use with tools like nmonchart
# Prints the new file to STDOUT, be sure to redirect the output
# Uses the headers from the first file ONLY, disk names and adapters might not match


# Print first file headers, stop at the first timestamp
awk 'BEGIN {p=1} ; /^ZZZZ/ {p=0} ; p {print $0}' "$1"

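# Process all files, renumbering every snapshot tag into one continuous series.
# Rewriting the tag field in place (FS=OFS=",") keeps every other field as-is.
# TOP lines carry the PID in $2 and the snapshot tag in $3.
awk 'BEGIN {FS=OFS="," ; curr=0} ;
     $1 == "ZZZZ" {curr+=1} ;
     $1 == "TOP" && $3 ~ /^T[0-9][0-9][0-9][0-9]/ {$3 = sprintf("T%04d",curr) ; print ; next} ;
     $2 ~ /^T[0-9][0-9][0-9][0-9]/ {$2 = sprintf("T%04d",curr) ; print}' \
    "$@"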

Executing this script against a series of nmon files prints a combined data stream, so make sure to redirect it to a new file. Then I can run nmonchart on it!

 % ls -l tsm_*.nmon
-rw-r--r-- 1 adamsrl adamsrl 1016001 Nov 28 13:55 tsm_130901_0000.nmon
-rw-r--r-- 1 adamsrl adamsrl 1015298 Nov 28 13:47 tsm_130902_0000.nmon
-rw-r--r-- 1 adamsrl adamsrl 1023251 Nov 28 13:47 tsm_130903_0000.nmon
-rw-r--r-- 1 adamsrl adamsrl 1022257 Nov 28 13:47 tsm_130904_0000.nmon
-rw-r--r-- 1 adamsrl adamsrl 1023189 Nov 28 13:47 tsm_130905_0000.nmon
-rw-r--r-- 1 adamsrl adamsrl 1018528 Nov 28 13:47 tsm_130906_0000.nmon
-rw-r--r-- 1 adamsrl adamsrl 1016016 Nov 28 13:47 tsm_130907_0000.nmon
-rw-r--r-- 1 adamsrl adamsrl 1014277 Nov 28 13:47 tsm_130908_0000.nmon
-rw-r--r-- 1 adamsrl adamsrl 1011618 Nov 28 13:47 tsm_130909_0000.nmon
-rw-r--r-- 1 adamsrl adamsrl  540731 Nov 28 13:47 tsm_130910_0000.nmon

% ~/scripts/nmonmerge.sh tsm_1309*.nmon > tsm_all.nmon

% nmonchart tsm_all.nmon

% ls -l tsm_all*
-rw-r--r-- 1 adamsrl adamsrl 4370035 Nov 28 14:09 tsm_all.html
-rw-r--r-- 1 adamsrl adamsrl 8073319 Nov 28 14:08 tsm_all.nmon
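Before loading a large merged file, a quick sanity check is that the ZZZZ snapshot tags now run T0001, T0002, ... with no gaps or repeats, and that their count equals the total across the input files. A minimal sketch; check_snapshots is a hypothetical helper name.

```shell
# Hypothetical helper: verify the ZZZZ snapshot tags in a merged nmon file
# run T0001, T0002, ... in order, and print how many snapshots it holds.
check_snapshots() {
  awk -F, '
    $1 == "ZZZZ" {
      n++
      if ($2 != sprintf("T%04d", n)) {
        printf "line %d: expected T%04d, found %s\n", NR, n, $2
        bad = 1
      }
    }
    END {
      printf "%d snapshots\n", n
      exit bad
    }' "$1"
}
```

check_snapshots tsm_all.nmon should report the same number as cat tsm_1309*.nmon | grep -c '^ZZZZ,'.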

The resulting file loaded fine into Firefox on Linux, and after allowing Google in uBlock Origin for the file, the charts are lovely.

I've uploaded the combined file as an example: tsm_all.html

[1] https://developer.ibm.com/articles/au-nmon_analyser/
[2] http://nmon.sourceforge.net/pmwiki.php?n=Site.Nmonchart

Verifying IBM downloads using XSLTPROC

I often download updates from IBM Fix Central using FTPS/SFTP instead of IBM's Download Director. It's just easier to do on a NIM server than on my laptop or a customer desktop. Unfortunately, that makes it difficult to validate the checksums of the downloads.

IBM doesn't typically publish a simple text file of checksums with any of their POWER or AIX downloads. They do include an XML file for Download Director.

For VIOS downloads, they do try to let customers validate the download against that XML file by providing a file called ck_sum.bff. The customer is instructed to execute ck_sum.bff against the download directory to confirm the files.

This raises red flags for me because it violates security best practices: I should never execute untrusted code from any source on my systems! The typical place this would run is on a NIM server or AIX box, as root! I strongly advise against using this method.

Since IBM does publish the checksums in an XML file, we can extract them for validation without running untrusted code. I did this on a Linux box with xsltproc, but I believe the tool may be available for AIX as well.

We need to use the following XSLT file to convert IBM's XML to a table we can use.

<!-- Credit to Liam Quin of DelightfulComputing.com for the help on Freenode's #xml -->

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0"
  xmlns:ds="http://www.w3.org/2000/09/xmldsig#"
  xmlns:sdd-common="http://docs.oasis-open.org/sdd/ns/common"
  xmlns:sdd-pd="http://docs.oasis-open.org/sdd/ns/packageDescriptor"
  xmlns:sdd-um="http://w3.ibm.com/xmlns/b2b/b2b/sdd/um/nsr_w3_b2b_b2b_sdd_um_0100.xml"
  >

  <xsl:output method="text" /> <!--* don't want XML declaration *-->

  <xsl:template match="*"><xsl:apply-templates select="*"/></xsl:template>

  <xsl:template match="*[local-name(.)='Content'][ds:DigestValue]">
      <xsl:value-of select="ds:DigestValue"/>
      <xsl:text>  </xsl:text>
      <xsl:value-of select="@pathname"/>
      <xsl:text>&#xa;</xsl:text>
   </xsl:template>

</xsl:stylesheet>

Save this somewhere; I put it in ~/scripts/SDD.xslt.

We can then execute this XML transformation document against IBM's .pd.sdd files.

% xsltproc ~/scripts/SDD.xslt *.pd.sdd
c96248c5131787e08451ce92f97ea6d4b650402dfd4bdf24ee4c87b6a333b92d  ck_sum.bff
f95ec2d4024053db1d83ba98fca73847c68681dfd808b8bd0ccc712262ee604b  VIOS_SP_3.1.0.21.bff
ac9c0d5b7a88d9d91bd5d4074ee2b861686c00bef0c35ceceacda30c9f08a250  Java8_64.jre__1_8.0.0.526.bff
79470e6787602b1e810703ba028d514f38ef6d99e1a4cf0fe010ccbc389a3700  Java8_64.sdk__1_8.0.0.526.bff
87d1175a051ba050f6466d1fa82469c93753331fef2ae474dc09cda22b9595f0  U877265.bff
5f5cf575cc82212208fd1f068818a52251d55f91f5710fd58f75d79a58a85aaa  U877266.bff
5fff5bdaaa158a20f46a400e633a383d290e1d43cd6dfe0072ca352468795ba6  U877269.bff
a4c092017a660b137fd7ebb6ddc9ae3de76b8297a5ea48a4016651a519573e24  U880057.bff
54d5ef553e49eaf1a80585bdc12db7b5ad29ac7434759abe7830ad63c9dcd2c4  U882614.bff
d4a77a25a7d85fb860a4601b03ba8df3492550398378d8714c45ee612945cca6  U882619.bff
...

That matches the output format of sha256sum! In fact, we can feed it directly to sha256sum -c:

% xsltproc ~/scripts/SDD.xslt *.pd.sdd | sha256sum -c -
ck_sum.bff: OK
VIOS_SP_3.1.0.21.bff: OK
Java8_64.jre__1_8.0.0.526.bff: OK
Java8_64.sdk__1_8.0.0.526.bff: OK
U877265.bff: OK
U877266.bff: OK
U877269.bff: OK
U880057.bff: OK
U882614.bff: OK
U882619.bff: OK
...
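One gap remains: sha256sum -c only checks the files named in its input, so a .bff that IBM's XML doesn't mention (or that came from somewhere else) is simply never checked. A minimal sketch that flags such files, assuming the checksum list was saved to a file first; unlisted_files and sums.txt are my own names.

```shell
# Hypothetical helper: print each given file that has no entry in a checksum
# list of "<digest>  <filename>" lines (the format produced by SDD.xslt).
unlisted_files() {
  list=$1
  shift
  for f in "$@"; do
    # awk exits 0 if some line's filename field equals $f, 1 otherwise
    awk -v f="$f" '$2 == f { found = 1 } END { exit !found }' "$list" || echo "$f"
  done
}
```

Usage: xsltproc ~/scripts/SDD.xslt *.pd.sdd > sums.txt, then unlisted_files sums.txt *.bff. No output means every .bff in the directory is covered by IBM's list.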

So now we can validate our downloads from IBM without the GUI Download Director or running untrusted code on our system.

Thanks again to Liam Quin of http://DelightfulComputing.com/ for his assistance on Freenode's #xml channel! This will be a real timesaver!

Simple error reporting

In a production environment there should be no silent failures. AIX has an excellent centralized error reporting facility whose messages are viewed using the errpt command. Compared to other logging sources like syslog, the messages in errpt are of much higher quality and lower volume. They are worth reviewing!

Logs always suffer from inattention if they must be checked manually, so here's a simple way to email errpt entries to yourself in real time, as soon as they happen. This method has two components: forwarding root's email, and adding errnotify entries to the ODM so that the error daemon (errdemon) emails each new log entry as it is created.

We use root email forwarding so that the ODM doesn't have to be changed each time the email address changes. Most administrators are more comfortable editing a text file than adding and removing ODM entries, so this makes maintenance easier.

In the event this begins spamming your inbox, you can remove the forward file temporarily until the issue is resolved. If you are receiving too many messages across systems during normal operation, then you'll need to upgrade to a real monitoring solution. You've outgrown the cheap solution!

Large environments may find that email delivery doesn't scale; they should consider centralized syslog and the syslog ODM entry instead. This is easier than writing an errpt reporting script for the monitoring system.

To forward root's email, create a new text file /.forward (or ~root/.forward if you changed root's home directory) containing a comma-delimited list of the email addresses root's email should be forwarded to. The file must be writable by root only or forwarding will fail; the example below uses mode 400, read-only for root.

Be aware that this will also send you output from cron jobs and other system functions, but you were checking root's mail before now, weren't you? You may have to fix some cron jobs to properly redirect their output to /dev/null after this.

echo "russell@mydomain.com" > /.forward
chmod 400 /.forward
mail -s "Testing forwarding" root < /dev/null

You'll want to send root a test message to confirm email delivery is working.
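If forwarding doesn't work, check the mode first: sendmail typically ignores a .forward file that is group- or world-writable. A small sketch of that check; forward_perms_ok is a hypothetical name, and it only inspects the permission bits reported by ls, not ownership.

```shell
# Hypothetical helper: fail if the file is group- or world-writable.
# Looks only at the mode string from ls -ld, not at ownership.
forward_perms_ok() {
  perms=$(ls -ld "$1" | cut -c1-10)
  case $perms in
    ?????w????|????????w?) return 1 ;;   # group write or other write bit set
  esac
  return 0
}
```

Usage: forward_perms_ok /.forward || echo "run: chmod 400 /.forward"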

Next we will add some new entries to the errnotify object class in the ODM.

errnotify:
        en_pid = 0
        en_name = "mail_err"
        en_persistenceflg = 1
        en_method = "/usr/bin/errpt -a -l $1 | /usr/bin/mail -s \"Errpt $1 $4 $3 $9\" root"

errnotify:
        en_pid = 0
        en_name = "syslog_err"
        en_persistenceflg = 1
        en_method = "/usr/bin/errpt -a -l $1 | /usr/bin/tr '\n' '~' | /usr/bin/logger -p warn"

These are the two new records to add to the ODM. The first emails the errpt entry currently being processed. The second forwards the same entry to syslog with all newlines replaced by tildes, which may work well in environments that centrally monitor syslog. Depending on your needs, you can choose either or both.

Copy and paste the entries into a new file /tmp/custom_odm_additions.

Then add them using this command.

odmadd /tmp/custom_odm_additions
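If you'd rather script this step than copy and paste, be careful with the shell: the stanzas contain $1, $3, $4 and $9, which an unquoted here-document would expand before odmadd ever sees them. Quoting the delimiter ('EOF') keeps them literal. A sketch, with write_errnotify_stanzas as a hypothetical helper name.

```shell
# Hypothetical helper: write the two errnotify stanzas to the file named in $1.
# The quoted 'EOF' delimiter stops the shell from expanding $1, $3, $4 and $9,
# so they reach the ODM literally for the error daemon to fill in later.
write_errnotify_stanzas() {
  cat > "$1" <<'EOF'
errnotify:
        en_pid = 0
        en_name = "mail_err"
        en_persistenceflg = 1
        en_method = "/usr/bin/errpt -a -l $1 | /usr/bin/mail -s \"Errpt $1 $4 $3 $9\" root"

errnotify:
        en_pid = 0
        en_name = "syslog_err"
        en_persistenceflg = 1
        en_method = "/usr/bin/errpt -a -l $1 | /usr/bin/tr '\n' '~' | /usr/bin/logger -p warn"
EOF
}
```

Usage: write_errnotify_stanzas /tmp/custom_odm_additions && odmadd /tmp/custom_odm_additions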

Test that your new reporting method is working by using the errlogger command.

errlogger Testing errpt email to root

Check your email; you should receive something like this:

Date: Tue, 10 Apr 2007 02:28:13 +0200
From: root@nim
To: root@nim
Subject: Errpt 135 TEMP O OPMSG

---------------------------------------------------------------------------
LABEL:          OPMSG
IDENTIFIER:     AA8AB241

Date/Time:       Mon Apr  9 19:28:13 CDT 2007
Sequence Number: 135
Machine Id:      000866264600
Node Id:         nim
Class:           O
Type:            TEMP
Resource Name:   OPERATOR

Description
OPERATOR NOTIFICATION

User Causes
ERRLOGGER COMMAND

        Recommended Actions
        REVIEW DETAILED DATA

Detail Data
MESSAGE FROM ERRLOGGER COMMAND
Testing errpt email to root

Congratulations, you now have real-time reporting of errpt entries via email.

Other useful commands for working with these ODM entries:

# To check the ODM for error reporting records:

odmget -q"en_name LIKE '*_err'" errnotify

# To backup the current contents of the errnotify ODM:

odmget errnotify > /tmp/errnotify.backup

# To remove these customizations (mail_err & syslog_err)

odmdelete -o errnotify -q"en_name LIKE '*_err'"
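Before running odmadd again (for example from a build script), it's worth checking which entries already exist, since I wouldn't rely on the ODM to reject a duplicate. A minimal sketch that pulls the en_name values out of odmget's stanza output; errnotify_names is a hypothetical helper.

```shell
# Hypothetical helper: read odmget stanza output on stdin and print the
# en_name value of each errnotify object, one per line.
errnotify_names() {
  # Split on double quotes: for  en_name = "mail_err"  the value is field 2.
  awk -F'"' '$1 ~ /^[ \t]*en_name[ \t]*=[ \t]*$/ { print $2 }'
}
```

Usage: odmget errnotify | errnotify_names | grep -qx mail_err || odmadd /tmp/custom_odm_additions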

Working with Snap files

I frequently work with customer systems where I need a systems inventory. This could be for troubleshooting or just to save the final state of a system for later reference.

I have worked with many consultants who give customers their own inventory script, but I prefer to use the tools native to the platform when they are available. On AIX I use IBM's native snap command. If you've ever been on the phone with IBM support, you know they barely wait to ask your name before asking you to upload a snap.

IBM's snap utility, distributed as part of the OS, is an excellent tool for troubleshooting AIX. In my experience it captures about 90% of what I need to know about a system, including:

  • Installed software
  • Devices and attributes
  • LVM details and disk layout
  • Network statistics and configuration data

Rather than ask a customer to run commands for me and capture the output, or ask them to run a script from an untrusted source as root on their production server, I always ask for a snap.


Welcome to the new ASC site!

I'm proud to announce that Adams Systems Consultancy is now established in The Netherlands and open for business!

This new company allows me to provide services to customers throughout Europe from a central location.

With this site I'm testing a new blog format to allow me to post technical articles and tips. Stay tuned for posts as I migrate content from my notes into useful articles!