SAN LUNs in SMS
I previously discussed how important it is to verify LUN IDs before writing over them in AIX. What about before AIX is booted in SMS? How can you verify your LUNs in SMS?
Overview of Boot from SAN
It's common in larger AIX environments to boot directly from SAN storage rather than internal drives. SANs provide reliable, fast disks for applications, so why not apply that to your OS as well?
Internal drives used to boot AIX are typically "Just a Bunch Of Drives" (JBOD), and the AIX LVM is used to mirror the rootvg across them for redundancy. Increasing virtualization and competition for resources between LPARs (e.g., internal bays need split backplanes) have made booting from SAN more common. Features like SAN mirroring and snapshots provide more incentive to do so.
Boot from SAN recommendations
In the past, booting AIX from SAN was complicated, but in recent years AIX's built-in MPIO drivers have made it a much smoother experience.
Recent firmware updates in POWER9 firmware 950 [1] and higher have made working with SAN LUNs in SMS easier as well! The ioinfo command at the open firmware prompt was also removed [2] and replaced by this new functionality.
Before moving to booting from SAN, please consider some of the following items.
Redundancy
When a system boots from SAN, it can no longer recover as easily from a SAN outage as a system with local boot drives and application data on the SAN. Now a SAN interruption could mean a hard crash instead of a few minutes of hung IO.
Systems intending to boot from SAN should have redundant HBA cards, not just redundant HBA ports. Those cards should link to two fabrics for redundancy. This may mean using only the top port in an HBA card, and connecting multiple adapters.
Single Points Of Failure (SPOFs) should be avoided at all costs. Review your HBAs, cables, and switch connections to eliminate any SPOFs.
Virtualized LPARs should use dual redundant PowerVM VIO servers, and be mindful of the HBA card topology to ensure they are on two fabrics.
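One way to sanity-check card redundancy is to compare the physical location codes of the ports intended for boot. The following is a minimal Python sketch, assuming location codes of the form SMS prints in its adapter list, where the -C field identifies the card slot and the -T field identifies the port on that card. The function name cards_in_use and the sample list are hypothetical, for illustration only.

```python
import re

# Assumed layout: <enclosure>-P<planar>-C<card slot>-T<port>
_PORT = re.compile(r"^(?P<card>.+-C\d+)-T(?P<port>\d+)$")


def cards_in_use(port_location_codes):
    """Return the set of distinct cards behind the given port location codes."""
    cards = set()
    for code in port_location_codes:
        match = _PORT.match(code.strip())
        if match is None:
            raise ValueError(f"unexpected location code: {code!r}")
        cards.add(match["card"])
    return cards


# Hypothetical boot ports, one per card.
boot_ports = [
    "U7888.ND1.CABC123-P1-C3-T1",
    "U7888.ND1.CABC123-P1-C6-T1",
]

if len(cards_in_use(boot_ports)) < 2:
    print("Boot ports share a single HBA card: possible SPOF")
else:
    print("Boot ports are spread across separate HBA cards")
```

This only checks the card layout. Whether the two cards land on separate fabrics still has to be confirmed with the SAN administrator.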
Third party drivers
AIX's boot from SAN is still best on IBM storage with MPIO. Third-party storage which uses drivers other than MPIO should be approached with caution.
One option when third party drivers are in use is to rely on locally booted PowerVM VIO servers and VSCSI. The VIO server has the third party driver installed, the rootvg LUNs for all clients are mapped to the VIO servers, and then those LUNs are mapped to client LPARs using VSCSI.
The VIO clients benefit from the SAN storage and they use built-in VSCSI drivers during boot. The third party driver is on VIO where it can be administered. The client LPARs can then use the third party driver with NPIV adapters to access SAN storage for applications.
Virtualization
In an environment with PowerVM VIO services, booting clients using NPIV adapters from SAN LUNs minimizes the VIO administration and complexity. This is an ideal configuration.
Consider whether the VIO servers should boot locally from internal drives. In the event of a SAN issue, it's useful to have VIO servers online to perform troubleshooting. That's not an option without local boot media.
An excellent compromise is to architect systems with a split backplane and four local disks which can host two redundant VIO servers. Each boots locally with mirrored disks. Then all client LPARs can boot directly from SAN using NPIV.
PowerHA Clusters
Special attention must be paid to PowerHA clusters which boot from SAN. In a PowerHA cluster the rootvg is a critical VG and monitored by the cluster watchdog. If an IO delay due to SAN issues prevents IO to the rootvg for a full minute, the cluster nodes will deliberately crash to trigger a failover to a surviving node.
Cross-site mirroring can cause problems here if SAN storage blocks IO during an inter-site communication problem.
As a result, I recommend keeping PowerHA clusters booting from local drives where complex SAN configurations are present. They can use local drives directly, a combination of VIO VSCSI mappings, or one local and one SAN drive mirrored in rootvg.
One mapping during deployment
If only one LUN is mapped to our system during deployment, that certainly helps narrow down the correct disk. For a new system being deployed, I recommend mapping only the rootvg LUN. Other LUNs may be mapped after AIX is installed.
Snapshots and rootvg
SAN storage often has the ability to take an instant point in time snapshot of LUNs, for backup or restore later. This can also be a tool used for AIX upgrades.
IBM has published best practices for snapshots, including the use of the chfs freeze command to pause IO to filesystems and flush their buffers. Without applying these best practices, a "dirty" snapshot may be taken which could have corruption or data inconsistencies.
Unlike data volume groups, you should not freeze the rootvg during operation. The best way to take a SAN snapshot of the rootvg is with the LPAR shut down.
Alternatively, I recommend adding a second bootable LUN and using the alt_disk_copy command instead to create a bootable snapshot.
LUN Characteristics
It's a common best practice to keep the rootvg as small as possible to minimize the space required for taking image backups via mksysb. Rootvg should also have a relatively light IO workload compared to application volume groups.
As a result, I often will request rootvg LUNs which are provisioned as thin allocated at a modest size (50G) with optional compression on the SAN backend. These LUNs can also be on a slower storage type.
Thin allocation means that the SAN will only allocate storage to the LUN when it is written to. This is a useful fiction to allow over-committing on the SAN array. The rootvg should stay small, making it an excellent candidate for thin allocation.
Compression or slower economical backend storage is useful to minimize the rootvg footprint, as the load is not demanding and latency is less of an issue.
Installing AIX on SAN LUNs
Unfortunately, the AIX installer and the bootable mksysb restore programs show very little information about disks. They are kept minimal so they can be booted from CD or network image, and do not have a full operating system to leverage for storage information.
Lacking an OS also makes discovering our port status and WWPNs more difficult. Without an OS keeping the ports online, many of the switch and storage GUI configuration tools won't see our system to allow the SAN administrator to configure resources. The recent improvements in SMS can help solve those problems.
To prepare a system for zoning, mapping, and boot from SAN, start by activating the LPAR into SMS. The following instructions assume the LPAR is already at the SMS menu and that the system console is available.
Devices in SMS
The 950 firmware replaced the third option on the main SMS menu with a new item, "I/O Device Information". Many device types are available; however, this document focuses on "SAN" and "FCP" ports in the sub menus.
Broadcast to assist in zoning
Testing SAN cables for connectivity is always worthwhile. Bringing the link up can also advertise our WWPNs to the switch and storage.
This is a great time to have the SAN administrator open any GUI tool they use with the switch and storage so they can confirm the ports light up briefly, and WWPN entry widgets populate with WWPNs.
After navigating to the SAN option, then the FCP option, a list of adapters is presented:

    Select Media Adapter
    1.   U7888.ND1.CABC123-P1-C3-T1   /pci@800000020000072/fibre-channel@0
    2.   U7888.ND1.CABC123-P1-C6-T1   /pci@800000020000099/fibre-channel@0

When an adapter is selected, the port will be brought online temporarily. The adapter will try to find a link, and if one is found it will try to inventory the LUNs available.
If the port is not connected, a failure message like this appears:

    .------------------.
    |  PLEASE WAIT.... |
    `------------------'
    Link down
    Cannot Init Link.
    .----------------------------.
    |   No SAN devices present   |
    |  Press any key to continue |
    `----------------------------'

That's pretty clear! Check that cable and try again.
Once this succeeds, the SAN zones can be created on the switch to allow our ports to communicate with the SAN storage controllers. The ports communicated with the switch and should now be on record for configuration.
Map LUNs to host
If zones are newly created, go back and execute the test on that adapter again to advertise our port to the SAN storage controller.
If the storage controller is located but no LUNs are found, we're presented with a list of unrecognized devices which can repeat when there are multiple paths to the controller:

    Select Attached Device
    Pathname: /pci@800000020000072/fibre-channel@0
    WorldWidePortName: 100000abc1234554
    1.   5005012312312319,0    Unrecognized device type: 3f
    2.   500501231231231c,0    Unrecognized device type: 3f
    3.   500501231231231c,0    Unrecognized device type: 3f
    4.   5005012312312319,0    Unrecognized device type: 3f

The system should have a host record created on the storage controller and our rootvg LUN mapped to us. The SAN administrator should also provide the LUN ID.
Detect LUNs
Scan once more, and one or more LUNs should be visible:

    Select Attached Device
    Pathname: /pci@800000020000072/fibre-channel@0
    WorldWidePortName: 100000abc1234554
    1.   5005012312312319,0                107 GB Disk drive
    2.   500501231231231c,1000000000000    107 GB Disk drive

Confirm LUN identity
Selecting a LUN device shows additional details:

    SAN Device Menu
    Target Address: 500501231231231c
    Lun Address: 0
    Pathname: /pci@800000020000072/fibre-channel@0/disk@500501231231231c,0
    Device: 107 GB Disk drive
    BUID: IBM-2145-60050768123123123123000000000222

The BUID is the LUN serial number!
Before booting to the AIX installer, take note of which mappings (e.g., "500501231231231c,0") correspond to which serial number (BUID). The installer only shows the mapping number.
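If there are more than a couple of LUNs, it can help to keep those notes in a structured form. Here is a minimal Python sketch, assuming the BUID always has the three-part vendor-product-serial layout seen in the sample above. The names Buid, parse_buid, and sms_notes are hypothetical and not part of any IBM tool.

```python
from typing import NamedTuple


class Buid(NamedTuple):
    vendor: str
    product: str
    serial: str


def parse_buid(buid: str) -> Buid:
    """Split a BUID of the assumed form VENDOR-PRODUCT-SERIAL."""
    parts = buid.strip().split("-", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected BUID layout: {buid!r}")
    return Buid(*parts)


# Notes taken at the SMS console: mapping from the device list -> BUID field.
sms_notes = {
    "500501231231231c,0": "IBM-2145-60050768123123123123000000000222",
}

serial_by_mapping = {
    mapping: parse_buid(buid).serial for mapping, buid in sms_notes.items()
}

for mapping, serial in serial_by_mapping.items():
    print(mapping, serial)
```

The serial can then be checked against the LUN ID provided by the SAN administrator before anything is written to the disk.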
Minimal information in the installer
When booted into the installer, I always choose "Change/Show Installation Settings and Install" from the main menu so I can select the disk and view the disk information:

         Name      Location Code   Size(MB)   VG Status   Bootable
    >>>  1  hdisk0    none         102400     none   Yes    No
         2  hdisk1    none         102400     none   Yes    No

    >>>  0   Continue with choices indicated above
        66   Disks not known to Base Operating System Installation
        77   Display More Disk Information

The 77 choice will rotate the columns describing the drives:

         Name      Physical Volume Identifier
    >>>  1  hdisk0    0000000000000000
         2  hdisk1    0000000000000000

Choosing 77 again shows some additional information, which includes the WWPN of the storage server and the incremental mapping number:

    >>>  1  hdisk0    U78D4.ND1.ABC123K-P1-C10-T1-W500501231231231c-L0
         2  hdisk1    U78D4.ND1.ABC123K-P1-C10-T1-W500501231231231c-L1...

Unfortunately there's no serial number!
While it's often safe to write to a disk whose size matches expectations and which has no PVID, the serial number is the only way to know for certain.
Comparing the mapping and serial number information from SMS against the installer, you can now be certain of the correct hdisk to use.
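The comparison can be sketched in a few lines of Python. This is only an illustration, assuming the installer's location code ends in -W followed by the 16-digit target WWPN and -L followed by the LUN address, as in the sample above, and that the LUN address is printed the same way SMS printed it. The function sms_mapping and both dictionaries are hypothetical.

```python
import re

# Assumed layout, taken from the sample above: ...-W<target WWPN>-L<LUN address>
_TAIL = re.compile(r"-W(?P<wwpn>[0-9A-Fa-f]{16})-L(?P<lun>[0-9A-Fa-f]+)$")


def sms_mapping(location_code: str) -> str:
    """Return 'wwpn,lun' in the style of the SMS device list."""
    match = _TAIL.search(location_code.strip())
    if match is None:
        # Truncated or unfamiliar codes are rejected rather than guessed at.
        raise ValueError(f"cannot parse location code: {location_code!r}")
    return f"{match['wwpn'].lower()},{match['lun'].lower()}"


# Hypothetical notes taken in SMS: mapping -> serial number from the BUID field.
serial_by_mapping = {
    "500501231231231c,0": "60050768123123123123000000000222",
}

# Location code copied from the installer's third "77" view.
installer_view = {
    "hdisk0": "U78D4.ND1.ABC123K-P1-C10-T1-W500501231231231c-L0",
}

for hdisk, code in installer_view.items():
    mapping = sms_mapping(code)
    print(hdisk, mapping, serial_by_mapping.get(mapping, "serial not noted"))
```

A location code that the installer has truncated on screen will not parse, which is deliberate: in that case the full value should be read from the console rather than inferred.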
View in AIX
Once AIX is installed the lspv -u command can be used to view the serial numbers for LUNs and compare to the information from SMS:

    % lspv -u
    hdisk0  00123123ad934732  rootvg          active  33213500501231231231C480000000000022204214503IBMfcp  2343caaf-b0aa-cd71-3d29-15adad9qcc28
    hdisk1  00123123be048213  altinst_rootvg          33213500501231231231C480000000000022304214503IBMfcp  10b2226b-8224-6323-a642-82134e5e5954

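For a system with many disks, the lspv -u output can be searched rather than read by eye. The sketch below is a rough Python illustration. It assumes every row ends with the unique ID followed by the UUID, as in the sample above, and that the LUN serial appears somewhere inside the unique ID, which is typical for IBM MPIO devices but should be verified for your storage. The sample values in this article are anonymized, so the fragment searched for here is taken from the sample unique ID itself. The function names are hypothetical.

```python
from typing import Dict, List

# Rows copied from the sample output above (without the prompt line).
SAMPLE = """\
hdisk0 00123123ad934732 rootvg active 33213500501231231231C480000000000022204214503IBMfcp 2343caaf-b0aa-cd71-3d29-15adad9qcc28
hdisk1 00123123be048213 altinst_rootvg 33213500501231231231C480000000000022304214503IBMfcp 10b2226b-8224-6323-a642-82134e5e5954
"""


def unique_ids(lspv_u_output: str) -> Dict[str, str]:
    """Map each hdisk name to its unique ID (second-to-last column)."""
    ids = {}
    for line in lspv_u_output.splitlines():
        fields = line.split()
        # Need at least: name, PVID, VG, unique ID, UUID. The state column is optional.
        if len(fields) < 5:
            continue
        ids[fields[0]] = fields[-2]
    return ids


def disks_matching(serial_fragment: str, ids: Dict[str, str]) -> List[str]:
    """Return hdisks whose unique ID contains the fragment, ignoring case."""
    needle = serial_fragment.upper()
    return sorted(name for name, udid in ids.items() if needle in udid.upper())


ids = unique_ids(SAMPLE)
print(disks_matching("22204214503", ids))
```

On a real system the fragment would be the serial noted from the SMS BUID field, and exactly one hdisk should match.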
Bonus Feature: Test Network ports
Cable testing is always frustrating before an OS is installed. SMS now includes a way to test the SAN connection, as we discussed above. It also allows you to check network ports!
Under the SMS main menu "I/O Device Information", there is now a listing for "Network ports":

    I/O Device Information
    1.   SAN
    2.   SAS
    3.   NVMe
    4.   vSCSI
    5.   Firmware device driver Secure Boot validation failures
    6.   Network ports  <===============

This presents a list of network ports and can do a link test on them. It can also test all ports to find which ones have link.

    PowerPC Firmware
    Version FW950.30 (VM950_092)
    SMS (c) Copyright IBM Corp. 2000,2021 All rights reserved.
    -------------------------------------------------------------------------------
    Network Port connectivity check menu
    Ports with * are connected to a network and are capable of network activity

    PCIe2 4-Port (10GbE SFP+ & 1Gb    U78D4.ND1.CSS9999-P1-C11-T1
    PCIe2 4-Port (10GbE SFP+ & 1Gb    U78D4.ND1.CSS9999-P1-C11-T2
    PCIe2 4-Port (10GbE SFP+ & 1Gb    U78D4.ND1.CSS9999-P1-C11-T3
    PCIe2 4-Port (10GbE SFP+ & 1Gb    U78D4.ND1.CSS9999-P1-C11-T4
    PCIe2 4-Port (10GbE SFP+ & 1Gb    U78D4.ND1.CSS9999-P1-C8-T1
    PCIe2 4-Port (10GbE SFP+ & 1Gb    U78D4.ND1.CSS9999-P1-C8-T2
    PCIe2 4-Port (10GbE SFP+ & 1Gb    U78D4.ND1.CSS9999-P1-C8-T3
    PCIe2 4-Port (10GbE SFP+ & 1Gb    U78D4.ND1.CSS9999-P1-C8-T4

These tools are very handy when remotely supporting a physical installation.