Properly identifying SAN LUNs
I frequently work in large SAN environments, and I always want to verify the identity of any SAN disks (LUNs) I receive before writing data to them. The rule is "trust but verify": accidentally overwriting critical data in a shared storage environment is disastrous.
SAN LUNs are virtual disks presented from a storage appliance to our AIX systems. The storage appliance is responsible for data integrity (e.g., scrubbing), redundancy (e.g., RAID), and often replication to backup systems or offsite for DR. A centralized storage solution is preferred over having each system maintain its own RAID array and related features.
This guide assumes new LUNs have been presented to an AIX system, and the administrator wants to confirm the LUNs before writing data to them. Once the LUNs have been properly identified and confirmed, they can be used with confidence.
Characteristics of a LUN
LUNs should be identified by:
- Device name: Dynamically generated in order of discovery (e.g., hdisk12), and subject to change if the device table is cleared and re-scanned.
- Storage array: The WWN of the storage array providing the LUN, if there is more than one array (e.g., 0x5005082810000bdb).
- LUN Serial Number or Unique ID: Typically a long hexadecimal string, visible both in AIX and in the storage server UI (e.g., 332136005046810810023C200000000000004).
- Size: The expected size of the LUN, generally requested at allocation time.
- Paths: Each HBA connected to the SAN can contribute one or more paths from the AIX host to the storage array. Generally there are 4 or more paths for redundancy, depending on the type of array.
- PVID: The 16-digit hexadecimal AIX LVM Physical Volume ID shown by lspv. Any AIX disk belonging to a volume group has a PVID, and PVIDs should be unique across systems, with few exceptions.
Device Names
SAN LUNs in AIX are classified as disk devices and are typically listed as hdiskX devices in the device table (as shown by lsdev -Cc disk).
Some older device drivers (e.g., IBM's original SDD, EMC's PowerPath) used alternate device names like hdiskpowerX or vpathX, but most systems now use AIX's native MPIO.
% lsdev -Cc disk
hdisk0 Available 01-00-00 SAS RAID 0 Disk Array
hdisk1 Available 01-00-00 SAS RAID 0 Disk Array
hdisk2 Available 03-01-01 MPIO IBM 2076 FC Disk
hdisk3 Available 03-01-01 MPIO IBM 2076 FC Disk
That doesn't tell us much, as AIX hdisk device names are simply numbered in the order they were discovered. We do know that we have some local SAS disks as well as Fibre Channel (FC) LUNs.
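Since the names carry no meaning on their own, a script that wants only the SAN LUNs has to pick them out by description. The sketch below is a minimal example, assuming the IBM MPIO wording "FC Disk" shown above (other drivers describe their disks differently); fc_disks is a hypothetical helper, not an AIX command. The names it prints are only valid until the next re-scan.

```shell
# fc_disks (hypothetical helper): read `lsdev -Cc disk` output on stdin and
# print the names of Available disks whose description contains "FC Disk".
# Assumes the IBM MPIO description format; local SAS disks are skipped.
fc_disks() {
    awk '$2 == "Available" && /FC Disk/ { print $1 }'
}

# Usage on AIX:
#   lsdev -Cc disk | fc_disks
```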
When troubleshooting a SAN, it's common to run rmdev -dRl fcsX to purge the SAN-attached HBAs (fcsX) along with their paths and LUNs, then re-scan with cfgmgr and confirm the result. This clears counters and dead paths; however, the hdisk and fcs device numbers may change! This depends on the device discovery order and whether any new hardware has been added.
Do not rely on the device names in backup scripts or other automated processes. They can change over the lifetime of a system.
Storage Array
When there is more than one storage array in the environment, it's important to know which array the LUNs are expected to be served from.
IBM arrays are listed by major model number (4 digits) in the device description. Other manufacturers may vary.
% lsdev -Cc disk
...
hdisk2 Available 03-01-01 MPIO IBM 2076 FC Disk
hdisk3 Available 03-01-01 MPIO IBM 2076 FC Disk
These LUNs are from an IBM 2076 V7000 storage array.
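With more than one array in play, the machine type can be listed per disk so that LUNs from an unexpected array stand out. A sketch, assuming the "MPIO IBM <type> FC Disk" description layout above; array_models is a hypothetical helper name, and non-IBM descriptions simply will not match.

```shell
# array_models (hypothetical helper): read `lsdev -Cc disk` output on stdin
# and print "hdisk machine_type" for each IBM MPIO FC disk, taking the
# 4-digit number that follows "IBM" in the description.
array_models() {
    awk '/MPIO IBM [0-9][0-9][0-9][0-9] FC Disk/ {
        for (i = 1; i < NF; i++)
            if ($i == "IBM") { print $1, $(i + 1); break }
    }'
}

# Usage on AIX:
#   lsdev -Cc disk | array_models
```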
The best attributes to verify are the storage array WWN and the WWPNs of the controller ports.
AIX's MPIO shows these in the lsattr -El hdiskX output as node_name and ww_name.
% lsattr -El hdisk3
PCM PCM/friend/fcpother Path Control Module False
PR_key_value none Persistant Reserve Key Value True+
algorithm shortest_queue Algorithm True+
...
node_name 0x5005079230000128 FC Node Name False
...
ww_name 0x5005079230150128 FC World Wide Name False
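lsattr can return a single attribute on its own, for example lsattr -El hdisk3 -a node_name -F value. When the full listing has already been captured (say, saved before a change), a small awk filter pulls out the same values. A sketch; lsattr_value is a hypothetical helper that assumes the attribute name is in column 1 and its value in column 2, as in the output above.

```shell
# lsattr_value (hypothetical helper): read `lsattr -El hdiskX` output on
# stdin and print the value (column 2) of the attribute named in $1.
lsattr_value() {
    awk -v attr="$1" '$1 == attr { print $2; exit }'
}

# Usage on AIX:
#   lsattr -El hdisk3 | lsattr_value node_name
#   lsattr -El hdisk3 | lsattr_value ww_name
```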
If provided with specific field names, lspath can also show the WWPNs of the storage array:
% lspath -F status:name:parent:connection
Enabled:hdisk1:fscsi0:5005079230132bc3,1000000000000
Enabled:hdisk2:fscsi0:5005079230132bc3,2000000000000
Enabled:hdisk3:fscsi0:5005079230132bc3,3000000000000
Enabled:hdisk4:fscsi0:5005079230132bc3,4000000000000
...
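The connection field combines the storage port WWPN and the LUN number, separated by a comma. A minimal sketch that reduces the listing to one line per hdisk, parent adapter, and storage WWPN, so the ports can be compared against the array's port list; target_wwpns is a hypothetical helper.

```shell
# target_wwpns (hypothetical helper): read the output of
#   lspath -F status:name:parent:connection
# on stdin and print "hdisk parent storage_wwpn", one line per unique
# combination. The connection field is "<storage WWPN>,<LUN number>";
# only the part before the comma is kept.
target_wwpns() {
    awk -F: '{ split($4, conn, ","); print $2, $3, conn[1] }' | sort -u
}

# Usage on AIX:
#   lspath -F status:name:parent:connection | target_wwpns
```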
LUN Serial Number
The LUN serial number is the most important identifier of any LUN. Even with duplicate data and PVIDs (e.g., a snapshot), the serial numbers will be unique! They are assigned by the storage controller when each LUN is created, and are guaranteed to be unique across all LUNs from that controller.
The LUN serial number is the only trustworthy method of LUN identification.
AIX's MPIO stores the serial number as part of the unique_id attribute of lsattr -El hdiskX:
% lsattr -El hdisk1
PCM PCM/friend/fcpother Path Control Module False
PR_key_value none Persistant Reserve Key Value True+
...
unique_id 33213600507293081006E280000000000000204214503IBMfcp Unique device identifier False
ww_name 0x5005072930150ba8 FC World Wide Name False
The serial number is also in the output of lspv -u:
% lspv -u
hdisk1  00cdb8105a9b1234  rootvg  active  2C020220343123454ec010ad002ad84100000001nvme
hdisk3  none              None            332136005066810820069E00000000000025304214503IBMfcp  f4993d47-0379-7984-98ac-1ccc6f034ac0
hdisk4  none              None            332136005066810820069E00000000000025404214503IBMfcp  0bd599ab-426e-ebee-f8a0-5d4a80e1e9d8
The example above shows an NVMe drive and two SAN LUNs. The LUNs have a serial number in the fifth column and an AIX-specific UUID in the sixth column. For interoperability, rely only on the fifth-column serial number.
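Splitting lspv -u output on whitespace needs care: the state column is blank for disks that are not in an active volume group, so the serial lands in field 4 for new LUNs but in field 5 for active disks. A sketch that handles both, assuming the only state values are "active" and "concurrent"; lun_serials is a hypothetical helper.

```shell
# lun_serials (hypothetical helper): read `lspv -u` output on stdin and
# print "hdisk unique_id". When the state column is empty, a whitespace
# split moves the unique_id from field 5 to field 4, so check field 4 first.
# Assumes the state, when present, is "active" or "concurrent".
lun_serials() {
    awk '{
        uid = ($4 == "active" || $4 == "concurrent") ? $5 : $4
        print $1, uid
    }'
}

# Usage on AIX:
#   lspv -u | lun_serials
```

For a single disk, lsattr -El hdisk3 -a unique_id -F value returns the same identifier without any column parsing.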
If the system runs an older version of AIX (e.g., 5.3) or uses a non-IBM driver, lsattr may not list the unique ID, and lspv -u may not be available.
In that case, you can query the ODM for the value using odmget and grep. Note that grep -p matches paragraphs, so the first grep selects all attribute stanzas for the hdisk, and the second grep selects the unique ID stanza.
% odmget CuAt | grep -p hdisk10 | grep -p uniq
CuAt:
        name = "hdisk10"
        attribute = "unique_id"
        value = "332136005012340810062A20000000000000404214503IBMfcp"
        type = "R"
        generic = "D"
        rep = "nl"
        nls_index = 79
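grep -p is specific to AIX's grep. odmget can also filter by itself with a query such as odmget -q "name=hdisk10 and attribute=unique_id" CuAt. To extract just the value from the stanza in a script, a sketch like the following works, assuming values contain no spaces (true for unique_id); odm_value is a hypothetical helper.

```shell
# odm_value (hypothetical helper): read `odmget CuAt` stanzas on stdin and
# print the value of attribute $2 for device $1, with quotes removed.
# Assumes name, attribute and value appear in that order within a stanza
# and that the value contains no spaces.
odm_value() {
    awk -v dev="$1" -v attr="$2" '
        /^CuAt:/          { name = ""; att = "" }
        $1 == "name"      { name = $3; gsub(/"/, "", name) }
        $1 == "attribute" { att = $3; gsub(/"/, "", att) }
        $1 == "value" && name == dev && att == attr {
            v = $3; gsub(/"/, "", v); print v; exit
        }'
}

# Usage on AIX:
#   odmget CuAt | odm_value hdisk10 unique_id
```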
AIX often adds prefix and suffix information to the LUN serial number, depending on the manufacturer and connection type. The prefix often contains the WWPN of the storage server, while the suffix relates to the connection type. Generally, the most significant digits of the serial number are in the middle of the ID.
For example:
332136005066810820069E00000000000025404214503IBMfcp
     ^^^^^^^^^^^^^^^^^         ^^^^^^    ^^^^^^^^^^
     Server WWPN/WWNN          LUN ID    IBM suffix
The storage controller or server often has many ports, so this number may vary. The IBM suffix is fairly static for a given storage type.
The LUN ID, "0254", holds the most significant digits and should match what the storage administrator sees on their console.
When requesting new LUNs, always have the storage administrator return a list of LUN serial numbers to confirm in lspv -u after scanning the LUNs with cfgmgr.
LUN Size
The LUN size is another good data point to help identify a LUN. It can be hard to find before adding the LUN to a volume group.
Some LUNs have a size_in_mb attribute in lsattr -El hdiskX, but it isn't consistent across device types:
% lsattr -El hdisk0
PCM PCM/friend/sisarray Path Control Module False
algorithm fail_over Algorithm False
...
size_in_mb 571292 Size in Megabytes False
The getconf command is supported and works consistently across device types; it reports the size in MB:
% getconf DISK_SIZE /dev/hdiskX
571292
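Comparing the getconf result with the requested size is simple arithmetic, provided you know how the size was requested. A sketch assuming the request was in binary GB (1 GB = 1024 MB); an array that sizes LUNs in decimal GB reports a smaller MB figure and would fail this exact comparison. size_matches is a hypothetical helper.

```shell
# size_matches (hypothetical helper): succeed if a size in MB (as printed
# by getconf DISK_SIZE) equals the requested size in GB.
# Assumes binary units: 1 GB = 1024 MB.
size_matches() {
    actual_mb=$1
    expected_gb=$2
    [ "$actual_mb" -eq $((expected_gb * 1024)) ]
}

# Usage on AIX, for a LUN requested as 100 GB:
#   size_matches "$(getconf DISK_SIZE /dev/hdisk3)" 100 && echo "hdisk3 size OK"
```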
Previously the bootinfo command was often used to find disk sizes, but it was deprecated by IBM in 2018 [1].
Some proprietary drivers present the size information with other LUN details, but these are the supported AIX commands.
Paths
It's important to confirm that the number of paths to a LUN meets the expectations for the environment. At minimum, two paths are required for redundancy. Paths add up quickly when connecting to multiple storage controllers across multiple switches.
Note that on AIX a path runs from one client WWPN to one server WWPN. Multiple paths can traverse the same HBA port, physical cable, and SAN switch port.
Many drivers warn that path operations degrade beyond a large number of paths. Generally I recommend 16 or fewer paths per LUN. The optimum number of paths varies by storage vendor and driver.
To check paths, use AIX's lspath:
% lspath
Enabled hdisk0 fscsi0
Enabled hdisk0 fscsi1
...
Paths are displayed with their state, the hdisk device, and the parent adapter. Path states can include:
- Enabled: This path is up and functional.
- Failed: This path is down, and no I/O is passing. It could recover.
- Missing: The path was previously known and was in a Failed state when a reboot occurred, so at startup it was marked Missing. Missing paths do not automatically recover, and must be cleared or re-detected with cfgmgr.
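Counting paths per LUN by hand gets tedious with many disks. A sketch that tallies lspath output per hdisk and flags any disk with fewer Enabled paths than a minimum (default 2, the redundancy floor mentioned above); path_summary is a hypothetical helper, and the threshold should be raised to whatever the environment expects.

```shell
# path_summary (hypothetical helper): read plain `lspath` output
# (state hdisk parent) on stdin and print "hdisk enabled total" per disk,
# appending "LOW" when Enabled paths are below the minimum in $1 (default 2).
path_summary() {
    awk -v min="${1:-2}" '
        { total[$2]++; if ($1 == "Enabled") ok[$2]++ }
        END {
            for (d in total) {
                flag = (ok[d] + 0 < min + 0) ? " LOW" : ""
                print d, ok[d] + 0, total[d] flag
            }
        }' | sort
}

# Usage on AIX, expecting at least 4 Enabled paths per LUN:
#   lspath | path_summary 4
```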
PVID
Finally, the AIX Physical Volume Identifier (PVID) is also useful for identifying LUNs. We can rely on lspv -u to show all the PVIDs:
% lspv -u
hdisk1  00cdb8105a9b1234  rootvg  active  2C020220343123454ec010ad002ad84100000001nvme
hdisk3  none              None            332136005066810820069E00000000000025304214503IBMfcp  f4993d47-0379-7984-98ac-1ccc6f034ac0
hdisk4  none              None            332136005066810820069E00000000000025404214503IBMfcp  0bd599ab-426e-ebee-f8a0-5d4a80e1e9d8
New LUNs should always have no PVID. Full stop!
If the LUN has a PVID, it may be in use on another system. Verify with the SAN administrator using the LUN serial number that the LUN is not mapped to any other system.
There are valid cases where a LUN may already have a PVID, such as a SAN snapshot, an HA cluster, or a VIO server. However, this document focuses on confirming new LUNs for a system.
Writing to a LUN with an existing PVID will cause data loss. Go back and double-check the configuration.
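The PVID check is easy to run against exactly the disks you are about to use. A sketch, assuming lspv or lspv -u output with the PVID in column 2; check_no_pvid is a hypothetical helper that prints any listed disk that already has a PVID and returns non-zero, so it can gate a later extendvg.

```shell
# check_no_pvid (hypothetical helper): read `lspv` or `lspv -u` output on
# stdin; for each hdisk named as an argument, print "hdisk pvid" if its
# PVID is not "none". Returns 1 if any such disk was found.
check_no_pvid() {
    awk -v disks=" $* " '
        index(disks, " " $1 " ") && $2 != "none" { print $1, $2; bad = 1 }
        END { if (bad) exit 1 }'
}

# Usage on AIX, before touching hdisk3 and hdisk4:
#   lspv -u | check_no_pvid hdisk3 hdisk4 || echo "STOP: PVID present"
```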
Summary of recommendations
Always ask for a list of LUN serial numbers when making a new LUN request of SAN administrators.
Never rely on device names (e.g., hdisk12) in scripts or documentation. Use the serial number instead.
Always confirm that new hdisk devices for SAN LUNs match the expected serial numbers before writing any data to them (e.g., with extendvg).
New LUNs always have no PVID; if one does, there is a problem.