Christophe,
Unfortunately, the TP9700 does not appear to work with the multipath device settings you provided.
In our configuration, the host (a Sun X4600 running RHEL 5.2) is connected to the TP9700 over two Fibre Channel connections.
No FC switches are used, just direct HBA-to-SP connectivity with two HBAs and two Storage Processors. LUNs on the RAID are assigned to either SPA or SPB to spread the workload across the SPs and the Fibre Channel connections.
The TP9700 can be configured to present the storage to a host by setting the "Storage Array Host Type" (Linux, SGIRDAC, SGIAVT, Windows, etc.). For my tests, I've been experimenting with Linux and SGIRDAC. I have been unable to determine which failover method the "Linux" storage array host type uses, but I thought I had come across an article saying that the Linux type is basic AVT. I could be mistaken.
With the TP9700 host type set to "Linux", I set up /etc/multipath.conf to mimic the defaults for the TP9500:
device {
    vendor "SGI"
    product "TP9[457]00"
    getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
    prio_callout "/sbin/mpath_prio_tpc /dev/%n"
    features "0"
    hardware_handler "0"
    path_grouping_policy group_by_prio
    failback immediate
    rr_weight uniform
    rr_min_io 1000
    path_checker tur
}
This configuration ran OK for a while, then began to log multipath failures and, eventually, buffer I/O errors. All LUNs on one SP trespassed to the other SP, and I had to manually move each trespassed LUN back to its primary path.
After I changed the TP9700 host type to SGIRDAC and tried the configuration you provided, the host no longer saw the ghost path, so I effectively ended up with a single path. After disconnecting an FC connection, the host could no longer see any of the LUNs assigned to the associated SP.
I modified the multipath.conf a little:
device {
    vendor "SGI"
    product "TP9700"
    path_grouping_policy failover
    getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
    features "1 queue_if_no_path"
    path_checker rdac
    prio_callout "/sbin/mpath_prio_tpc /dev/%n"
    hardware_handler "1 rdac"
    prio rdac
    failback immediate
}
This worked OK, but I see lots of SCSI sense key errors:
Jun 23 12:16:42 p4dbl03 kernel: sdbk: Current: sense key: Recovered Error
Jun 23 12:16:42 p4dbl03 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Jun 23 12:16:42 p4dbl03 kernel:
I see those errors regardless of how I configure the RAID and multipath.conf, which is worrisome.
I especially see those errors if I run 'fdisk -l'.
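To see which devices log these messages most often, the syslog could be tallied per device with something like the Python sketch below. It assumes the kernel line format shown in the excerpt above; the function name tally_sense_keys and the regular expression are illustrative only and are not part of any multipath tooling.

```python
import re
from collections import Counter

# Assumed format, taken from the log excerpt above:
#   "... kernel: sdbk: Current: sense key: Recovered Error"
SENSE_RE = re.compile(r"kernel: (sd[a-z]+): Current: sense key: (.+?)\s*$")


def tally_sense_keys(lines):
    """Count (device, sense key) pairs in an iterable of syslog lines."""
    counts = Counter()
    for line in lines:
        match = SENSE_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# The three lines from the excerpt above; only the first carries a sense key.
sample = [
    "Jun 23 12:16:42 p4dbl03 kernel: sdbk: Current: sense key: Recovered Error",
    "Jun 23 12:16:42 p4dbl03 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1",
    "Jun 23 12:16:42 p4dbl03 kernel:",
]

for (device, key), n in sorted(tally_sense_keys(sample).items()):
    print(device, key, n)
```

Passing an open file instead of the sample list (for example, open("/var/log/messages")) should work the same way, since the function only iterates over lines.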
Disconnecting an FC cable from one HBA caused the associated volumes to trespass to the other SP; however, I noticed buffer I/O errors during this process. I also noticed that the trespassed LUNs did not fail back to their original SP when the FC cable was reconnected. Am I to assume that RDAC or other multipath software will not tell the storage to fail back trespassed LUNs?
Your assistance is appreciated,
- Kevin
-- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel