Re: multipath-tools support for SGI TP9700...

Kevin M Lange <kevin_m_lange@xxxxxxxxxxxx> · Thu, 26 Jun 2008 00:49:18 -0400

Christophe,

Unfortunately, the TP9700 does not appear to work with the multipath
device settings you provided.

In our configuration, the host (a Sun X4600 running RHEL 5.2) is
connected to the TP9700 over two Fibre Channel connections.

No FC switches are used, just simple direct HBA-to-SP connectivity with
two HBAs and two Storage Processors.  Each LUN on the RAID is owned by
either SPA or SPB, to spread the workload across the SPs and the Fibre
Channel connections.

The TP9700 can be configured to present storage to a host by setting the
"Storage Array Host Type" (Linux, SGIRDAC, SGIAVT, Windows, etc.).  For
my tests, I've been experimenting with Linux and SGIRDAC.  I have been
unable to determine which failover method the "Linux" host type uses,
but I thought I had come across an article saying that the Linux type is
basic AVT.  I could be mistaken.

With the TP9700 host type set to "Linux", I set up /etc/multipath.conf
to mimic the defaults for the TP9500:

       device {
               vendor                  "SGI"
               product                 "TP9[457]00"
               getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
               prio_callout            "/sbin/mpath_prio_tpc /dev/%n"
               features                "0"
               hardware_handler        "0"
               path_grouping_policy    group_by_prio
               failback                immediate
               rr_weight               uniform
               rr_min_io               1000
               path_checker            tur
       }

This configuration ran OK for a while, then began to log multipath
failures and, eventually, buffer I/O errors.  All LUNs on one SP
trespassed to the other SP, and I had to manually move each trespassed
LUN back to its primary path.

When I changed the TP9700 host type to SGIRDAC and tried the
configuration you provided, the host no longer saw the ghost path, so I
effectively ended up with a single path.  Disconnecting an FC connection
left the host unable to see any of the LUNs assigned to the associated
SP.

I modified multipath.conf a little:

       device {
               vendor                  "SGI"
               product                 "TP9700"
               path_grouping_policy    failover
               getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
               features                "1 queue_if_no_path"
               path_checker            rdac
               prio_callout            "/sbin/mpath_prio_tpc /dev/%n"
               hardware_handler        "1 rdac"
               prio                    rdac
               failback                immediate
       }

This worked OK, but I see lots of SCSI sense key errors:

Jun 23 12:16:42 p4dbl03 kernel: sdbk: Current: sense key: Recovered Error
Jun 23 12:16:42 p4dbl03 kernel:     <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Jun 23 12:16:42 p4dbl03 kernel:

I see those errors regardless of how I configure the RAID and
multipath.conf, which is worrisome.

I especially see those errors if I run 'fdisk -l'.
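
To get a feel for how often these messages appear and on which devices,
something like the following Python sketch could tally them from syslog.
It assumes the kernel lines look exactly like the excerpt above; the
function name count_sense_keys and the sample input are illustrative
only and are not part of multipath-tools.

```python
import re
from collections import Counter

# Matches kernel lines of the form seen in the excerpt, e.g.
#   Jun 23 12:16:42 p4dbl03 kernel: sdbk: Current: sense key: Recovered Error
# Assumption: the device is an sdX-style name and the sense key text runs
# to the end of the line.
SENSE_RE = re.compile(
    r"kernel:\s+(?P<dev>sd[a-z]+):\s+Current:\s+sense key:\s+(?P<key>.+?)\s*$"
)


def count_sense_keys(lines):
    """Return a Counter keyed by (device, sense key) for matching lines."""
    counts = Counter()
    for line in lines:
        match = SENSE_RE.search(line)
        if match:
            counts[(match.group("dev"), match.group("key"))] += 1
    return counts


if __name__ == "__main__":
    # Sample input copied from the log excerpt; a real run would iterate
    # over the lines of the syslog file instead.
    sample = [
        "Jun 23 12:16:42 p4dbl03 kernel: sdbk: Current: sense key: Recovered Error",
        "Jun 23 12:16:42 p4dbl03 kernel:     <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1",
        "Jun 23 12:16:42 p4dbl03 kernel:",
    ]
    for (dev, key), n in sorted(count_sense_keys(sample).items()):
        print(f"{dev}\t{key}\t{n}")
```

Comparing the per-device counts before and after an 'fdisk -l' run would
show whether the messages are tied to particular paths or spread across
all of them.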

Disconnecting an FC cable on one HBA caused the associated volumes to
trespass to the other SP; however, during this process I noticed buffer
I/O errors.  I also noticed that the trespassed LUNs did not fail back
to their original SP when the FC cable was reconnected.  Am I to assume
that RDAC or other multipath software will not tell the storage to fail
back trespassed LUNs?

Your assistance is appreciated,

- Kevin

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel