Re: rdac path failure - Sun 6140

I will test this today.

Does everything look OK from a configuration standpoint?  Should the
RDAC virtual HBA drivers from LSI be a requirement?  I am not
currently using them.

Thank you,
--
Stew




On Thu, Aug 13, 2009 at 5:05 PM, Moger, Babu <Babu.Moger@xxxxxxx> wrote:
> Stew,
>
>      I don’t see much information about this failure in the logs. Right now
> device handlers don’t provide much information on failures.  We are working
> on adding some more debug levels.  I am attaching my draft code
> (scsi_dh_rdac.c) here.  Please use this only for your testing; it has not
> been approved/reviewed yet, and I still need to submit it to the community
> for approval.  Please replace drivers/scsi/device_handler/scsi_dh_rdac.c in
> your kernel source tree with the attached file and rebuild the kernel.  This
> should give more information from the target point of view.  Please send me
> the /var/log/messages file after the failure.  Let's see if we can get more
> information.
>
>
>
> Thanks
>
> Babu Moger
>
> ________________________________
>
> From: dm-devel-bounces@xxxxxxxxxx [mailto:dm-devel-bounces@xxxxxxxxxx] On
> Behalf Of Stewart Smith
> Sent: Thursday, August 13, 2009 3:35 PM
> To: device-mapper development
> Subject: Re:  rdac path failure - Sun 6140
>
>
>
>
>
> Same sequence of events, with multipathd -v3
>
>
>
> Aug 13 16:28:48.627  kernel: device-mapper: multipath: Failing path 8:208.
>
> Aug 13 16:28:48.000  multipathd: vol1: rr_weight = 2 (LUN setting)
>
> Aug 13 16:28:48.000  multipathd: vol1: pgfailback = -2 (controller setting)
>
> Aug 13 16:28:48.000  multipathd: pg_timeout = NONE (internal default)
>
> Aug 13 16:28:48.000  multipathd: 8:208: mark as failed
>
> Aug 13 16:28:48.000  multipathd: uevent 'change' from '/devices/virtual/block/dm-1'
>
> Aug 13 16:28:48.000  multipathd: UDEV_LOG=3
>
> Aug 13 16:28:48.000  multipathd: ACTION=change
>
> Aug 13 16:28:48.000  multipathd: DEVPATH=/devices/virtual/block/dm-1
>
> Aug 13 16:28:48.000  multipathd: SUBSYSTEM=block
>
> Aug 13 16:28:48.000  multipathd: DM_TARGET=multipath
>
> Aug 13 16:28:48.000  multipathd: DM_ACTION=PATH_FAILED
>
> Aug 13 16:28:48.000  multipathd: DM_SEQNUM=1
>
> Aug 13 16:28:48.000  multipathd: DM_PATH=8:208
>
> Aug 13 16:28:48.000  multipathd: DM_NR_VALID_PATHS=3
>
> Aug 13 16:28:48.000  multipathd: DM_NAME=vol1
>
> Aug 13 16:28:48.000  multipathd: DM_UUID=mpath-3600a0b800048335200001e5d48b68a9b
>
> Aug 13 16:28:48.000  multipathd: MAJOR=253
>
> Aug 13 16:28:48.000  multipathd: MINOR=1
>
> Aug 13 16:28:48.000  multipathd: DEVTYPE=disk
>
> Aug 13 16:28:48.000  multipathd: SEQNUM=1738
>
> Aug 13 16:28:48.000  multipathd: UDEVD_EVENT=1
>
> Aug 13 16:28:48.000  multipathd: DEVNAME=/dev/dm-1
>
> Aug 13 16:28:50.000  multipathd: 8:208: reinstated
>
> Aug 13 16:28:50.000  multipathd: vol1: remaining active paths: 4
>
> Aug 13 16:28:50.000  multipathd: sdj: rdac prio = 3
>
> Aug 13 16:28:50.000  multipathd: sdn: rdac prio = 3
>
> Aug 13 16:28:50.000  multipathd: sdb: rdac prio = 0
>
> Aug 13 16:28:50.000  multipathd: sdd: rdac prio = 0
>
> Aug 13 16:28:50.763  kernel: device-mapper: multipath: Failing path 8:208.
>
> Aug 13 16:28:50.000  multipathd: uevent 'change' from '/devices/virtual/block/dm-1'
>
> Aug 13 16:28:50.000  multipathd: UDEV_LOG=3
>
> Aug 13 16:28:50.000  multipathd: ACTION=change
>
> Aug 13 16:28:50.000  multipathd: DEVPATH=/devices/virtual/block/dm-1
>
> Aug 13 16:28:50.000  multipathd: SUBSYSTEM=block
>
> Aug 13 16:28:50.000  multipathd: DM_TARGET=multipath
>
> Aug 13 16:28:50.000  multipathd: DM_ACTION=PATH_REINSTATED
>
> Aug 13 16:28:50.000  multipathd: DM_SEQNUM=2
>
> Aug 13 16:28:50.000  multipathd: DM_PATH=8:208
>
> Aug 13 16:28:50.000  multipathd: DM_NR_VALID_PATHS=4
>
> Aug 13 16:28:50.000  multipathd: DM_NAME=vol1
>
> Aug 13 16:28:50.000  multipathd: DM_UUID=mpath-3600a0b800048335200001e5d48b68a9b
>
> Aug 13 16:28:50.000  multipathd: MAJOR=253
>
> Aug 13 16:28:50.000  multipathd: MINOR=1
>
> Aug 13 16:28:50.000  multipathd: DEVTYPE=disk
>
> Aug 13 16:28:50.000  multipathd: SEQNUM=1739
>
> Aug 13 16:28:50.000  multipathd: UDEVD_EVENT=1
>
> Aug 13 16:28:50.000  multipathd: vol1: rr_weight = 2 (LUN setting)
>
> Aug 13 16:28:50.000  multipathd: vol1: pgfailback = -2 (controller setting)
>
> Aug 13 16:28:50.000  multipathd: pg_timeout = NONE (internal default)
>
> Aug 13 16:28:50.000  multipathd: 8:208: mark as failed
>
> Aug 13 16:28:50.000  multipathd: vol1: remaining active paths: 3
>
> Aug 13 16:28:50.000  multipathd: vol1: rr_weight = 2 (LUN setting)
>
> Aug 13 16:28:50.000  multipathd: vol1: pgfailback = -2 (controller setting)
>
> Aug 13 16:28:50.000  multipathd: uevent 'change' from '/devices/virtual/block/dm-1'
>
> Aug 13 16:28:50.000  multipathd: UDEV_LOG=3
>
> Aug 13 16:28:50.000  multipathd: ACTION=change
>
> Aug 13 16:28:50.000  multipathd: DEVPATH=/devices/virtual/block/dm-1
>
> Aug 13 16:28:50.000  multipathd: SUBSYSTEM=block
>
> Aug 13 16:28:50.000  multipathd: DM_TARGET=multipath
>
> Aug 13 16:28:50.000  multipathd: DM_ACTION=PATH_FAILED
>
> Aug 13 16:28:50.000  multipathd: DM_SEQNUM=3
>
> Aug 13 16:28:50.000  multipathd: DM_PATH=8:208
>
> Aug 13 16:28:50.000  multipathd: DM_NR_VALID_PATHS=3
>
> Aug 13 16:28:50.000  multipathd: DM_NAME=vol1
>
> Aug 13 16:28:50.000  multipathd: DM_UUID=mpath-3600a0b800048335200001e5d48b68a9b
>
> Aug 13 16:28:50.000  multipathd: MAJOR=253
>
> Aug 13 16:28:50.000  multipathd: MINOR=1
>
> Aug 13 16:28:50.000  multipathd: DEVTYPE=disk
>
> Aug 13 16:28:50.000  multipathd: SEQNUM=1740
>
> Aug 13 16:28:50.000  multipathd: UDEVD_EVENT=1
>
> Aug 13 16:28:50.000  multipathd: DEVNAME=/dev/dm-1
>
> Aug 13 16:29:00.000  multipathd: 8:208: reinstated
>
> Aug 13 16:29:00.000  multipathd: vol1: remaining active paths: 4
>
> Aug 13 16:29:00.000  multipathd: sdj: rdac prio = 3
>
> Aug 13 16:29:00.000  multipathd: sdn: rdac prio = 3
>
> Aug 13 16:29:00.000  multipathd: sdb: rdac prio = 0
>
> Aug 13 16:29:00.000  multipathd: sdd: rdac prio = 0
>
> Aug 13 16:29:00.000  multipathd: vol1: rr_weight = 2 (LUN setting)
>
> Aug 13 16:29:00.000  multipathd: vol1: pgfailback = -2 (controller setting)
>
> Aug 13 16:29:00.000  multipathd: uevent 'change' from '/devices/virtual/block/dm-1'
>
> Aug 13 16:29:00.000  multipathd: UDEV_LOG=3
>
> Aug 13 16:29:00.000  multipathd: ACTION=change
>
> Aug 13 16:29:00.000  multipathd: DEVPATH=/devices/virtual/block/dm-1
>
> Aug 13 16:29:00.000  multipathd: SUBSYSTEM=block
>
> Aug 13 16:29:00.000  multipathd: DM_TARGET=multipath
>
> Aug 13 16:29:00.000  multipathd: DM_ACTION=PATH_REINSTATED
>
> Aug 13 16:29:00.000  multipathd: DM_SEQNUM=4
>
> Aug 13 16:29:00.000  multipathd: DM_PATH=8:208
>
> Aug 13 16:29:00.000  multipathd: DM_NR_VALID_PATHS=4
>
> Aug 13 16:29:00.000  multipathd: DM_NAME=vol1
>
> Aug 13 16:29:00.000  multipathd: DM_UUID=mpath-3600a0b800048335200001e5d48b68a9b
>
> Aug 13 16:29:00.000  multipathd: MAJOR=253
>
> Aug 13 16:29:00.000  multipathd: MINOR=1
>
> Aug 13 16:29:00.000  multipathd: DEVTYPE=disk
>
> Aug 13 16:29:00.000  multipathd: SEQNUM=1741
>
> Aug 13 16:29:00.000  multipathd: UDEVD_EVENT=1
>
> Aug 13 16:29:00.000  multipathd: DEVNAME=/dev/dm-1
>
> Aug 13 16:29:02.753  kernel: device-mapper: multipath: Failing path 8:208.
>
> Aug 13 16:29:02.000  multipathd: vol1: rr_weight = 2 (LUN setting)
>
> Aug 13 16:29:02.000  multipathd: vol1: pgfailback = -2 (controller setting)
>
> Aug 13 16:29:02.000  multipathd: pg_timeout = NONE (internal default)
>
> Aug 13 16:29:02.000  multipathd: 8:208: mark as failed
>
> Aug 13 16:29:02.000  multipathd: uevent 'change' from '/devices/virtual/block/dm-1'
>
> Aug 13 16:29:02.000  multipathd: UDEV_LOG=3
>
> Aug 13 16:29:02.000  multipathd: ACTION=change
>
> Aug 13 16:29:02.000  multipathd: DEVPATH=/devices/virtual/block/dm-1
>
> Aug 13 16:29:02.000  multipathd: SUBSYSTEM=block
>
> Aug 13 16:29:02.000  multipathd: DM_TARGET=multipath
>
> Aug 13 16:29:02.000  multipathd: DM_ACTION=PATH_FAILED
>
> Aug 13 16:29:02.000  multipathd: DM_SEQNUM=5
>
> Aug 13 16:29:02.000  multipathd: DM_PATH=8:208
>
> Aug 13 16:29:02.000  multipathd: DM_NR_VALID_PATHS=3
>
> Aug 13 16:29:02.000  multipathd: DM_NAME=vol1
>
> Aug 13 16:29:02.000  multipathd: DM_UUID=mpath-3600a0b800048335200001e5d48b68a9b
>
> Aug 13 16:29:02.000  multipathd: MAJOR=253
>
> Aug 13 16:29:02.000  multipathd: MINOR=1
>
> Aug 13 16:29:02.000  multipathd: DEVTYPE=disk
>
> Aug 13 16:29:02.000  multipathd: SEQNUM=1742
>
> Aug 13 16:29:02.000  multipathd: UDEVD_EVENT=1
>
> Aug 13 16:29:02.000  multipathd: DEVNAME=/dev/dm-1
>
> Aug 13 16:29:10.000  multipathd: 8:208: reinstated
>
> Aug 13 16:29:10.000  multipathd: vol1: remaining active paths: 4
>
> Aug 13 16:29:10.000  multipathd: sdj: rdac prio = 3
>
> Aug 13 16:29:10.000  multipathd: sdn: rdac prio = 3
>
> Aug 13 16:29:10.000  multipathd: sdb: rdac prio = 0
>
> Aug 13 16:29:10.000  multipathd: sdd: rdac prio = 0
>
> Aug 13 16:29:10.000  multipathd: vol1: rr_weight = 2 (LUN setting)
>
> Aug 13 16:29:10.000  multipathd: vol1: pgfailback = -2 (controller setting)
>
> Aug 13 16:29:10.000  multipathd: uevent 'change' from '/devices/virtual/block/dm-1'
>
> Aug 13 16:29:10.000  multipathd: UDEV_LOG=3
>
> Aug 13 16:29:10.000  multipathd: ACTION=change
>
> Aug 13 16:29:10.000  multipathd: DEVPATH=/devices/virtual/block/dm-1
>
> Aug 13 16:29:10.000  multipathd: SUBSYSTEM=block
>
> Aug 13 16:29:10.000  multipathd: DM_TARGET=multipath
>
> Aug 13 16:29:10.000  multipathd: DM_ACTION=PATH_REINSTATED
>
> Aug 13 16:29:10.000  multipathd: DM_SEQNUM=6
>
> Aug 13 16:29:10.000  multipathd: DM_PATH=8:208
>
> Aug 13 16:29:10.000  multipathd: DM_NR_VALID_PATHS=4
>
> Aug 13 16:29:10.000  multipathd: DM_NAME=vol1
>
> Aug 13 16:29:10.000  multipathd: DM_UUID=mpath-3600a0b800048335200001e5d48b68a9b
>
> Aug 13 16:29:10.000  multipathd: MAJOR=253
>
> Aug 13 16:29:10.000  multipathd: MINOR=1
>
> Aug 13 16:29:10.000  multipathd: DEVTYPE=disk
>
> Aug 13 16:29:10.000  multipathd: SEQNUM=1743
>
> Aug 13 16:29:10.000  multipathd: UDEVD_EVENT=1
>
> Aug 13 16:29:10.000  multipathd: DEVNAME=/dev/dm-1
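A minimal Python sketch for condensing a multipathd -v3 dump like the one above into one row per device-mapper uevent. It assumes the log was saved as plain text, that each event starts with a "uevent 'change' from" line, and that each DM_* variable sits on one line (wrapped lines such as the DM_UUID ones are simply skipped). parse_uevents() and summarize() are hypothetical helpers, not part of multipath-tools.

```python
import re

# Matches "multipathd: KEY=VALUE" lines from a multipathd -v3 uevent dump.
KV_RE = re.compile(r"multipathd:\s+([A-Z_]+)=(\S*)\s*$")


def parse_uevents(lines):
    """Group KEY=VALUE lines into one dict per uevent.

    Assumes each event starts with a "uevent 'change' from" line; any line
    that is not a single KEY=VALUE pair is ignored.
    """
    events = []
    current = None
    for line in lines:
        line = line.lstrip("> ").rstrip()  # drop mail quoting, if any
        if "uevent 'change' from" in line:
            current = {}
            events.append(current)
            continue
        match = KV_RE.search(line)
        if match and current is not None:
            current[match.group(1)] = match.group(2)
    return events


def summarize(events):
    """Return (DM_SEQNUM, DM_ACTION, DM_PATH, DM_NR_VALID_PATHS) per dm event."""
    return [
        (int(ev["DM_SEQNUM"]), ev["DM_ACTION"], ev["DM_PATH"],
         int(ev["DM_NR_VALID_PATHS"]))
        for ev in events
        if ev.get("DM_TARGET") == "multipath"
    ]


# Two events copied from the log above (other variables omitted).
SAMPLE = """\
Aug 13 16:28:48.000  multipathd: uevent 'change' from
'/devices/virtual/block/dm-1'
Aug 13 16:28:48.000  multipathd: DM_TARGET=multipath
Aug 13 16:28:48.000  multipathd: DM_ACTION=PATH_FAILED
Aug 13 16:28:48.000  multipathd: DM_SEQNUM=1
Aug 13 16:28:48.000  multipathd: DM_PATH=8:208
Aug 13 16:28:48.000  multipathd: DM_NR_VALID_PATHS=3
Aug 13 16:28:50.000  multipathd: uevent 'change' from
'/devices/virtual/block/dm-1'
Aug 13 16:28:50.000  multipathd: DM_TARGET=multipath
Aug 13 16:28:50.000  multipathd: DM_ACTION=PATH_REINSTATED
Aug 13 16:28:50.000  multipathd: DM_SEQNUM=2
Aug 13 16:28:50.000  multipathd: DM_PATH=8:208
Aug 13 16:28:50.000  multipathd: DM_NR_VALID_PATHS=4
"""

if __name__ == "__main__":
    for row in summarize(parse_uevents(SAMPLE.splitlines())):
        print(row)
```

Fed the whole excerpt above, it should list DM_SEQNUM 1 to 6, with 8:208 alternating between PATH_FAILED (3 valid paths) and PATH_REINSTATED (4 valid paths).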
>
>
>
>
>
>
>
>
>
> On Thu, Aug 13, 2009 at 1:27 PM, Stewart Smith <stew@xxxxxxxxxxxx> wrote:
>
>
>
> After a fresh multipath -F and starting multipathd with -v 2, I see the
> following messages.
>
>
>
> After starting multipathd, I mounted /dev/mapper/vol1 and generated some
> simple I/O to it using dd.
>
>
>
>
>
> Aug 13 16:23:14.888 localhost kernel: device-mapper: multipath: Failing path 8:208.
>
> Aug 13 16:23:14.000 localhost multipathd: 8:208: mark as failed
>
> Aug 13 16:23:16.000 localhost multipathd: 8:208: reinstated
>
> Aug 13 16:23:30.462 localhost kernel: device-mapper: multipath: Failing path 8:208.
>
> Aug 13 16:23:30.000 localhost multipathd: 8:208: mark as failed
>
> Aug 13 16:23:39.000 localhost multipathd: 8:208: reinstated
>
> Aug 13 16:23:46.430 localhost kernel: device-mapper: multipath: Failing path 8:208.
>
> Aug 13 16:23:46.000 localhost multipathd: 8:208: mark as failed
>
> Aug 13 16:23:51.041 localhost kernel: device-mapper: multipath: Failing path 8:208.
>
> Aug 13 16:23:51.000 localhost multipathd: 8:208: mark as failed
>
> Aug 13 16:23:59.000 localhost multipathd: 8:208: reinstated
>
> Aug 13 16:24:06.465 localhost kernel: device-mapper: multipath: Failing path 8:208.
>
> Aug 13 16:24:06.000 localhost multipathd: 8:208: mark as failed
>
> Aug 13 16:24:09.000 localhost multipathd: 8:208: reinstated
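To check how regularly the failures recur, a small Python sketch can pull the timestamps out of a -v 2 log like the one above and compute, per device, the gaps between consecutive kernel "Failing path" messages and the time from the most recent kernel failure to multipathd's "reinstated" message. It assumes each message sits on one line and ignores the date; the multipathd timestamps here appear to be truncated to whole seconds, so the recovery times are approximate. flap_report() is a hypothetical helper.

```python
import re
from collections import defaultdict

# HH:MM:SS with optional fractional seconds; the date part is ignored.
TS = r"(?P<ts>\d\d:\d\d:\d\d(?:\.\d+)?)"
FAIL_RE = re.compile(TS + r".*device-mapper: multipath: Failing path (?P<dev>\d+:\d+)")
REINSTATE_RE = re.compile(TS + r".*multipathd: (?P<dev>\d+:\d+): reinstated")


def to_seconds(ts):
    """Convert HH:MM:SS[.fff] to seconds since midnight."""
    hours, minutes, seconds = ts.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def flap_report(lines):
    """Per device: gaps between kernel failures, and failure-to-reinstate times."""
    fails = defaultdict(list)
    recoveries = defaultdict(list)
    for line in lines:
        match = FAIL_RE.search(line)
        if match:
            fails[match["dev"]].append(to_seconds(match["ts"]))
            continue
        match = REINSTATE_RE.search(line)
        if match and fails.get(match["dev"]):
            # Measure from the most recent kernel failure of the same device.
            elapsed = to_seconds(match["ts"]) - fails[match["dev"]][-1]
            recoveries[match["dev"]].append(round(elapsed, 3))
    gaps = {
        dev: [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]
        for dev, times in fails.items()
    }
    return gaps, dict(recoveries)


LOG = """\
Aug 13 16:23:14.888 localhost kernel: device-mapper: multipath: Failing path 8:208.
Aug 13 16:23:14.000 localhost multipathd: 8:208: mark as failed
Aug 13 16:23:16.000 localhost multipathd: 8:208: reinstated
Aug 13 16:23:30.462 localhost kernel: device-mapper: multipath: Failing path 8:208.
Aug 13 16:23:30.000 localhost multipathd: 8:208: mark as failed
Aug 13 16:23:39.000 localhost multipathd: 8:208: reinstated
Aug 13 16:23:46.430 localhost kernel: device-mapper: multipath: Failing path 8:208.
Aug 13 16:23:46.000 localhost multipathd: 8:208: mark as failed
Aug 13 16:23:51.041 localhost kernel: device-mapper: multipath: Failing path 8:208.
Aug 13 16:23:51.000 localhost multipathd: 8:208: mark as failed
Aug 13 16:23:59.000 localhost multipathd: 8:208: reinstated
Aug 13 16:24:06.465 localhost kernel: device-mapper: multipath: Failing path 8:208.
Aug 13 16:24:06.000 localhost multipathd: 8:208: mark as failed
Aug 13 16:24:09.000 localhost multipathd: 8:208: reinstated
"""

if __name__ == "__main__":
    gaps, recoveries = flap_report(LOG.splitlines())
    print("gaps between failures:", gaps)
    print("failure -> reinstated:", recoveries)
```

For the excerpt above, the gaps for 8:208 come out at roughly 15.6, 16.0, 4.6 and 15.4 seconds.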
>
>
>
>
>
> Thanks,
>
> --
>
> Stew
>
>
>
>
>
>
>
> On Thu, Aug 13, 2009 at 12:42 PM, Moger, Babu <Babu.Moger@xxxxxxx> wrote:
>
> Do you have the /var/log/messages file for this problem?
>
> Thanks
> Babu Moger
>
>> -----Original Message-----
>> From: dm-devel-bounces@xxxxxxxxxx [mailto:dm-devel-bounces@xxxxxxxxxx] On
>> Behalf Of Stewart Smith
>> Sent: Thursday, August 13, 2009 1:51 PM
>> To: dm-devel@xxxxxxxxxx
>> Subject:  rdac path failure - Sun 6140
>>
>> Hello All,
>>
>> I am seeing many of these messages when my Sun 6140 array is under heavy I/O:
>> device-mapper: multipath: Failing path 8:208.
>> device-mapper: multipath: Failing path 8:208.
>> device-mapper: multipath: Failing path 8:208.
>> device-mapper: multipath: Failing path 8:208.
>> device-mapper: multipath: Failing path 8:208.
>>
>>
>> I am running a Fedora 10 server with two Fibre Channel connections to two
>> different switches.  Both controllers on the 6140 have one connection
>> to each switch as well.  The end result is that I see four paths to
>> each LUN.
>>
>> When the volume is mounted and under significant load I see the
>> messages above every few seconds.  They seem to appear every
>> "no_path_retry" seconds.
>>
>> The 6140 controller firmware is up to date at version 07.50.08.10 and
>> I have installed the latest firmware for my Emulex LPe11002 cards.  I
>> have reproduced the problem using both Cisco MDS and Brocade Fibre
>> Channel switches as well.
>>
>> Using CAM, I have set the initiator Host Type to "Linux" at the
>> moment.  I have tried other options as well without success.
>>
>> I have NOT installed the RDAC drivers from either Sun or LSI -
>> primarily because they do not seem to build on my Fedora 10 kernel.
>>
>> Any ideas would be greatly appreciated!!!
>>
>> Configs and debugging multipathd output are below.
>>
>>
>>
>>
>>
>> Kernel: 2.6.27.24-170.2.68.fc10.x86_64
>>
>> # multipath -lll
>> vol1 (3600a0b800048335200001e5d48b68a9b) dm-1 SUN,CSM200_R
>> [size=12T][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
>> \_ round-robin 0 [prio=6][active]
>>  \_ 5:0:1:2 sdj 8:144 [active][ready]
>>  \_ 2:0:1:2 sdn 8:208 [active][ready]
>> \_ round-robin 0 [prio=0][enabled]
>>  \_ 2:0:0:2 sdb 8:16  [active][ghost]
>>  \_ 5:0:0:2 sdd 8:48  [active][ghost]
>>
>>
>> # cat /etc/multipath.conf
>>
>> blacklist {
>>         devnode "^sd[a-z][[0-9]*]"
>>         devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
>>         devnode "^hd[a-z][0-9]*"
>>         devnode "^cciss!c[0-9]d[0-9](p[0-9]*)*"
>> }
>>
>> defaults {
>>         udev_dir                /dev
>>         polling_interval        10
>>         selector                "round-robin 0"
>>         path_grouping_policy    multibus
>>         getuid_callout          "/sbin/scsi_id --whitelisted /dev/%n"
>>         prio                    alua
>>         path_checker            readsector0
>>         rr_min_io               100
>>         max_fds                 8192
>>         rr_weight               priorities
>>         failback                immediate
>>         no_path_retry           fail
>>         user_friendly_names     yes
>> }
>> devices {
>>         device {
>>                 vendor                  "SUN"
>>                 product                 "CSM200_R"
>>                 product_blacklist       "Universal Xport"
>>                 getuid_callout          "/sbin/scsi_id --whitelisted /dev/%n"
>>                 features                "0"
>>                 hardware_handler        "1 rdac"
>>                 path_selector           "round-robin 0"
>>                 path_grouping_policy    group_by_prio
>>                 failback                immediate
>>                 rr_weight               uniform
>>                 no_path_retry           queue
>>                 rr_min_io               1000
>>                 path_checker            rdac
>>                 prio                    rdac
>>         }
>> }
>>
>> multipaths {
>>         multipath {
>>                 wwid                    3600a0b800048335200001e5d48b68a9b
>>                 alias                   vol1
>>                 rr_weight               priorities
>>                 no_path_retry           5
>>                 rr_min_io               100
>>         }
>> }
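The "(LUN setting)", "(controller setting)" and "(internal default)" tags in the multipathd output reflect which part of multipath.conf supplied each value: a multipaths entry overrides the matching devices entry, which overrides defaults. A minimal Python sketch of that lookup, using values from the configuration above; the dictionaries and resolve() are illustrative, not multipath-tools code, and the source labels only approximate the log wording.

```python
# Values copied from the multipath.conf above (only the keys used here).
DEFAULTS = {
    "polling_interval": 10,
    "path_grouping_policy": "multibus",
    "prio": "alua",
    "path_checker": "readsector0",
    "rr_min_io": 100,
    "rr_weight": "priorities",
    "failback": "immediate",
    "no_path_retry": "fail",
}
# devices { device { vendor "SUN" product "CSM200_R" ... } }
DEVICE = {
    "features": "0",
    "hardware_handler": "1 rdac",
    "path_grouping_policy": "group_by_prio",
    "failback": "immediate",
    "rr_weight": "uniform",
    "no_path_retry": "queue",
    "rr_min_io": 1000,
    "path_checker": "rdac",
    "prio": "rdac",
}
# multipaths { multipath { wwid 3600a0b800048335200001e5d48b68a9b ... } }
MULTIPATH = {
    "alias": "vol1",
    "rr_weight": "priorities",
    "no_path_retry": 5,
    "rr_min_io": 100,
}

# Most specific section first; labels loosely follow the multipathd log.
LAYERS = [
    ("LUN setting", MULTIPATH),
    ("controller setting", DEVICE),
    ("defaults section", DEFAULTS),
]


def resolve(key):
    """Return (value, source); the first section that defines the key wins."""
    for source, section in LAYERS:
        if key in section:
            return section[key], source
    return None, "internal default"


if __name__ == "__main__":
    for key in ("rr_weight", "rr_min_io", "no_path_retry", "failback",
                "prio", "path_checker", "path_grouping_policy", "pg_timeout"):
        print(key, *resolve(key))
```

With this configuration, rr_weight, rr_min_io and no_path_retry come from the multipaths entry, while prio, path_checker and path_grouping_policy come from the SUN CSM200_R device entry, so the alua, readsector0 and multibus values in defaults never apply to this LUN.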
>>
>>
>>
>> # multipathd -d -v3
>>
>>
>> Aug 13 14:48:53 | sdb: ownership set to vol1
>> Aug 13 14:48:53 | sdb: not found in pathvec
>> Aug 13 14:48:53 | sdb: mask = 0xc
>> Aug 13 14:48:53 | sdb: path checker = rdac (controller setting)
>> Aug 13 14:48:53 | sdb: state = 4
>> Aug 13 14:48:53 | sdb: rdac prio = 0
>> Aug 13 14:48:53 | sdd: ownership set to vol1
>> Aug 13 14:48:53 | sdd: not found in pathvec
>> Aug 13 14:48:53 | sdd: mask = 0xc
>> Aug 13 14:48:53 | sdd: path checker = rdac (controller setting)
>> Aug 13 14:48:53 | sdd: state = 4
>> Aug 13 14:48:53 | sdd: rdac prio = 0
>> Aug 13 14:48:53 | sdj: ownership set to vol1
>> Aug 13 14:48:53 | sdj: not found in pathvec
>> Aug 13 14:48:53 | sdj: mask = 0xc
>> Aug 13 14:48:53 | sdj: path checker = rdac (controller setting)
>> Aug 13 14:48:53 | sdj: state = 2
>> Aug 13 14:48:53 | sdj: rdac prio = 3
>> Aug 13 14:48:53 | sdn: ownership set to vol1
>> Aug 13 14:48:53 | sdn: not found in pathvec
>> Aug 13 14:48:53 | sdn: mask = 0xc
>> Aug 13 14:48:53 | sdn: path checker = rdac (controller setting)
>> Aug 13 14:48:53 | sdn: state = 2
>> Aug 13 14:48:53 | sdn: rdac prio = 3
>> Aug 13 14:48:53 | vol1: pgfailback = -2 (controller setting)
>> Aug 13 14:48:53 | vol1: pgpolicy = group_by_prio (controller setting)
>> Aug 13 14:48:53 | vol1: selector = round-robin 0 (controller setting)
>> Aug 13 14:48:53 | vol1: features = 0 (controller setting)
>> Aug 13 14:48:53 | vol1: hwhandler = 1 rdac (controller setting)
>> Aug 13 14:48:53 | vol1: rr_weight = 2 (LUN setting)
>> Aug 13 14:48:53 | vol1: minio = 100 (LUN setting)
>> Aug 13 14:48:53 | vol1: no_path_retry = 5 (multipath setting)
>> Aug 13 14:48:53 | pg_timeout = NONE (internal default)
>> Aug 13 14:48:53 | vol1: set ACT_CREATE (map does not exist)
>> create: vol1 (3600a0b800048335200001e5d48b68a9b) n/a SUN,CSM200_R
>> [size=12T][features=0][hwhandler=1 rdac][n/a]
>> \_ round-robin 0 [prio=6][undef]
>>  \_ 5:0:1:2 sdj 8:144 [undef][ready]
>>  \_ 2:0:1:2 sdn 8:208 [undef][ready]
>> \_ round-robin 0 [prio=0][undef]
>>  \_ 2:0:0:2 sdb 8:16 [undef][ghost]
>>  \_ 5:0:0:2 sdd 8:48 [undef][ghost]
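The create: output groups the four paths by their rdac priority, as requested by path_grouping_policy group_by_prio. A small Python sketch of that grouping, fed from the "rdac prio = N" lines above: paths sharing a priority go into one group, and groups are ordered highest first. Taking the group priority as the sum of its paths' priorities is an assumption that matches the [prio=6] and [prio=0] shown here; other multipath-tools versions may compute it differently. parse_prios() and group_by_prio() are hypothetical names.

```python
import re
from collections import defaultdict

# Matches "sdj: rdac prio = 3" in multipath/multipathd verbose output.
PRIO_RE = re.compile(r"\b(?P<dev>sd[a-z]+): rdac prio = (?P<prio>\d+)")


def parse_prios(text):
    """Map device name -> rdac priority (the last value seen wins)."""
    return {m["dev"]: int(m["prio"]) for m in PRIO_RE.finditer(text)}


def group_by_prio(path_prios):
    """One group per distinct path priority, highest priority first.

    Group priority is assumed to be the sum of its members' priorities.
    """
    buckets = defaultdict(list)
    for dev, prio in path_prios.items():
        buckets[prio].append(dev)
    return [
        (prio * len(devs), sorted(devs))
        for prio, devs in sorted(buckets.items(), reverse=True)
    ]


LOG = """\
Aug 13 14:48:53 | sdb: rdac prio = 0
Aug 13 14:48:53 | sdd: rdac prio = 0
Aug 13 14:48:53 | sdj: rdac prio = 3
Aug 13 14:48:53 | sdn: rdac prio = 3
"""

if __name__ == "__main__":
    for group_prio, devs in group_by_prio(parse_prios(LOG)):
        print(f"prio={group_prio}", devs)
```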
>>
>
>> --
>> dm-devel mailing list
>> dm-devel@xxxxxxxxxx
>> https://www.redhat.com/mailman/listinfo/dm-devel
>