RE: RE: Problem with multipathd and a blacklisted device

"Ransegnola, Lori" <Lori.Ransegnola@xxxxxx> · Wed, 1 Apr 2009 13:40:13 +0000

> Try doing this
> without multipathd running, and see if the device still disappears.

Good point! I stopped multipathd and temporarily removed the disk.  Lo and behold, the disk path, /dev/hpdev/sda1, was still removed!  I guess this is caused by software RAID and not by multipathd.

Thanks for your suggestion and help.  At least now I know I was looking in the wrong direction.

Lori

> -----Original Message-----
> From: dm-devel-bounces@xxxxxxxxxx
> [mailto:dm-devel-bounces@xxxxxxxxxx] On Behalf Of Benjamin Marzinski
> Sent: Tuesday, March 31, 2009 11:50 AM
> To: device-mapper development
> Subject: Re:  RE: Problem with multipathd and a
> blacklisted device
>
> > > This configuration works well until I do some failure testing
> > > with one of the 2 blacklisted devs in the software RAID set.
> > > I found that if I temporarily remove disk sda and put it back
> > > a minute later the disk path, /dev/hpdev/sda1, is removed,
> > > even though it is blacklisted.
>
> multipathd shouldn't be removing any devices.  It removes paths from
> its list of monitored paths, but not from the filesystem.  Try doing this
> without multipathd running, and see if the device still disappears.
>
> That all being said, multipathd shouldn't be monitoring the device
> if it's blacklisted.  Did you start up multipathd before you
> blacklisted the device?  If so, you need to run
>
> # service multipathd reload
>
> to make multipathd pick up the new configuration. Or you can simply
> restart it. You can check whether it is monitoring the paths
> by running
>
> # multipathd -k"show paths"
>
> -Ben
>
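A minimal sketch of the check Ben describes, for anyone scripting it. The function name `path_is_monitored` is hypothetical and not part of multipath-tools; it only assumes that the output of `multipathd -k"show paths"` lists each monitored device name as a separate word on its line.

```shell
#!/bin/sh
# Hypothetical helper (not part of multipath-tools): reads the text printed by
#   multipathd -k"show paths"
# on stdin and succeeds if the given device name appears as a whole word.
# Matching whole words keeps "sda" from matching "sda1" or "sdaa".
path_is_monitored() {
    grep -qw -- "$1"
}

# Intended use on a live system (needs root and a running multipathd):
#   if multipathd -k"show paths" | path_is_monitored sda; then
#       echo "sda is still monitored; reload multipathd after blacklisting"
#   fi
```

If the blacklisted device still shows up, the daemon has probably not re-read its configuration since the blacklist entry was added, which is the case the reload above is meant to cover.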
> > >
> > > Here are some of the pertinent /var/log/messages lines:
> > >
> > > Mar 19 09:53:19 n0 mdadm: NewArray /dev/md0
> > > Mar 19 10:05:12 n0 kernel: mptbase: ioc0: LogInfo(0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
> > > Mar 19 10:05:40 n0 last message repeated 5 times
> > > Mar 19 10:05:42 n0 kernel: mptbase: ioc0: LogInfo(0x31130000): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x0000)
> > > Mar 19 10:05:42 n0 kernel: sd 0:0:28:0: SCSI error: return code = 0x00010000
> > > Mar 19 10:05:42 n0 kernel: end_request: I/O error, dev sda, sector 256119
> > > Mar 19 10:05:42 n0 kernel: Buffer I/O error on device sda1, logical block 256056
> > > Mar 19 10:05:42 n0 kernel: Buffer I/O error on device sda1, logical block 256057
> > > Mar 19 10:05:42 n0 kernel: Buffer I/O error on device sda1, logical block 256058
> > > Mar 19 10:05:42 n0 kernel: Buffer I/O error on device sda1, logical block 256059
> > > Mar 19 10:05:42 n0 kernel: Buffer I/O error on device sda1, logical block 256060
> > > Mar 19 10:05:42 n0 kernel: Buffer I/O error on device sda1, logical block 256061
> > > Mar 19 10:05:42 n0 kernel: mptbase: ioc0: LogInfo(0x31130000): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x0000)
> > > Mar 19 10:05:42 n0 kernel: Buffer I/O error on device sda1, logical block 256062
> > > Mar 19 10:05:42 n0 kernel: Buffer I/O error on device sda1, logical block 256063
> > > Mar 19 10:05:42 n0 kernel: sd 0:0:28:0: SCSI error: return code = 0x00010000
> > > Mar 19 10:05:42 n0 multipathd: sda: remove path (uevent)
> > > Mar 19 10:05:42 n0 kernel: end_request: I/O error, dev sda, sector 585922495
> > > Mar 19 10:05:42 n0 kernel: Buffer I/O error on device sda1, logical block 585922432
> > > Mar 19 10:05:42 n0 xinetd[13096]: START: hacl-cfgudp pid=6034 from=127.0.0.1
> > > Mar 19 10:05:42 n0 multipathd: uevent trigger error
> > > Mar 19 10:05:42 n0 kernel: Buffer I/O error on device sda1, logical block 585922433
> > >
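When reading a log like this, it can help to separate what multipathd itself reported from the surrounding kernel messages. The sketch below is one way to do that; `syslog_by_tag` is a hypothetical helper, and it assumes the traditional syslog layout seen above (month, day, time, host, then `program:` or `program[pid]:` as the fifth field).

```shell
#!/bin/sh
# Hypothetical helper: print only the syslog lines written by one program.
# $1 is the program tag, e.g. "multipathd" or "kernel".
# Assumes the fifth whitespace-separated field is "tag:" or "tag[pid]:".
syslog_by_tag() {
    awk -v tag="$1" '$5 == (tag ":") || index($5, tag "[") == 1'
}

# Intended use on a live system:
#   syslog_by_tag multipathd < /var/log/messages
```

Run against the excerpt above, this would leave just the two multipathd lines ("remove path (uevent)" and "uevent trigger error"), which makes it easier to see that multipathd is reacting to a removal event rather than logging anything about /dev/hpdev itself.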
> > >
> > > The sda path is gone:
> > >
> > > /root> ls -l /dev/hpdev
> > > total 0
> > > lrwxrwxrwx 1 root root 7 Mar 18 03:32 sdb1 -> ../sdb1
> > > /root>
> > >
> > > And I cannot reassemble the software raid set.  While the
> > > mdstat looks 'normal' the software raid becomes degraded.
> > >
> > > /etc/udev/rules.d> cat /proc/mdstat
> > > Personalities : [raid1]
> > > md0 : active raid1 sda1[2](F) sdb1[1]
> > >       292961216 blocks [2/1] [_U]
> > >
> > > unused devices: <none>
> > > /etc/udev/rules.d> mdadm --detail /dev/md0
> > > /dev/md0:
> > >         Version : 00.90.03
> > >   Creation Time : Wed Feb 18 14:15:03 2009
> > >      Raid Level : raid1
> > >      Array Size : 292961216 (279.39 GiB 299.99 GB)
> > >     Device Size : 292961216 (279.39 GiB 299.99 GB)
> > >    Raid Devices : 2
> > >   Total Devices : 2
> > > Preferred Minor : 0
> > >     Persistence : Superblock is persistent
> > >
> > >     Update Time : Thu Mar 19 10:28:45 2009
> > >           State : clean, degraded
> > >  Active Devices : 1
> > > Working Devices : 1
> > >  Failed Devices : 1
> > >   Spare Devices : 0
> > >
> > >            UUID : 94813396:a3e341fc:c5d4b8ba:0617a019
> > >          Events : 0.86
> > >
> > >     Number   Major   Minor   RaidDevice State
> > >        0       0        0        0      removed
> > >        1       8       17        1      active sync   /dev/sdb1
> > >
> > >        2       8        1        -      faulty spare
> > > /etc/udev/rules.d>
> > >
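The degraded state shown above can also be detected from a script. This is a rough sketch only: `degraded_arrays` is a hypothetical helper, and it assumes the /proc/mdstat layout in the output above, where the array name starts its header line and the member map (for example `[_U]`) sits on the following "blocks" line, with `_` marking a missing member.

```shell
#!/bin/sh
# Hypothetical helper: reads /proc/mdstat-style text on stdin and prints the
# name of every md array whose member map contains "_" (a missing member).
degraded_arrays() {
    awk '
        # Header line such as "md0 : active raid1 ...": remember the array name.
        /^md[0-9]+ :/ { name = $1 }
        # Status line such as "292961216 blocks [2/1] [_U]": report if degraded.
        /blocks/ && /\[[U_]*_[U_]*\]/ { print name }
    '
}

# Intended use on a live system:
#   degraded_arrays < /proc/mdstat
```

For the mdstat output quoted above, where md0 shows `[2/1] [_U]`, the helper would print `md0`; for a healthy mirror showing `[UU]` it prints nothing.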
> > > The disk is fine and is back in and ready to go.
> > >
> > > So, 1) why does multipathd remove the path for a blacklisted
> > > device?  If it is blacklisted, shouldn't multipathd just
> > > leave it alone??
> > > And 2) what can I do to keep this from happening??
> > >
> > > Lori Ransegnola
> > >
> >
> > --
> > dm-devel mailing list
> > dm-devel@xxxxxxxxxx
> > https://www.redhat.com/mailman/listinfo/dm-devel