Re: info on enabling only one path with rdac and DS4700

> -----Original Message-----
> From: Gianluca Cecchi [mailto:gianluca.cecchi@xxxxxxxxx]
> Sent: Wednesday, November 23, 2011 11:22 AM
> To: device-mapper development
> Subject: Re:  info on enabling only one path with rdac and
> DS4700
> 
> On Wed, Nov 16, 2011 at 2:24 PM, Johannes Hirte  wrote:
> [snip]
> > Yes, this is because the rdac module detected the LUN in AVT mode and
> refused
> > to work with it. This will happen every time you access a ghost path
> without
> > rdac.
> 
> >
> >> - On the presented LUN I configured a PV, VG, LV and ext4 fs (not
> system fs)
> >> At reboot at host side I see messages related to duplicated PV IDs
> for
> >> the paths (sdb, sdc, sdd, sde): they come before vg activation and
> >> before multipathd start...
> >> Is this normal, because at the first vgscan run during boot,
> multipath
> >> configuration has not been instantiated yet..?
> >> I have to check, but I don't remember similar messages with RHEL 5.7
> >> in other SAN configurations, where the VG is not a system VG....
> >
> > You should avoid accessing the sdX devices directly. If you need to run LVM before
> > multipath is up, you can blacklist the sdX devices in lvm.conf.
> 
> 
> So I configured:
> - LUN on DS4700 as LNXCLVMWARE that I found should disable AVT
> - multipath as standard without any particular setting (I only
> blacklisted the internal disk)
> 
> At the start time of the system I get (both in console and then I
> found it in /var/log/messages too):
> 
> Nov 23 17:32:56 testserver kernel:  sdc:end_request: I/O error, dev
> sdb, sector 0
> Nov 23 17:32:56 testserver kernel: Buffer I/O error on device sdb,
> logical block 0
> Nov 23 17:32:56 testserver kernel: end_request: I/O error, dev sdc,
> sector 0
> Nov 23 17:32:56 testserver kernel: Buffer I/O error on device sdc,
> logical block 0
> Nov 23 17:32:56 testserver kernel: end_request: I/O error, dev sdb,
> sector 0
> Nov 23 17:32:56 testserver kernel: Buffer I/O error on device sdb,
> logical block 0
> Nov 23 17:32:56 testserver kernel: end_request: I/O error, dev sdc,
> sector 0
> Nov 23 17:32:56 testserver kernel: Buffer I/O error on device sdc,
> logical block 0
> Nov 23 17:32:56 testserver kernel: end_request: I/O error, dev sdb,
> sector 0
> Nov 23 17:32:56 testserver kernel: Buffer I/O error on device sdb,
> logical block 0
> Nov 23 17:32:56 testserver kernel: end_request: I/O error, dev sdc,
> sector 0
> Nov 23 17:32:56 testserver kernel: Buffer I/O error on device sdc,
> logical block 0
> Nov 23 17:32:56 testserver kernel: end_request: I/O error, dev sdb,
> sector 0
> Nov 23 17:32:56 testserver kernel: Buffer I/O error on device sdb,
> logical block 0
> Nov 23 17:32:56 testserver kernel: end_request: I/O error, dev sdc,
> sector 0
> Nov 23 17:32:56 testserver kernel: Buffer I/O error on device sdc,
> logical block 0
> 
> This happens for sdb and sdc only (probably the passive controller disk
> paths?)
> 
> And these other ones:
> Nov 23 17:32:58 testserver kernel: end_request: I/O error, dev sdb,
> sector 0
> Nov 23 17:32:58 testserver kernel: end_request: I/O error, dev sdb,
> sector 7320493952
> Nov 23 17:32:58 testserver kernel: end_request: I/O error, dev sdb,
> sector 7320494064
> Nov 23 17:32:58 testserver kernel: end_request: I/O error, dev sdb,
> sector 0
> Nov 23 17:32:58 testserver kernel: end_request: I/O error, dev sdb,
> sector 8
> Nov 23 17:32:58 testserver kernel: end_request: I/O error, dev sdb,
> sector 0
> Nov 23 17:32:58 testserver kernel: end_request: I/O error, dev sdc,
> sector 0
> Nov 23 17:32:58 testserver kernel: end_request: I/O error, dev sdc,
> sector 7320493952
> Nov 23 17:32:58 testserver kernel: end_request: I/O error, dev sdc,
> sector 7320494064
> Nov 23 17:32:58 testserver kernel: end_request: I/O error, dev sdc,
> sector 0
> ...
> then
> Nov 23 17:33:00 testserver kernel: device-mapper: multipath: version
> 1.0.6 loaded
> Nov 23 17:33:00 testserver kernel: sd 3:0:0:1: rdac: LUN 1 (unowned)
> Nov 23 17:33:00 testserver kernel: sd 3:0:1:1: rdac: LUN 1 (owned)
> Nov 23 17:33:00 testserver kernel: sd 4:0:0:1: rdac: LUN 1 (unowned)
> Nov 23 17:33:00 testserver kernel: sd 4:0:1:1: rdac: LUN 1 (owned)
> Nov 23 17:33:01 testserver kernel: rdac: device handler registered
> Nov 23 17:33:01 testserver kernel: device-mapper: multipath: Using
> scsi_dh module scsi_dh_rdac for failover/failback and device
> management.
> Nov 23 17:33:01 testserver kernel: device-mapper: multipath
> round-robin: version 1.0.0 loaded
> Nov 23 17:33:01 testserver kernel: sd 3:0:0:1: rdac: array
> Z1_BEIC_DS4700, ctlr 0, queueing MODE_SELECT command
> Nov 23 17:33:01 testserver kernel: sd 3:0:0:1: rdac: array
> Z1_BEIC_DS4700, ctlr 0, MODE_SELECT completed
> Nov 23 17:33:01 testserver kernel: sd 4:0:0:1: rdac: array
> Z1_BEIC_DS4700, ctlr 0, queueing MODE_SELECT command
> Nov 23 17:33:01 testserver kernel: sd 4:0:0:1: rdac: array
> Z1_BEIC_DS4700, ctlr 0, MODE_SELECT completed
> Nov 23 17:33:01 testserver kernel: end_request: I/O error, dev sdd,
> sector 7320494072
> Nov 23 17:33:01 testserver kernel: printk: 4 messages suppressed.
> Nov 23 17:33:01 testserver kernel: Buffer I/O error on device sdd,
> logical block 915061759
> Nov 23 17:33:01 testserver kernel: end_request: I/O error, dev sde,
> sector 7320494072
> Nov 23 17:33:02 testserver kernel: printk: 49 messages suppressed.
> Nov 23 17:33:02 testserver kernel: Buffer I/O error on device sde,
> logical block 0
> Nov 23 17:33:02 testserver kernel: Buffer I/O error on device sde,
> logical block 2
> Nov 23 17:33:02 testserver kernel: Buffer I/O error on device sde,
> logical block 3
> 
> And then no other I/O error messages. I don't know if this is avoidable or
> not.
> 
> So after complete startup the situation is:
> [root@testserver ~]# multipath -l
> mpath1 (3600a0b80005012440000093e4a55cf33) dm-6 IBM,1814      FAStT
> [size=3.4T][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
> \_ round-robin 0 [prio=0][active]
>  \_ 3:0:0:1 sdb 8:16  [active][undef]
>  \_ 4:0:0:1 sdc 8:32  [active][undef]
> \_ round-robin 0 [prio=0][enabled]
>  \_ 3:0:1:1 sdd 8:48  [active][undef]
>  \_ 4:0:1:1 sde 8:64  [active][undef]
> 
> When I activate the LVM on mpath1 PV and mount the file system:
> Nov 23 17:34:31 testserver kernel: EXT4-fs (dm-7): mounted filesystem
> with ordered data mode
> Nov 23 17:34:47 testserver kernel: JBD: barrier-based sync failed on
> dm-7-8 - disabling barriers
> --> I don't know if this should be considered a problem
> 
> I instantiate I/O without problems.
> 
> Then I tested changing the active controller for the LUN on the DS4700 side
> during a running I/O session (dd sequential read of 10 GB), and I got:
> Nov 23 17:43:02 testserver kernel: end_request: I/O error, dev sdb,
> sector 2110328
> Nov 23 17:43:02 testserver kernel: device-mapper: multipath: Failing
> path 8:16.
> Nov 23 17:43:02 testserver multipathd: 8:16: mark as failed
> Nov 23 17:43:02 testserver multipathd: mpath1: remaining active paths:
> 3
> Nov 23 17:43:02 testserver multipathd: dm-6: add map (uevent)
> Nov 23 17:43:02 testserver multipathd: dm-6: devmap already registered
> Nov 23 17:43:03 testserver kernel: end_request: I/O error, dev sdc,
> sector 2110328
> Nov 23 17:43:03 testserver kernel: device-mapper: multipath: Failing
> path 8:32.
> Nov 23 17:43:03 testserver multipathd: dm-6: add map (uevent)
> Nov 23 17:43:03 testserver multipathd: dm-6: devmap already registered
> Nov 23 17:43:03 testserver multipathd: 8:32: mark as failed
> Nov 23 17:43:03 testserver multipathd: mpath1: remaining active paths:
> 2
> Nov 23 17:43:06 testserver multipathd: sdb: rdac checker reports path
> is ghost
> Nov 23 17:43:06 testserver multipathd: 8:16: reinstated
> Nov 23 17:43:06 testserver multipathd: mpath1: remaining active paths:
> 3
> Nov 23 17:43:06 testserver kernel: device-mapper: multipath: Using
> scsi_dh module scsi_dh_rdac for failover/failback and device
> management.
> Nov 23 17:43:06 testserver multipathd: mpath1: load table [0
> 7320494080 multipath 0 1 rdac 2 1 round-robin 0 3 1 8:32 1000 8:48
> 1000 8:64 1000 round-robin 0 1 1 8:16
> Nov 23 17:43:06 testserver multipathd: dm-6: add map (uevent)
> Nov 23 17:43:06 testserver multipathd: dm-6: devmap already registered
> Nov 23 17:43:06 testserver kernel: device-mapper: multipath: Failing
> path 8:32.
> Nov 23 17:43:06 testserver multipathd: dm-6: add map (uevent)
> Nov 23 17:43:06 testserver multipathd: dm-6: devmap already registered
> Nov 23 17:43:06 testserver multipathd: dm-6: add map (uevent)
> Nov 23 17:43:06 testserver multipathd: dm-6: devmap already registered
> Nov 23 17:43:07 testserver multipathd: sdc: rdac checker reports path
> is ghost
> Nov 23 17:43:07 testserver multipathd: 8:32: reinstated
> Nov 23 17:43:07 testserver kernel: sd 4:0:0:1: rdac: array
> Z1_BEIC_DS4700, ctlr 0, queueing MODE_SELECT command
> Nov 23 17:43:07 testserver multipathd: mpath1: remaining active paths:
> 4
> Nov 23 17:43:07 testserver kernel: device-mapper: multipath: Using
> scsi_dh module scsi_dh_rdac for failover/failback and device
> management.
> Nov 23 17:43:08 testserver kernel: sd 4:0:0:1: rdac: array
> Z1_BEIC_DS4700, ctlr 0, MODE_SELECT completed
> Nov 23 17:43:08 testserver multipathd: mpath1: load table [0
> 7320494080 multipath 0 1 rdac 2 1 round-robin 0 2 1 8:48 1000 8:64
> 1000 round-robin 0 2 1 8:32 1000 8:16
> Nov 23 17:43:08 testserver multipathd: dm-6: add map (uevent)
> Nov 23 17:43:08 testserver multipathd: dm-6: devmap already registered
> Nov 23 17:43:08 testserver multipathd: dm-6: add map (uevent)
> Nov 23 17:43:08 testserver multipathd: dm-6: devmap already registered
> Nov 23 17:43:08 testserver kernel: sd 3:0:1:1: rdac: array
> Z1_BEIC_DS4700, ctlr 1, queueing MODE_SELECT command
> Nov 23 17:43:10 testserver kernel: sd 3:0:1:1: rdac: array
> Z1_BEIC_DS4700, ctlr 1, MODE_SELECT completed
> Nov 23 17:43:10 testserver kernel: sd 4:0:1:1: rdac: array
> Z1_BEIC_DS4700, ctlr 1, queueing MODE_SELECT command
> Nov 23 17:43:11 testserver kernel: sd 4:0:1:1: rdac: array
> Z1_BEIC_DS4700, ctlr 1, MODE_SELECT completed
> Nov 23 17:43:12 testserver multipathd: sdd: rdac checker reports path
> is up
> Nov 23 17:43:12 testserver multipathd: 8:48: reinstated
> Nov 23 17:43:12 testserver multipathd: sde: rdac checker reports path
> is up
> Nov 23 17:43:12 testserver multipathd: 8:64: reinstated
> 
> The overall increased time is 3-4 seconds for a 1 minute I/O period
> Without failover:
> [root@testserver ~]# time dd if=/testfs/testfile bs=1024k count=10000
> of=/dev/null
> 10000+0 records in
> 10000+0 records out
> 10485760000 bytes (10 GB) copied, 61.9951 seconds, 169 MB/s
> 
> real	1m1.996s
> user	0m0.002s
> sys	0m7.088s
> 
> With a change of active controller in the middle:
> [root@testserver ~]# time dd if=/testfs/testfile1 bs=1024k count=10000
> of=/dev/null
> 10000+0 records in
> 10000+0 records out
> 10485760000 bytes (10 GB) copied, 65.6529 seconds, 160 MB/s
> 
> real	1m5.654s
> user	0m0.007s
> sys	0m7.175s
> 
> So, quite good, and without error at user side.
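
For reference, the cost of the controller switch can be worked out from the two dd runs quoted above. This is only a small Python sketch; the variable and function names are mine, and the numbers are the ones dd printed.

```python
# Figures copied from the two dd runs quoted above.
BYTES_COPIED = 10485760000   # 10000 blocks of 1024 KiB
T_BASELINE = 61.9951         # seconds, no failover
T_FAILOVER = 65.6529         # seconds, controller changed mid-run


def throughput_mb_s(nbytes, seconds):
    """Throughput in decimal MB/s, the unit dd uses in its summary line."""
    return nbytes / seconds / 1e6


extra_seconds = T_FAILOVER - T_BASELINE
extra_percent = 100.0 * extra_seconds / T_BASELINE

print("baseline: %.0f MB/s" % throughput_mb_s(BYTES_COPIED, T_BASELINE))
print("failover: %.0f MB/s" % throughput_mb_s(BYTES_COPIED, T_FAILOVER))
print("extra time: %.2f s (%.1f%%)" % (extra_seconds, extra_percent))
```

So the switch cost roughly 3.7 seconds, a bit under 6% of the run.
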
> 
> at the end the multipath config is this:
> [root@testserver ~]# multipath -l
> mpath1 (3600a0b80005012440000093e4a55cf33) dm-6 IBM,1814      FAStT
> [size=3.4T][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
> \_ round-robin 0 [prio=0][active]
>  \_ 3:0:1:1 sdd 8:48  [active][undef]
>  \_ 4:0:1:1 sde 8:64  [active][undef]
> \_ round-robin 0 [prio=0][enabled]
>  \_ 4:0:0:1 sdc 8:32  [active][undef]
>  \_ 3:0:0:1 sdb 8:16  [active][undef]
> 
> Questions:
> 
> 1) Can I conclude it is ok as a configuration? Or any other tests to
> carry on?
> I confirm I didn't get any snmp trap from the ds4700 as happened
> before...

 Your configuration looks good to me. 

> 
> 2) At the moment I put this in lvm.conf to whitelist the root volume
> groups and blacklist the san individual paths and then delete the
> .cache file and reboot
> filter = [ "a/dev/mapper/.*/", "a/dev/sda/", "a/dev/sda2/",
> "r/dev/sd.*/" ]
> Is this ok?
> If root PV is on sda2, do I need to whitelist both sda and sda2 or only
> sda2?
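
On the filter: as far as I know, LVM treats each entry as an unanchored regular expression, uses the first entry that matches a device name, and accepts a device that matches no entry. If that holds, "a/dev/sda/" already matches /dev/sda2 as well, so one of the two sda entries should be enough. Below is a rough Python model of that rule, not LVM code: lvm_filter_accepts and parse_entry are made-up helpers, Python's regex engine stands in for LVM's, and device aliases (for example /dev/disk/by-id names) are ignored.

```python
import re


def parse_entry(entry):
    """Split a filter entry such as 'a/dev/sda/' into (accept, compiled regex)."""
    action, sep, body = entry[0], entry[1], entry[2:]
    # The character after a/r is the delimiter; brackets close with their pair.
    closing = {"(": ")", "[": "]", "{": "}"}.get(sep, sep)
    if action not in ("a", "r") or not body.endswith(closing):
        raise ValueError("malformed filter entry: %r" % entry)
    return action == "a", re.compile(body[:-1])


def lvm_filter_accepts(filter_entries, device):
    """First matching entry decides; a device matching nothing is accepted."""
    for accept, pattern in map(parse_entry, filter_entries):
        if pattern.search(device):   # unanchored match, as assumed above
            return accept
    return True


FILTER = ["a/dev/mapper/.*/", "a/dev/sda/", "a/dev/sda2/", "r/dev/sd.*/"]
for dev in ["/dev/mapper/mpath1", "/dev/sda", "/dev/sda2", "/dev/sdb", "/dev/sde"]:
    verdict = "accepted" if lvm_filter_accepts(FILTER, dev) else "rejected"
    print("%s %s" % (dev, verdict))
```

Checking the result on the real host with the LVM tools themselves (after removing the .cache file, as you did) is still the safer test.
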
> 
> 3) Based on the messages during failover, is it true that I can avoid
> explicitly putting scsi_dh in the initrd?
> If I create initrd this way:
> mkinitrd /boot/initrd-$(uname -r)-scsi_dh.img $(uname -r) --preload=scsi_dh_rdac
> I get this difference:
> [root@testserver ~]# diff /tmp/new/init /tmp/current/init
> 44,51d43
> < echo "Loading scsi_mod.ko module"
> < insmod /lib/scsi_mod.ko
> < echo "Loading sd_mod.ko module"
> < insmod /lib/sd_mod.ko
> < echo "Loading scsi_dh.ko module"
> < insmod /lib/scsi_dh.ko
> < echo "Loading scsi_dh_rdac.ko module"
> < insmod /lib/scsi_dh_rdac.ko
> 62a55,58
> > echo "Loading scsi_mod.ko module"
> > insmod /lib/scsi_mod.ko
> > echo "Loading sd_mod.ko module"
> > insmod /lib/sd_mod.ko
> 
> or will it help in any way?

Having scsi_dh_rdac in the initrd will help get rid of the initial I/O errors you are seeing, because the handler is then loaded before anything at boot tries to read from the passive paths.

> BTW: The I/O tests above were done with the standard initrd (the ">" side
> of the diff, without scsi_dh_rdac).
> I only ran mkinitrd to see how the init file would have been
> created...
> 
> 4) The SAN LUN is 3.4 TB and I'm going to add another one of about 5 TB.
> In messages I see this
> Nov 23 17:32:58 testserver kernel: sde : very big device. try to use
> READ CAPACITY(16).
> 
> I found in an old kernel ml post that actually it should mean "trying
> to use" ... so only informational message.
> Can anyone confirm this?

It is only informational.
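
The message shows up because the LUN has more blocks than the 32-bit address field of READ CAPACITY(10) can describe, so the kernel has to ask again with READ CAPACITY(16). A quick check in Python, assuming 512-byte sectors and taking the sector count from the multipath table in your log (7320494080); the function name is only illustrative.

```python
SECTOR_SIZE = 512            # bytes per sector; an assumption for this LUN
MAX_LBA_RC10 = 0xFFFFFFFF    # READ CAPACITY(10) reports a 32-bit last-block address;
                             # this value means "too big, use READ CAPACITY(16)"


def needs_read_capacity_16(total_sectors):
    """True when the last block address cannot be reported in 32 bits."""
    return total_sectors - 1 >= MAX_LBA_RC10


lun_sectors = 7320494080     # from the "load table" lines in the quoted log
print(needs_read_capacity_16(lun_sectors))
print("%.2f TiB" % (lun_sectors * SECTOR_SIZE / 2.0 ** 40))
```

With 512-byte sectors the 32-bit limit sits at 2 TiB, so the planned 5 TB LUN will print the same message.
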

> 
> Thanks again in advance for your help,
> Gianluca
> 
> --
> dm-devel mailing list
> dm-devel@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/dm-devel
