On Wed, Nov 16, 2011 at 2:24 PM, Johannes Hirte wrote:
[snip]
> Yes, this is because the rdac module detected the LUN in AVT mode and refused
> to work with it. This will happen every time you access a ghost path without
> rdac.
>
>> - On the presented LUN I configured a PV, VG, LV and ext4 fs (not system fs)
>> At reboot at host side I see messages related to duplicated PV IDs for
>> the paths (sdb, sdc, sdd, sde): they comes before vg activation and
>> before multipathd start...
>> Is this normal, because at the first vgscan run during boot, multipath
>> configuration has not been instantiated yet..?
>> I have to check, but I don't remember similar messages with eh el 5.7
>> in other SAN configurations, where the VG is not a system VG....
>
> You should avoid to access the sdX directly. If you need to run lvm before
> multipath is up, you can blacklist the sdX in the lvm.conf.

So I configured:
- the LUN on the DS4700 as LNXCLVMWARE, which from what I found should
  disable AVT
- multipath as standard, without any particular setting (I only
  blacklisted the internal disk)

At system startup I get (both on the console and, as I found later, in
/var/log/messages too):

Nov 23 17:32:56 testserver kernel: sdc:end_request: I/O error, dev sdb, sector 0
Nov 23 17:32:56 testserver kernel: Buffer I/O error on device sdb, logical block 0
Nov 23 17:32:56 testserver kernel: end_request: I/O error, dev sdc, sector 0
Nov 23 17:32:56 testserver kernel: Buffer I/O error on device sdc, logical block 0
Nov 23 17:32:56 testserver kernel: end_request: I/O error, dev sdb, sector 0
Nov 23 17:32:56 testserver kernel: Buffer I/O error on device sdb, logical block 0
Nov 23 17:32:56 testserver kernel: end_request: I/O error, dev sdc, sector 0
Nov 23 17:32:56 testserver kernel: Buffer I/O error on device sdc, logical block 0
Nov 23 17:32:56 testserver kernel: end_request: I/O error, dev sdb, sector 0
Nov 23 17:32:56 testserver kernel: Buffer I/O error on device sdb, logical block 0
Nov 23 17:32:56 testserver kernel: end_request: I/O error, dev sdc, sector 0
Nov 23 17:32:56 testserver kernel: Buffer I/O error on device sdc, logical block 0
Nov 23 17:32:56 testserver kernel: end_request: I/O error, dev sdb, sector 0
Nov 23 17:32:56 testserver kernel: Buffer I/O error on device sdb, logical block 0
Nov 23 17:32:56 testserver kernel: end_request: I/O error, dev sdc, sector 0
Nov 23 17:32:56 testserver kernel: Buffer I/O error on device sdc, logical block 0

This happens for sdb and sdc only (probably the passive controller's disk
paths?).

Then these:

Nov 23 17:32:58 testserver kernel: end_request: I/O error, dev sdb, sector 0
Nov 23 17:32:58 testserver kernel: end_request: I/O error, dev sdb, sector 7320493952
Nov 23 17:32:58 testserver kernel: end_request: I/O error, dev sdb, sector 7320494064
Nov 23 17:32:58 testserver kernel: end_request: I/O error, dev sdb, sector 0
Nov 23 17:32:58 testserver kernel: end_request: I/O error, dev sdb, sector 8
Nov 23 17:32:58 testserver kernel: end_request: I/O error, dev sdb, sector 0
Nov 23 17:32:58 testserver kernel: end_request: I/O error, dev sdc, sector 0
Nov 23 17:32:58 testserver kernel: end_request: I/O error, dev sdc, sector 7320493952
Nov 23 17:32:58 testserver kernel: end_request: I/O error, dev sdc, sector 7320494064
Nov 23 17:32:58 testserver kernel: end_request: I/O error, dev sdc, sector 0
...

then

Nov 23 17:33:00 testserver kernel: device-mapper: multipath: version 1.0.6 loaded
Nov 23 17:33:00 testserver kernel: sd 3:0:0:1: rdac: LUN 1 (unowned)
Nov 23 17:33:00 testserver kernel: sd 3:0:1:1: rdac: LUN 1 (owned)
Nov 23 17:33:00 testserver kernel: sd 4:0:0:1: rdac: LUN 1 (unowned)
Nov 23 17:33:00 testserver kernel: sd 4:0:1:1: rdac: LUN 1 (owned)
Nov 23 17:33:01 testserver kernel: rdac: device handler registered
Nov 23 17:33:01 testserver kernel: device-mapper: multipath: Using scsi_dh module scsi_dh_rdac for failover/failback and device management.
Nov 23 17:33:01 testserver kernel: device-mapper: multipath round-robin: version 1.0.0 loaded
Nov 23 17:33:01 testserver kernel: sd 3:0:0:1: rdac: array Z1_BEIC_DS4700, ctlr 0, queueing MODE_SELECT command
Nov 23 17:33:01 testserver kernel: sd 3:0:0:1: rdac: array Z1_BEIC_DS4700, ctlr 0, MODE_SELECT completed
Nov 23 17:33:01 testserver kernel: sd 4:0:0:1: rdac: array Z1_BEIC_DS4700, ctlr 0, queueing MODE_SELECT command
Nov 23 17:33:01 testserver kernel: sd 4:0:0:1: rdac: array Z1_BEIC_DS4700, ctlr 0, MODE_SELECT completed
Nov 23 17:33:01 testserver kernel: end_request: I/O error, dev sdd, sector 7320494072
Nov 23 17:33:01 testserver kernel: printk: 4 messages suppressed.
Nov 23 17:33:01 testserver kernel: Buffer I/O error on device sdd, logical block 915061759
Nov 23 17:33:01 testserver kernel: end_request: I/O error, dev sde, sector 7320494072
Nov 23 17:33:02 testserver kernel: printk: 49 messages suppressed.
Nov 23 17:33:02 testserver kernel: Buffer I/O error on device sde, logical block 0
Nov 23 17:33:02 testserver kernel: Buffer I/O error on device sde, logical block 2
Nov 23 17:33:02 testserver kernel: Buffer I/O error on device sde, logical block 3

After that there are no other I/O error messages.
I don't know whether this is avoidable or not....
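
As a side check on the sdd messages above: the sector number and the
logical block number reported for sdd appear to describe the same spot on
the disk, if one assumes 512-byte sectors and 4096-byte buffer blocks.
Neither size is stated in the log, so both are assumptions, and
sector_to_block is only a hypothetical helper name. A minimal sketch of the
conversion:

```shell
#!/bin/sh
# Assumed sizes (NOT stated in the log): 512-byte sectors, 4096-byte blocks.
SECTOR_BYTES=512
BLOCK_BYTES=4096

# sector_to_block SECTOR: print the logical block that contains SECTOR.
# Hypothetical helper, for illustration only.
sector_to_block() {
    echo $(( $1 * SECTOR_BYTES / BLOCK_BYTES ))
}

# Sector reported for sdd and sde at 17:33:01.
sector_to_block 7320494072   # prints 915061759
```

Under those assumptions the result matches the "logical block 915061759"
line logged for sdd, so the two messages would refer to the same failed read.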

So after complete startup the situation is:

[root@testserver ~]# multipath -l
mpath1 (3600a0b80005012440000093e4a55cf33) dm-6 IBM,1814 FAStT
[size=3.4T][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=0][active]
 \_ 3:0:0:1 sdb 8:16 [active][undef]
 \_ 4:0:0:1 sdc 8:32 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 3:0:1:1 sdd 8:48 [active][undef]
 \_ 4:0:1:1 sde 8:64 [active][undef]

When I activate the LVM on the mpath1 PV and mount the file system:

Nov 23 17:34:31 testserver kernel: EXT4-fs (dm-7): mounted filesystem with ordered data mode
Nov 23 17:34:47 testserver kernel: JBD: barrier-based sync failed on dm-7-8 - disabling barriers
--> I don't know whether this should be considered a problem

I can run I/O without problems.

Then I tested changing the active controller for the LUN on the DS4700 side
during a running I/O session (dd sequential read of 10 GB), and I get:

Nov 23 17:43:02 testserver kernel: end_request: I/O error, dev sdb, sector 2110328
Nov 23 17:43:02 testserver kernel: device-mapper: multipath: Failing path 8:16.
Nov 23 17:43:02 testserver multipathd: 8:16: mark as failed
Nov 23 17:43:02 testserver multipathd: mpath1: remaining active paths: 3
Nov 23 17:43:02 testserver multipathd: dm-6: add map (uevent)
Nov 23 17:43:02 testserver multipathd: dm-6: devmap already registered
Nov 23 17:43:03 testserver kernel: end_request: I/O error, dev sdc, sector 2110328
Nov 23 17:43:03 testserver kernel: device-mapper: multipath: Failing path 8:32.
Nov 23 17:43:03 testserver multipathd: dm-6: add map (uevent)
Nov 23 17:43:03 testserver multipathd: dm-6: devmap already registered
Nov 23 17:43:03 testserver multipathd: 8:32: mark as failed
Nov 23 17:43:03 testserver multipathd: mpath1: remaining active paths: 2
Nov 23 17:43:06 testserver multipathd: sdb: rdac checker reports path is ghost
Nov 23 17:43:06 testserver multipathd: 8:16: reinstated
Nov 23 17:43:06 testserver multipathd: mpath1: remaining active paths: 3
Nov 23 17:43:06 testserver kernel: device-mapper: multipath: Using scsi_dh module scsi_dh_rdac for failover/failback and device management.
Nov 23 17:43:06 testserver multipathd: mpath1: load table [0 7320494080 multipath 0 1 rdac 2 1 round-robin 0 3 1 8:32 1000 8:48 1000 8:64 1000 round-robin 0 1 1 8:16
Nov 23 17:43:06 testserver multipathd: dm-6: add map (uevent)
Nov 23 17:43:06 testserver multipathd: dm-6: devmap already registered
Nov 23 17:43:06 testserver kernel: device-mapper: multipath: Failing path 8:32.
Nov 23 17:43:06 testserver multipathd: dm-6: add map (uevent)
Nov 23 17:43:06 testserver multipathd: dm-6: devmap already registered
Nov 23 17:43:06 testserver multipathd: dm-6: add map (uevent)
Nov 23 17:43:06 testserver multipathd: dm-6: devmap already registered
Nov 23 17:43:07 testserver multipathd: sdc: rdac checker reports path is ghost
Nov 23 17:43:07 testserver multipathd: 8:32: reinstated
Nov 23 17:43:07 testserver kernel: sd 4:0:0:1: rdac: array Z1_BEIC_DS4700, ctlr 0, queueing MODE_SELECT command
Nov 23 17:43:07 testserver multipathd: mpath1: remaining active paths: 4
Nov 23 17:43:07 testserver kernel: device-mapper: multipath: Using scsi_dh module scsi_dh_rdac for failover/failback and device management.
Nov 23 17:43:08 testserver kernel: sd 4:0:0:1: rdac: array Z1_BEIC_DS4700, ctlr 0, MODE_SELECT completed
Nov 23 17:43:08 testserver multipathd: mpath1: load table [0 7320494080 multipath 0 1 rdac 2 1 round-robin 0 2 1 8:48 1000 8:64 1000 round-robin 0 2 1 8:32 1000 8:16
Nov 23 17:43:08 testserver multipathd: dm-6: add map (uevent)
Nov 23 17:43:08 testserver multipathd: dm-6: devmap already registered
Nov 23 17:43:08 testserver multipathd: dm-6: add map (uevent)
Nov 23 17:43:08 testserver multipathd: dm-6: devmap already registered
Nov 23 17:43:08 testserver kernel: sd 3:0:1:1: rdac: array Z1_BEIC_DS4700, ctlr 1, queueing MODE_SELECT command
Nov 23 17:43:10 testserver kernel: sd 3:0:1:1: rdac: array Z1_BEIC_DS4700, ctlr 1, MODE_SELECT completed
Nov 23 17:43:10 testserver kernel: sd 4:0:1:1: rdac: array Z1_BEIC_DS4700, ctlr 1, queueing MODE_SELECT command
Nov 23 17:43:11 testserver kernel: sd 4:0:1:1: rdac: array Z1_BEIC_DS4700, ctlr 1, MODE_SELECT completed
Nov 23 17:43:12 testserver multipathd: sdd: rdac checker reports path is up
Nov 23 17:43:12 testserver multipathd: 8:48: reinstated
Nov 23 17:43:12 testserver multipathd: sde: rdac checker reports path is up
Nov 23 17:43:12 testserver multipathd: 8:64: reinstated

The overall added time is 3-4 seconds over a 1 minute I/O period.

Without failover:

[root@testserver ~]# time dd if=/testfs/testfile bs=1024k count=10000 of=/dev/null
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 61.9951 seconds, 169 MB/s

real    1m1.996s
user    0m0.002s
sys     0m7.088s

With a change of active controller midway through the run:

[root@testserver ~]# time dd if=/testfs/testfile1 bs=1024k count=10000 of=/dev/null
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 65.6529 seconds, 160 MB/s

real    1m5.654s
user    0m0.007s
sys     0m7.175s

So, quite good, and without any error on the user side.
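
To put a number on that overhead, here is a minimal sketch that takes the
two elapsed times reported by dd above and prints the extra seconds and the
relative increase. The function name overhead is only an illustrative
helper, and it assumes awk is available:

```shell
#!/bin/sh
# Elapsed seconds reported by dd in the two runs above.
baseline=61.9951   # run without failover
failover=65.6529   # run with the controller change during the read

# overhead BASE TEST: print "<extra seconds> <percent increase>".
# Hypothetical helper; LC_ALL=C keeps "." as the decimal separator.
overhead() {
    LC_ALL=C awk -v b="$1" -v f="$2" \
        'BEGIN { printf "%.2f %.1f\n", f - b, (f - b) / b * 100 }'
}

overhead "$baseline" "$failover"   # prints: 3.66 5.9
```

So the controller change cost about 3.66 seconds, roughly 5.9% on top of
the run without failover. This is a single pair of runs on two different
files, so it is only a rough indication.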

At the end the multipath configuration is this:

[root@testserver ~]# multipath -l
mpath1 (3600a0b80005012440000093e4a55cf33) dm-6 IBM,1814 FAStT
[size=3.4T][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=0][active]
 \_ 3:0:1:1 sdd 8:48 [active][undef]
 \_ 4:0:1:1 sde 8:64 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 4:0:0:1 sdc 8:32 [active][undef]
 \_ 3:0:0:1 sdb 8:16 [active][undef]

Questions:

1) Can I conclude that this configuration is OK? Or are there other tests
I should carry out?
I confirm I didn't get any SNMP trap from the DS4700, as happened before...

2) At the moment I put this in lvm.conf to whitelist the root volume groups
and blacklist the individual SAN paths, then deleted the .cache file and
rebooted:

filter = [ "a/dev/mapper/.*/", "a/dev/sda/", "a/dev/sda2/", "r/dev/sd.*/" ]

Is this OK? If the root PV is on sda2, do I need to whitelist both sda and
sda2, or only sda2?

3) Based on the messages during failover, is it true that I can avoid
explicitly putting scsi_dh in the initrd?
If I create the initrd this way:

mkinitrd /boot/initrd-$(uname -r)-scsi_dh.img $(uname -r) --preload=scsi_dh_rdac

I get this difference:

[root@testserver ~]# diff /tmp/new/init /tmp/current/init
44,51d43
< echo "Loading scsi_mod.ko module"
< insmod /lib/scsi_mod.ko
< echo "Loading sd_mod.ko module"
< insmod /lib/sd_mod.ko
< echo "Loading scsi_dh.ko module"
< insmod /lib/scsi_dh.ko
< echo "Loading scsi_dh_rdac.ko module"
< insmod /lib/scsi_dh_rdac.ko
62a55,58
> echo "Loading scsi_mod.ko module"
> insmod /lib/scsi_mod.ko
> echo "Loading sd_mod.ko module"
> insmod /lib/sd_mod.ko

Or will it help in any way?
BTW: the I/O tests above were done with the standard initrd (so the > side
of the diff, without scsi_dh_rdac). I only ran mkinitrd to see how the init
file would be created...

4) The SAN LUN is 3.4 TB and I'm going to add another one of about 5 TB.
In messages I see this:

Nov 23 17:32:58 testserver kernel: sde : very big device. try to use READ CAPACITY(16).

I found in an old kernel mailing list post that it actually should read
"trying to use" ... so it is only an informational message.
Can anyone confirm this?

Thanks again in advance for your help,
Gianluca

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel