Hi all,
We have been battling this issue with help from IBM support, but I'd like to understand the situation a bit better, so I thought I'd go to the source ;-)

We seem to have a work-around, but it seems rather woolly and sledgehammer-like: re-run dracut to rebuild the initramfs copies of the /etc/multipath/bindings and wwids files, then effectively tear down and rebuild the multipaths and the overlying LVM storage from scratch.

It appears that the copy of bindings in the initramfs assigned a different mpath id to the affected wwids than the copy in /etc/multipath. I would not have expected multipathd to be inspecting the contents of the initramfs (and I don't think it is), but I'm also not sure whether this is a symptom or a cause.

The multipaths had been added several weeks earlier and had been working properly ever since, but I don't believe there had been a reboot since they were added, nor had the initramfs been updated. We had a scheduled SAN hardware maintenance restart, and later noticed strange filesystem corruption on a guest server.

The issue is that several mpath devices ended up with overlapping block devices after an interruption to half of the paths, caused by a hardware restart of one of the SAN nodes. All the block devices from mpathn ended up being used by mpathm too.

Truncated output of multipath -ll:

mpathn (360050763008080eef80000000000002d) dm-13 IBM ,2145
| |- 1:0:0:12 sdm 8:192 active undef running
| `- 3:0:0:12 sdao 66:128 active undef running
  |- 3:0:1:12 sdbc 67:96 failed undef running
  `- 1:0:1:12 sdaa 65:160 active undef running
mpathm (360050763008080eef80000000000002c) dm-12 IBM ,2145
| |- 1:0:1:12 sdaa 65:160 active undef running
| `- 3:0:1:12 sdbc 67:96 active undef running
| |- 1:0:1:11 sdz 65:144 active undef running
| |- 3:0:1:11 sdbb 67:80 failed undef running
| |- 1:0:0:12 sdm 8:192 active undef running
| `- 3:0:0:12 sdao 66:128 active undef running
  |- 1:0:0:11 sdl 8:176 active undef running
  `- 3:0:0:11 sdan 66:112 active undef running

Note that the failed sdbb device eventually came back as active; this output was taken just after the hardware was reset, but before everything had settled down again. However, the overlap never resolved itself.

Environment:

uname -a
Linux p8-srvr1 3.10.82-2042.1.pkvm2_1_1.71.ppc64 #1 SMP Fri Jul 31 09:52:38 CDT 2015 ppc64 ppc64 ppc64 GNU/Linux

cat /etc/issue
IBM_PowerKVM release 2.1.1 build 62 service (pkvm2_1_1)
Kernel \r on a \m (\l)

rpm -qa |grep multipath
device-mapper-multipath-libs-0.4.9-51.pkvm2_1.5.ppc64
device-mapper-multipath-0.4.9-51.pkvm2_1.5.ppc64

So my questions are:

a) Is this expected behaviour or a bug?
b) If a bug, is there a fix?
c) Is there any further information you need to help diagnose?

Regards,
Andy D'Arcy Jewell Linux/FOSS Operations
CSI LTD
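
P.S. In case it is useful to anyone checking their own hosts for the same symptom, here is a rough Python sketch of the two checks described above. It is not a tested tool, only an illustration. The function names (shared_paths, parse_bindings, binding_mismatches) are made up for this example. It assumes the map header lines look like "alias (wwid) dm-N ...", as in the output above, and that the bindings file holds one "alias wwid" pair per line with "#" comments. How the initramfs copy of bindings is extracted is left to the reader.

```python
import re
from collections import defaultdict

# Header line, e.g. "mpathn (3600...2d) dm-13 IBM ,2145" (alias-style names assumed).
MAP_RE = re.compile(r"^(\S+)\s+\((\w+)\)\s+dm-\d+")
# Path line, e.g. "| |- 1:0:0:12 sdm 8:192 active undef running".
PATH_RE = re.compile(r"\b\d+:\d+:\d+:\d+\s+(\S+)\s+\d+:\d+\b")


def shared_paths(multipath_ll_text):
    """Return {block_device: [map names]} for devices listed under more than one map."""
    owners = defaultdict(set)
    current = None
    for line in multipath_ll_text.splitlines():
        header = MAP_RE.match(line.strip())
        if header:
            current = header.group(1)
            continue
        path = PATH_RE.search(line)
        if path and current:
            owners[path.group(1)].add(current)
    return {dev: sorted(maps) for dev, maps in owners.items() if len(maps) > 1}


def parse_bindings(text):
    """Parse bindings-file text into {wwid: alias}, skipping comments and blank lines."""
    bindings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue
        alias, wwid = parts
        bindings[wwid.strip()] = alias
    return bindings


def binding_mismatches(system_text, initramfs_text):
    """Return {wwid: (system_alias, initramfs_alias)} where the two copies disagree."""
    system = parse_bindings(system_text)
    initramfs = parse_bindings(initramfs_text)
    common = system.keys() & initramfs.keys()
    return {w: (system[w], initramfs[w]) for w in common if system[w] != initramfs[w]}
```

Feeding the multipath -ll text to shared_paths() should list each block device that appears under more than one map, and binding_mismatches() takes the text of /etc/multipath/bindings and of the initramfs copy and reports any wwid whose alias differs between them.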