On Thu, May 15, 2014 at 11:45:40PM +0200, Christophe Varoqui wrote:
> Ben,
> I'd need your ack on this one.
> Best regards,
> Christophe Varoqui

Sorry I dropped the ball on this one. I'm o.k. with this patch.

The biggest issue I have with it has nothing to do with its correctness,
but with rlookup_wwid()'s use of scan_device. Previously, the only
scan_device call always failed. Now we scan every device name, but we
don't ever get anything out of it. First off, if we find a match, we
will never use the id. Second, if we don't find a match we return the id
of the alias we were looking for, but if we do find a match we return
the next id after the one we were looking for (which is completely
pointless).

It seems like we could just make rlookup_wwid() return success or
failure, and then call scan_device() from use_existing_alias() if we
need to, and take out a bunch of pointless work that rlookup_wwid() is
doing.

-Ben

>
> On Thu, May 15, 2014 at 9:21 PM, Stewart, Sean
> <Sean.Stewart@xxxxxxxxxx> wrote:
>
> Ping... Any additional comments or suggestions for this patch?
> Bumping in case it got lost in the backlog. :)
>
> On Fri, 2014-04-11 at 17:01 +0000, Stewart, Sean wrote:
> > On Fri, 2014-04-11 at 17:03 +0100, Bryn M. Reeves wrote:
> > > On Fri, Mar 28, 2014 at 09:01:14PM +0000, Stewart, Sean wrote:
> > > > When a system is booted to the SAN, a condition can occur where one
> > > > user friendly name is given to a disk during boot, but multipathd tries
> > > > to allocate a different one after boot. If the second alias is already
> > > > used by another device, multipathd can't rename it. Multipathd then has
> > > > incorrect information about the alias/wwid relationships, which can
> > > > result in paths being added to the wrong map.
> > >
> > > This should only happen if the initramfs and root file system have
> > > inconsistent multipath configurations (either multipath.conf or bindings
> > > / wwids file mismatched). That's not really a valid configuration for
> > > the system to be in and leads to the type of problems you describe.
> >
> > That is true that it only happens if they are out of sync. We tried
> > remaking the initramfs to fix the problem, but it didn't help.
> >
> > > > This patch works around this problem by first trying to use the alias
> > > > already bound to a device during boot. If the bindings file has that
> > > > alias bound to a different device, it'll auto generate a new alias to
> > > > rename it to.
> > >
> > > To be honest I'd prefer to see this cause an error. These types of
> > > configurations currently run the risk of silent data corruption - I'd
> > > much rather deal with a system that refuses to boot due to an out of
> > > date initramfs image than one that quietly remaps paths in unexpected
> > > ways.
> >
> > The issue, though, is that the system does not refuse to boot. In the
> > case we saw, it booted anyway, our QA engineer ran a test, and it ended
> > with a data corruption. A user could perform a fresh installation, map
> > new luns, reboot, and without any way of realizing it have essentially a
> > ticking time bomb on their hands, ready to go off as soon as there's a
> > blip in the SAN.

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
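
For illustration only, the restructuring Ben suggests above (a lookup
that reports just success or failure, with the id parsed by the caller
only when it is actually needed) could look roughly like the sketch
below. This is not the multipath-tools source. The in-memory bindings
table, the names alias_is_bound(), alias_to_id() and
check_existing_alias(), the enum, and the a=1 ... z=26, aa=27 id
encoding are all assumptions made for the example. alias_to_id() stands
in for the id-parsing step the mail calls scan_device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the bindings file (alias -> wwid). */
struct binding {
	const char *alias;
	const char *wwid;
};

static const struct binding bindings[] = {
	{ "mpatha", "wwid-0001" },
	{ "mpathb", "wwid-0002" },
};

/*
 * Simplified reverse lookup: reports only whether `alias` is bound and,
 * if so, to which wwid. No id is computed here.
 */
static bool alias_is_bound(const char *alias, const char **wwid_out)
{
	for (size_t i = 0; i < sizeof(bindings) / sizeof(bindings[0]); i++) {
		if (strcmp(bindings[i].alias, alias) == 0) {
			if (wwid_out)
				*wwid_out = bindings[i].wwid;
			return true;
		}
	}
	return false;
}

/*
 * Parse "<prefix><letters>" into a numeric id, assuming a=1 ... z=26,
 * aa=27. Returns -1 if the alias does not have that form.
 */
static int alias_to_id(const char *alias, const char *prefix)
{
	assert(alias != NULL && prefix != NULL);
	size_t plen = strlen(prefix);

	if (strncmp(alias, prefix, plen) != 0 || alias[plen] == '\0')
		return -1;

	int id = 0;
	for (const char *c = alias + plen; *c != '\0'; c++) {
		if (*c < 'a' || *c > 'z')
			return -1;
		id = id * 26 + (*c - 'a' + 1);
	}
	return id;
}

enum alias_action { ALIAS_KEEP, ALIAS_ADD_BINDING, ALIAS_REALLOCATE };

/*
 * Decide what to do with the alias a device already carries from boot.
 * The id is parsed only on the path that needs it (alias not yet bound).
 */
static enum alias_action check_existing_alias(const char *wwid,
					      const char *alias_old,
					      const char *prefix, int *id_out)
{
	const char *bound_wwid = NULL;

	*id_out = -1;
	if (alias_is_bound(alias_old, &bound_wwid))
		return strcmp(bound_wwid, wwid) == 0 ? ALIAS_KEEP
						     : ALIAS_REALLOCATE;

	*id_out = alias_to_id(alias_old, prefix);
	return *id_out > 0 ? ALIAS_ADD_BINDING : ALIAS_REALLOCATE;
}
```

With this shape, an alias that is already bound to the same wwid is kept
without any id parsing, an alias bound to a different wwid triggers
allocation of a new one, and only an unbound alias has its id parsed so
that a binding can be recorded for it.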