On Fri, 2020-06-19 at 11:30 -0500, Benjamin Marzinski wrote: > On Fri, Jun 19, 2020 at 08:32:34AM +0200, Hannes Reinecke wrote: > > > > fac68d7 is related to the famous "dm-multipath: Accept failed > > > > paths for > > > > multipath maps" patch (e.g. > > > > https://patchwork.kernel.org/patch/3368381/#7193001), which > > > > never made > > > > it upstream. SUSE kernels have shipped this patch for a long > > > > time, but > > > > we don't apply it any more in recent kernels. > > > > > > > > With this patch, The reinstate_path() failure would occur if > > > > multipathd > > > > had created a table with a "disabled" device (one which would > > > > be > > > > present in a dm map even though the actual block device didn't > > > > exist), > > > > and then tried to reinstate such a path. It sounds unlikely, > > > > but it > > > > might be possible if devices are coming and going in quick > > > > succession. > > > > In that situation, and with the "accept failed path" patch > > > > applied, a > > > > reload makes some sense, because reload (unlike reinstate) > > > > would not > > > > fail (at least not for this reason) and would actually add that > > > > just- > > > > reinstated path. OTOH, it's not likely that the reload would > > > > improve > > > > matters, either. After all, multipathd is just trying to > > > > reinstate a > > > > non-existing path. So, I'm fine with skipping the reload. > > > > > > It's actually _not_ unlikely, but a direct result of multipathd > > listening to > > uevents. > > > > If you have a map with four paths, and all four of them are going > > down, you > > end up getting four events. > > And multipath will be processing each event _individually_, causing > > it to > > generate a reload sequence like: > > > > [a b c d] > > [b c d] > > [c d] > > [d] > > [] > > > > Of which only the last is valid, as all the intermediate ones > > contain > > invalid paths. > > > > And _that_ is the scenario I was referring to when creating the > > patch. > > But even still, I'm not in favor of calling ev_add_path() with the > code > as it currently is. In the case you point out, it will presumably > fail > when adopt_paths() calls pathinfo(). That will orphan the path, which > is > good. But if the map got removed (intentionally) instead of the path, > ev_add_path() will recreate the map, which is bad. Also, it seems > more > likely for a path to still exist and be usable while the multipath > device is gone (the bad case), than for the path to pass the checker > function and then disappear before mutipathd tries to reinstate it > immediately afterwards (the good case). Further, the bad result, > where > a deleted multipath device reappears, is actually a problem for > users. > Having multipathd take a bit of time and pointless effort to catch up > after multiple changes happen at once doesn't effect users > noticeably. > > Since the checkerloop thread spends the vast majority of it's not not > checking any specific path, if a path goes away, it is most likely to > be > caught by the path_offline() function. I we want to do something to > proactively deal with a path that has been removed, we should do it Hannes and I discussed about this, and he agreed that calling ev_add_path() in the situation at hand wasn't helpful. Hence his suggestion with pp->mpp that we discussed in the other sub-thread. Regards, Martin -- Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107 SUSE Software Solutions Germany GmbH HRB 36809, AG Nürnberg GF: Felix Imendörffer -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel