On Wed, Oct 10, 2018 at 10:05:06PM +0200, Martin Wilck wrote: > If a path was still not properly initialized after exhausting the > retrigger tries, it used to remain in INIT_MISSING_UDEV state forever. > get_uid() might fall back to non-udev-based methods to determine > the WWID, but it would never be called for a path in this state any more. > > This patch changes this behavior by resetting the path back to FAILED > state if udev can't provide the WWID even after retriggering. Now, if > the path ever happens to be in PATH_UP or PATH_GHOST state again, > pathinfo(DI_ALL) will be called from check_path(), and there's at least > some chance to obtain a WWID for it. What you have seems reasonable, but looking at the code that calls pathinfo() in this case, I do have some related suggestions. Since check_path() always returns early if !pp->mpp and pp->initialized != INIT_FAILED, we don't need the pp->initialized != INIT_MISSING_UDEV check before doing the pathinfo (although we might want to check pp->initialized == INIT_FAILED, in case we add other pp->initialized states in the future). Also, I don't think that checking if pathinfo returns PATHINFO_OK is the right thing to do before calling ev_add_path(). pathinfo() will return PATHINFO_OK even if we fail to get the wwid. I'm pretty sure this should check if pp->initialized == INIT_OK instead. Finally, if pp->initialized != INIT_OK after the call to pathinfo(), would you object to setting pp->tick to max_checkint so that we don't keep checking the path as often. It is possible that this really is a device that shouldn't be multipathed, and for which we will never be able to get a wwid. It seems like after some many failures, we don't need to be so proactive about rechecking an active path where we simply aren't getting the wwid. -Ben > > Signed-off-by: Martin Wilck <mwilck@xxxxxxxx> > --- > multipathd/main.c | 32 +++++++++++++++++++++++--------- > 1 file changed, 23 insertions(+), 9 deletions(-) > > diff --git a/multipathd/main.c b/multipathd/main.c > index 2d45d989..a9e1a4bd 100644 > --- a/multipathd/main.c > +++ b/multipathd/main.c > @@ -1828,15 +1828,29 @@ check_path (struct vectors * vecs, struct path * pp, int ticks) > retrigger_tries = conf->retrigger_tries; > checkint = conf->checkint; > put_multipath_config(conf); > - if (!pp->mpp && pp->initialized == INIT_MISSING_UDEV && > - pp->retriggers < retrigger_tries) { > - condlog(2, "%s: triggering change event to reinitialize", > - pp->dev); > - pp->initialized = INIT_REQUESTED_UDEV; > - pp->retriggers++; > - sysfs_attr_set_value(pp->udev, "uevent", "change", > - strlen("change")); > - return 0; > + if (!pp->mpp && pp->initialized == INIT_MISSING_UDEV) { > + if (pp->retriggers < retrigger_tries) { > + condlog(2, "%s: triggering change event to reinitialize", > + pp->dev); > + pp->initialized = INIT_REQUESTED_UDEV; > + pp->retriggers++; > + sysfs_attr_set_value(pp->udev, "uevent", "change", > + strlen("change")); > + return 0; > + } else { > + condlog(1, "%s: not initialized after %d udev retriggers", > + pp->dev, retrigger_tries); > + /* > + * Make sure that the "add missing path" code path > + * below may reinstate the path later, if it ever > + * comes up again. > + * The WWID needs not be cleared; if it was set, the > + * state hadn't been INIT_MISSING_UDEV in the first > + * place. > + */ > + pp->initialized = INIT_FAILED; > + return 0; > + } > } > > /* > -- > 2.19.0 -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel