Hi Ben,

On Tue, 2018-10-02 at 00:00 +0200, Martin Wilck wrote:
> On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> > When pathinfo fails for some likely transient reason, it clears the
> > path wwid, but otherwise returns successfully, to keep the path around
> > but not usable until it gets fully initialized. However, if the path
> > has already been initialized, and pathinfo hits a transient error, it
> > shouldn't clear the wwid.
> >
> > Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx>
> > ---
> >  libmultipath/discovery.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
> > index 3e0db7f..33815dc 100644
> > --- a/libmultipath/discovery.c
> > +++ b/libmultipath/discovery.c
> > @@ -1991,9 +1991,9 @@ blank:
> >  	/*
> >  	 * Recoverable error, for example faulty or offline path
> >  	 */
> > -	memset(pp->wwid, 0, WWID_SIZE);
> >  	pp->chkrstate = pp->state = PATH_DOWN;
> > -	pp->initialized = INIT_FAILED;
> > +	if (pp->initialized == INIT_FAILED)
> > +		memset(pp->wwid, 0, WWID_SIZE);
> >
> >  	return PATHINFO_OK;
> >  }
>
> I am uncertain about this one. The old code sets pp->initialized to
> INIT_FAILED. If the state had been INIT_MISSING_UDEV or
> INIT_REQUESTED_UDEV before, this patch might change how the code
> behaves later in check_path(), where these conditions are checked.
>
> Likewise, tests for strlen(pp->wwid) are used in various places around
> the code. These tests would now yield different results for paths in
> "recoverable error" state.
>
> Have you considered these possible side effects?

I've pondered over this a lot. The dust is clearing up a bit.

1. With your patch in place, INIT_FAILED is never set except in
alloc_path() (we might rename it to INIT_NEW or the like, but see below).

2. I don't understand how you handle repeated failure to retrieve the
WWID.
I see that get_uid() (actually, scsi_uid_fallback()) would retrieve the
WWID from sysfs after retriggers are exhausted. But I don't see how
pathinfo(DI_WWID) would ever be called in this situation: on the last
invocation, pathinfo() failed to retrieve the WWID and set
pp->initialized = INIT_MISSING_UDEV. It will stay there, because
check_path() won't set it to INIT_REQUESTED_UDEV any more once the
retries are exhausted. And check_path() won't call pathinfo(DI_ALL) from
the "add missing path" code any more either, because of the
(pp->initialized != INIT_MISSING_UDEV) condition. Am I overlooking
something?

3. If the "blank" state means that important device information couldn't
be retrieved because of presumably transient failure conditions, we
should try again later to retrieve this information by calling
pathinfo(). But unless the WWID is (reset to) the empty string,
check_path() won't call pathinfo(DI_ALL) any more.

4. The "blank" logic in pathinfo() combines several very different cases.

 a) PATH_REMOVED status from path_offline(). This means that elementary
    sysfs attributes were missing. This is almost the same as a failure
    in sysfs_pathinfo(), which results in a PATHINFO_FAILED return
    status; but for PATH_REMOVED we return PATHINFO_OK and keep the path
    around.

 b) Failure in checker_check(). If the path is offline in the first
    place, the checker isn't called, and WWID determination is attempted.
    But if the checker returns PATH_UNCHECKED or PATH_WILD, we jump to
    the "blank" label.

 c) Failure in scsi_ioctl_pathinfo() or cciss_ioctl_pathinfo(). Neither
    function ever fails, so this can't happen. I have patches here to fix
    that.

 d) Failure to open pp->fd.

d) is the only case in which the "blank" logic really makes sense to me.
It can happen only at the first pathinfo() invocation, meaning pp->wwid
is still empty and pp->initialized is INIT_FAILED. Your patch would
change nothing for this case.

a) and b) can happen for paths that have been initialized already.

I think in case a) the WWID should be reset, pp->initialized should
probably be set to INIT_FAILED, and PATHINFO_FAILED should be returned.
In case b) we should IMO proceed normally rather than jump to "blank".
Resetting the WWID in case b) is nonsense, agreed.

Altogether, if my analysis is correct, your patch (not blanking the
WWID) should be applied to case b) only.

Please comment - I still feel a bit confused and may have overlooked
something essential.

Regards
Martin

-- 
Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel