On Fri, 2018-10-05 at 14:38 -0500, Benjamin Marzinski wrote: > On Wed, Oct 03, 2018 at 12:37:17AM +0200, Martin Wilck wrote: > > Hi Ben, > > > > > I am uncertain about this one. The old code sets pp->initialized > > > to > > > INIT_FAILED. If the state had been INIT_MISSING_UDEV or > > > INIT_REQUESTED_UDEV before, this patch might change how the code > > > behaves later in check_path(), where these conditions are > > > checked. > > > > > > Likewise, tests for strlen(pp->wwid) are used in various places > > > around > > > the code. These tests would now yield different results for paths > > > in > > > "recoverable error" state. > > > > > > Have you considered these possible side effects? > > > > I've pondered over this a lot. The dust is clearing up a bit. > > > > 1. With your patch in place, INIT_FAILED is never set except in > > alloc_path() (we might rename it to INIT_NEW or the like, but see > > below). > > Correct. The idea is that INIT_FAILED means that the path has never > been > fully initialized and that it is missing more than simply the wwid > (which would set INIT_MISSING_UDEV instead). > > > 2. I don't understand how you handle repeated failure to retrieve > > the > > WWID. I see that get_uid() (actually, scsi_uid_fallback()) would > > retrieve the WWID from sysfs after retriggers are exhausted. But I > > don't see how pathinfo(DI_WWID) would ever be called in this > > situation: > > > > In the last invocation, pathinfo() had failed to retrieve the WWID > > and > > set pp->initialized = INIT_MISSING_UDEV. There it will remain > > because > > check_path() won't set it to INIT_REQUESTED_UDEV any more after > > retries > > are exhausted. And now, check_path() won't call pathinfo(DI_ALL) > > any > > more from the "add missing path" code, because of the (pp- > > >initialized > > != INIT_MISSING_UDEV) condition. > > > > Am I overlooking something? > > Your analysis looks right. This shouldn't have anything to do with > the > correctness of the not blanking initialized paths, but we should fix > it > anyway. Right now, as you mentioned, if multipathd fails to get the > wwid after retrigger_tries new uevents, it only has one shot at the > failback methods, and then mutipathd cannot use that path in the > future. > However, it will still check if the path is up or down, which is > pretty > pointless. It should either > > 1. set pp->intialized to INIT_OK, to keep it from being checked at > all > in the future, or That would be lying to ourselves. I'd rather call it INIT_BROKEN or so, or maybe even delete the path from pathvec altogether. [ While we're at this: we call pathinfo with DI_ALL from configure->path_discovery(), but with DI_ALL|DI_BLACKLIST from uev_add_path(). As a result, paths discovered in uevents won't be added to pathvec in the first place if they're blacklisted, while blacklisted paths discovered at startup will be in the pathvec, but orphaned. Is that intentional, and if yes, why ? ] > 2. try to grab the missing pathinfo, so that it can use the path. > > The benefit of doing the first is that mutipathd doesn't keep messing > with paths that really aren't set up to be multipathed. On the other > hand it might ignore some paths that really should be multipathed. > perhaps the best idea is to keep trying to get the missing info, but > just increase pp->tick (possibly progressively) so we try less often. The latter sounds overly complex to me, in particular as in the daemon, we're only looking at paths retrieved via udev anyway, as you pointed out in your other post. So perhaps INIT_MISSING_UDEV is just about the right state for this. We may just consider changing uev_update_path() such that the path is passed to uev_add_path() not only for INIT_REQUESTED_UDEV, but also for INIT_MISSING_UDEV state. > > > 3. If "blank" state means that important device information > > couldn't be > > retrieved because of presumably transient failure conditions, we > > should > > retry to retrieve this information by calling pathinfo again later. > > But > > unless the WWID is (reset to) the empty string, check_path() won't > > call > > pathinfo(DI_ALL) any more. > > But if we set the wwid at some point in time, that means we got all > the > necessary information. My argument is that if something happens to > cause a later pathinfo() call to fail, we shouldn't pretend like we > didn't already get this info. The question is whether the problem encountered means that the earlier determined WWID is still reliable. If the path is entirely gone from sysfs (case a) below), I wouldn't bet on that. In this case we rely on the assumption that we'll receive an uevent later - likely but not guaranteed. OTOH, by blanking the WWID, we forfeit the opportunity to detect a WWID change later - with that in mind, blanking is actually wrong in this situation, and your patch should be an improvement. > > > 4. The "blank" logic in pathinfo() combines several very different > > cases. > > a) PATH_REMOVED status from path_offline(). This means that > > elementary sysfs attributes were missing. This is almost the same > > as > > failure in sysfs_pathinfo(), which results in PATHINFO_FAILED > > return > > status; but for PATH_REMOVED we return PATHINFO_OK and keep the > > path > > around. > > True, but I've always assumed that this means that we will be getting > a > uevent to remove this path momentarily, so I doesn't really matter > what > we do with it. We could blank the path here, but if we already go the > necessary info, it seems like just setting the path to PATH_DOWN, and > waiting for the remove uevent should be fine. Perhaps. I just wanted to point out the different treatment of two very similar scenarios. > > b) Failure in checker_check(). If the path is offline in the > > first > > place, the checker isn't called, and WWID determination is > > attempted. > > But if the checker returns PATH_UNCHECKED or PATH_WILD, we goto > > "blank" > > state. > > This is clearly the wrong thing to do. > > > c) Failure in scsi_ioctl_pathinfo() or cciss_ioctl_pathinfo(). > > Both > > functions never fail, so this can't happen. I've patches here to > > fix > > that. > > I just ignored this state because it was impossible. But even with > your > patches, my argument remains the same. If we already got this info, > failing to get it again shouldn't cause us to delete the wwid. We > should > probably remember the existing values, to make sure that we don't > lose > them if the calls fail for some reason. My patches wouldn't change a thing here. They just turn these two functions to void type. > > > d) Failure to open pp->fd. > > This is clearly the right thing to do. > > > d) is the only case in which the "blank" logic makes really sense > > to > > me. It can happen only at the first pathinfo() invocation, meaning > > pp->wwid is still empty, and pp->initialized is INIT_FAILED. Your > > patch > > would change nothing for this case. > > > > a) and b) can happen for paths that have been initialized already. > > I > > think in case a) the WWID should be reset, probably initialized > > should > > be set to INIT_FAILED, and PATHINFO_FAILED should be returned. In > > case > > b) we should IMO proceed normally rather than goto "blank". > > Resetting > > the WWID in case b) is nonsense, agreed. > > > > Altogether, if my analysis is correct, your patch (not blanking the > > WWID) should be applied to case b) only. > > I don't really care about blanking the wwid in case a). I just think > it's unnecessary. Why do you think we should blank it in case c) if > we > have previously gotten those values? Summarizing, a) you convinced me about that one, b) your patch is good, but we may need additional changes in a follow- up patch, c) doesn't happen, d) I agreed in the first place. And my concern about INIT_MISSING_UDEV was orthogonal to your patch anyway. So I'll give your patch a reviewed-by. Sorry for the noise. I guess you'll concede the handling of path states in multipath-tools is non-trivial. Martin -- Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107 SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel