Re: [PATCH v2 07/10] libmultipath: handle TUR threads that can't be cancelled

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 2018-10-23 at 16:21 -0500,  Benjamin Marzinski  wrote:
> On Tue, Oct 23, 2018 at 10:49:49PM +0200, Martin Wilck wrote:
> > On Tue, 2018-10-23 at 14:28 -0500,  Benjamin Marzinski  wrote:
> > 
> > > On Tue, Oct 23, 2018 at 03:43:45PM +0200, Martin Wilck wrote:
> > > > When the tur checker code determines that a hanging TUR thread
> > > > couldn't be cancelled, rather than simply returning, reallocate
> > > > the checker context and start a new thread. This will leak some
> > > > memory if the hanging thread never wakes up again, but well, in
> > > > that highly unlikely case we're leaking threads anyway.
> > > 
> > > The thing about PATH_UNCHECKED is that we do mark the path as
> > > failed.
> > > We just don't tell device-mapper. If we get PATH_UNCHECKED in
> > > pathinfo(), we set the state to PATH_DOWN. If we get a
> > > PATH_UNCHECKED
> > > in
> > > check_path(), we immediately call pathinfo(), 
> > 
> > But we call pathinfo(pp, conf, 0) there, i.e. all DI flags unset.
> > This
> > comes down to a (partial) blacklist check (which is ignored) and a
> > call
> > to path_offline(). pp->state isn't touched in this code path. It's
> > more
> > or less a NOOP. (BTW, the purpose of this pathinfo() call remains
> > obscure to me. It goes back to the ancient commit 95987395).
> 
> Oops. I overlooked the flags. Well, that takes care of any issue with
> check_path(). As for the pathinfo call itself, I assume that, at some
> tine, it was possible to get to check_path() without a fully
> initialized
> path device.  But that doesn't explain why it doesn't set any flags.
> I'm not sure if this code currently serves any purpose.
>  
> > > making it likely that we
> > > will get PATH_UNCHECKED there as well. The consequence of this is
> > > that
> > > if the path later changes to PATH_DOWN, which seems likely, we
> > > still
> > > won't tell device-mapper, since as far as multipathd is
> > > concerned,
> > > the
> > > path hasn't changed state. 
> > 
> > I don't follow you. check_path() quits early when PATH_UNCHECKED is
> > encountered, and doesn't alter pp->state. It will check again in
> > the
> > next round, and if the path switches to DOWN then (or any other
> > state,
> > for that matter) it will treat it right, AFAICS.
> 
> If PATH_UNCHECKED triggers the "blank" code in pathinfo, it will set
> the
> path's state to PATH_DOWN.
> 
> > >  Looking at most of the code, the way we
> > > treat PATH_UNCHECKED really only makes sense when we use it to
> > > mean
> > > we
> > > haven't ever gotten a result from get_state() before.
> > > 
> > > If you want a return code that does just does nothing with the
> > > path,
> > > except wait, that's PATH_PENDING. It leaves the paths state
> > > exactly
> > > the
> > > same as before. The only issue there is that we schedule another
> > > path
> > > checker for a second later, which might not be the right answer
> > > to an
> > > out-of-memory issue.
> > > 
> > > If you've reviewed the code paths that we follow on
> > > PATH_UNCHECKED,
> > > and
> > > still feel that it is the right answer, I won't block it, because
> > > this
> > > is a pretty remote corner case.
> > 
> > PATH_UNCHECKED is tested in the following places (ignoring calls
> > from
> > multipath and mpathpersist):
> > 
> >  1) pathinfo(): in the "blank" case (we discussed that before, I
> > think
> > it's wrong and I removed that test in 17/21 of my previous 21-part
> > series).
> 
> removing the blank case here fixes the issue with getting a
> PATH_UNCHECKED in pathinfo(), which is the root of my objection to
> it,
> so
> 
> Reviewed-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx>

Thanks. 

Side note, FTR: before the checker notices that the thread is stale, it
sees a TUR timeout and thus returns PATH_TIMEOUT, which multipathd
treats like PATH_DOWN. The way the patch is now, it just doesn't return
PATH_TIMEOUT *again* in the next checker round if it notices that
cancellation failed.

Martin

-- 
Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel




[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux