Re: [PATCH 5/5] libmultipath: don't dlclose tur checker DSO

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 2020-11-02 at 12:27 -0600, Benjamin Marzinski wrote:
> On Fri, Oct 30, 2020 at 09:15:39PM +0000, Martin Wilck wrote:
> > On Fri, 2020-10-23 at 16:15 -0500, Benjamin Marzinski wrote:
> > > The multipathd tur checker thread is designed to be able to
> > > finish at
> > > any time, even after the tur checker itself has been freed. The
> > > multipathd shutdown code makes sure all the checkers have been
> > > freed
> > > before freeing the checker_class and calling dlclose() to unload
> > > the
> > > DSO, but this doesn't guarantee that the checker threads have
> > > finished.
> > > If one hasn't, the DSO will get unloaded while the thread still
> > > running
> > > code from it, causing a segfault. Unfortunately, it's not
> > > possible to
> > > be
> > > sure that all tur checker threads have ended during shutdown,
> > > without
> > > making them joinable.
> > > 
> > > However, since libmultipath will never be reinitialized after it
> > > has
> > > been uninitialzed, not dlclosing the tur checker DSO once a
> > > thread is
> > > started has minimal cost (keeping the DSO code around until the
> > > program
> > > exits, which usually happens right after freeing the checkers).
> > 
> > I'm not against this, but have you considered using an
> > atomic  refcount
> > for the DSO? With every tur thread starting, we could increase it,
> > and
> > decrease it in the cleanup function of the thread when it exits.
> > That
> > should be safe. If the refcount was positive when we exit, we could
> > refrain from unloading the DSO.
> 
> I tried exactly that, and I would still get crashes, even if it put
> the
> code that decrements the atomic variable in a function that's not
> part
> of the DSO, and put a pthread_exit() before the final
> pthread_cleanup_pop() that would decrement it in tur_thread, so that
> after the cleanup code is called the thread should never return to
> code
> that is in the DSO. I had to add sleeps in code to force various
> orderings, but I couldn't find any way that worked for all possible
> orderings.  I would love it if this worked, and you're free to try,
> but
> I could not get this method to work.

I've experimented a bit with a trivial test program, and I found that
it worked stably if decrementing the refcount is really the last thing
thread's cleanup function does. Could you provide some details about
the sleeps that you'd put in that made this approach fail?

Martin

-- 
Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107
SUSE  Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix
Imendörffer



--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel





[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux