Re: [PATCH 03/13] multipathd: allow map removal in do_sync_mpp()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 2024-12-10 at 17:49 -0500, Benjamin Marzinski wrote:
> On Tue, Dec 10, 2024 at 10:05:14PM +0100, Martin Wilck wrote:
> > On Tue, 2024-12-10 at 14:02 -0500, Benjamin Marzinski wrote:
> > > On Sat, Dec 07, 2024 at 12:36:07AM +0100, Martin Wilck wrote:
> > > > We previously didn't allow map removal inside the checker loop.
> > > > But
> > > > with the late updates to the checkerloop code, it should be
> > > > safe to
> > > > orphan
> > > > paths and delete maps even in this situation. We remove such
> > > > maps
> > > > everywhere
> > > > else in the code already, whenever refresh_multipath() or
> > > > setup_multipath()
> > > > is called.
> > > 
> > > I don't think that this is safe. It's possible that the multipath
> > > device
> > > has paths in the INIT_REMOVED or INIT_PARTIAL state. These will
> > > get
> > > silently removed from the pathvec if we remove the map here. This
> > > will
> > > mess up our iteration through the pathvec in update_paths(). 
> > 
> > Hm. You're right. But that applies to the current code in 0.11.0 PR
> > as
> > well, because we'd call 
> > 
> >    do_sync_mpp()
> >       update_multipath_strings() 
> >          sync_paths()
> >             check_removed_paths()
> >                vector_del_slot(pathvec, i--);
> > 
> > Or am I missing something?
> 
> Nope. Your right. Nuts.
>  
> > It seems to me that the only safe way to handle this is to refrain
> > from
> > deleting paths from the pathvec anywhere deep in the call stack.
> > Even
> > if we can avoid this situation now by moving the sync towards the
> > end
> > of the checker loop, I believe that in the long run we need to fix
> > these traps in our code, because it's just so easy to get this
> > wrong.
> > 
> > I wonder if we need yet another path state, of if we could simply
> > set
> > these entries in the pathvec to NULL. That sounds crazy, but it
> > might
> > actually be doable. Not 0.11.0 material, though.
> 
> I think we could just not call check_removed_paths() in sync_paths().
> We
> would still orphan all the paths that were no longer part of the
> multipath device, and set pp->mpp for all the paths that are part of
> the
> device just like before, but we wouldn't delete the paths from
> pathvec there. Instead we would call check_removed_paths() in
> refresh_multipath(), so we did it after loads and in
> update_multipath.
> 
> I'm pretty sure that should be fine. If the device table changed and
> removed a path so that we can free it, either we reloaded the device,
> and we will call setup_multipath() after the reload, or something
> external did, and multipathd will see an event for that and call
> setup_multipath() via update_multipath().
> 
> Does that make sense?

I think we'll be fine with my upcoming patch set, which will call
reload_and_sync_map() only from checker_finished(). I don't change
sync_paths() in this set so far.

I'm a little concerned about refresh_multipath() and
reload_and_sync_map() being called from various CLI functions. But I
won't start digging into that now. Maps may get removed in CLI calls,
so what.

Thanks,
Martin






[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux