Re: [PATCH 03/13] multipathd: allow map removal in do_sync_mpp()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Dec 10, 2024 at 10:05:14PM +0100, Martin Wilck wrote:
> On Tue, 2024-12-10 at 14:02 -0500, Benjamin Marzinski wrote:
> > On Sat, Dec 07, 2024 at 12:36:07AM +0100, Martin Wilck wrote:
> > > We previously didn't allow map removal inside the checker loop. But
> > > with the late updates to the checkerloop code, it should be safe to
> > > orphan
> > > paths and delete maps even in this situation. We remove such maps
> > > everywhere
> > > else in the code already, whenever refresh_multipath() or
> > > setup_multipath()
> > > is called.
> > 
> > I don't think that this is safe. It's possible that the multipath
> > device
> > has paths in the INIT_REMOVED or INIT_PARTIAL state. These will get
> > silently removed from the pathvec if we remove the map here. This
> > will
> > mess up our iteration through the pathvec in update_paths(). 
> 
> Hm. You're right. But that applies to the current code in 0.11.0 PR as
> well, because we'd call 
> 
>    do_sync_mpp()
>       update_multipath_strings() 
>          sync_paths()
>             check_removed_paths()
>                vector_del_slot(pathvec, i--);
> 
> Or am I missing something?

Nope. Your right. Nuts.
 
> It seems to me that the only safe way to handle this is to refrain from
> deleting paths from the pathvec anywhere deep in the call stack. Even
> if we can avoid this situation now by moving the sync towards the end
> of the checker loop, I believe that in the long run we need to fix
> these traps in our code, because it's just so easy to get this wrong.
> 
> I wonder if we need yet another path state, of if we could simply set
> these entries in the pathvec to NULL. That sounds crazy, but it might
> actually be doable. Not 0.11.0 material, though.

I think we could just not call check_removed_paths() in sync_paths(). We
would still orphan all the paths that were no longer part of the
multipath device, and set pp->mpp for all the paths that are part of the
device just like before, but we wouldn't delete the paths from
pathvec there. Instead we would call check_removed_paths() in
refresh_multipath(), so we did it after loads and in update_multipath.

I'm pretty sure that should be fine. If the device table changed and
removed a path so that we can free it, either we reloaded the device,
and we will call setup_multipath() after the reload, or something
external did, and multipathd will see an event for that and call
setup_multipath() via update_multipath().

Does that make sense?

> > Perhaps a
> > better idea would be to set mpp->sync_ticks to 0 if
> > update_multipath_strings() fails in do_sync_mpp(). This would force a
> > refresh by sync_mpp() at the start of the next loop in checkerloop(),
> > where it can safely remove the multipath device.
> 
> I like the idea of your other post to move the sync to the
> CHECKER_FINISHED state.
> 
> Thanks,
> Martin
> 





[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux