Re: [PATCH v2 03/14] multipathd: sync maps at end of checkerloop

Benjamin Marzinski <bmarzins@xxxxxxxxxx> · Wed, 15 Jan 2025 11:45:35 -0500

On Tue, Jan 14, 2025 at 10:36:06PM +0100, Martin Wilck wrote:
> On Thu, 2024-12-19 at 18:04 -0500, Benjamin Marzinski wrote:
> > On Wed, Dec 11, 2024 at 11:58:58PM +0100, Martin Wilck wrote:
> > > @@ -3032,11 +3026,13 @@ checkerloop (void *ap)
> > >  							    
> > > start_time.tv_sec);
> > >  			if (checker_state == CHECKER_FINISHED) {
> > >  				vector_foreach_slot(vecs->mpvec,
> > > mpp, i) {
> > > -					if ((update_mpp_prio(mpp)
> > > ||
> > > -					     (mpp->need_reload &&
> > > mpp->synced_count > 0)) &&
> > 
> > When you call reload_and_sync_map(), it will automatically resync the
> > map via setup_multipath() -> refresh_multipath() ->
> > update_multipath_strings().
> > 
> > This means that if for some reason multipathd and the kernel disagree
> > about a map, and reloading it doesn't fix the problem, you will
> > immediately set mpp->need_reload again. With the old mpp-
> > >synced_count
> > check, you only reload maps with need_reload() when a path is
> > checked.
> > Without this check, or a (mpp->checker_count > 0) check to replace
> > it,
> > you will keep reloading these maps every loop, roughly once a second.
> > I
> > would rather not do this.
> > 
> > If you want to make sure to immediately handle a need_reload that
> > wasn't
> > triggered by this call to reload_and_sync_map() which was because of
> > an
> > earlier need_reload, we could make need_reload have three states, to
> > distinguish between a reload we want done immediately, and one we
> > would
> > like to wait on because we just did a reload and it didn't fix the
> > problem. Then we could remember if need_reload was set before calling
> > reload_and_sync_map(), and if it was, and if it is still set after,
> > we
> > could switch it to the delayed version.
> > 
> > Or perhaps I'm just being paranoid here. 
> 
> As you probably know and as I recently verified, reloading the kernel
> from the checker loop will hardly ever fail except with -ENOMEM [1]. We
> can pass non-existing or failed devices to the kernel, it will happily
> accept them.
> 
> update_pathvec_from_dm() never adds any devices to the map, it just
> removes some. If need_reload is set, it means that it has removed
> either a path or an entire pathgroup. When the map is reloaded, it will
> only reference (a subset of) the devices that were already mapped. I
> see no way how this could fail unless either multipathd or the kernel
> are really badly malfunctioning, in which case we don't need to bother
> about reloading too frequently.
> 
> But *if* the reload succeeds, the set of devices in the kernel is
> guaranteed to be identical to the table that we've just used for
> reloading. So only way that another difference between kernel and
> multipath state could occur between the reload and
> update_pathvec_from_dm() running again is that another device has just
> diappeared from the system, in which case a quick reload would be a
> reasonable action. (Well I guess another possibility would be a 3rd
> party maliciously adding wrong path devices to the maps we maintain,
> but that's not something we can do much about).
> 
> If need_reload is indeed set again in this situation, I would indeed
> prefer to double-check this map quickly. As argued above, I strongly
> believe that such a situation will not persist. IMO a detected
> inconsistency between the kernel and multipathd is a very bad thing
> that we should try to fix rather sooner than later. It's at least as
> bad as a failed path, which we'll check every second, too.
> 
> Bottom line: I think re-checking this quickly is actually the right
> thing to do. Would you accept this if I add a warning in the
> "inconsistent" case, so that in the event that we actually run into a
> persistent discrepancy situation, we will notice?
> 

Sure. This is probably just my paranoia here, since I can't actually
come up with a concrete case where there would be a persistent
discrepancy. If it ever happens, it's almost definitely a bug in the
multipath code.

-Ben

> Regards,
> Martin
> 
> [1]
> https://lore.kernel.org/dm-devel/ee6fcbda31fd1f13969653782417fbed748f5bc7.camel@xxxxxxxx/
> 
> > -Ben
> > 
> > > -					   
> > > reload_and_sync_map(mpp, vecs) == 2)
> > > +					sync_mpp(vecs, mpp,
> > > ticks);
> > > +					if ((update_mpp_prio(mpp)
> > > || mpp->need_reload) &&
> > > +					   
> > > reload_and_sync_map(mpp, vecs) == 2) {
> > >  						/* multipath
> > > device deleted */
> > >  						i--;
> > > +						continue;
> > > +					}
> > >  				}
> > >  			}
> > >  			lock_cleanup_pop(vecs->lock);
> > > -- 
> > > 2.47.0
> >