Re: [PATCH v2 4/5] libmultipath: pad dev_loss_tmo to avoid race with no_path_retry

Martin Wilck <martin.wilck@xxxxxxxx> · Thu, 02 May 2024 20:44:28 +0200

On Thu, 2024-05-02 at 12:05 -0400, Benjamin Marzinski wrote:
> On Thu, May 02, 2024 at 05:14:56PM +0200, Martin Wilck wrote:
> > On Thu, 2024-04-25 at 19:35 -0400, Benjamin Marzinski wrote:
> > > Currently multipath makes sure that dev_loss_tmo is at least as
> > > long
> > > as
> > > the configured no path queueing time. The goal is to make sure
> > > that
> > > path
> > > devices aren't removed while the multipath device is still
> > > queueing
> > > in
> > > hopes that they will become usable again.
> > > 
> > > This is racy. Multipathd may take longer to check the paths than
> > > configured. If strict_timing isn't set, it will definitely take
> > > longer.
> > > To account for this, pad the minimum dev_loss_tmo value by five
> > > seconds
> > > (one default checker interval) plus one second per minute of no
> > > path
> > > queueing time.
> > 
> > What's the rationale for the "one second per minute" part?
> > It feels kind of arbitrary to me, and I don't find it obvious that
> > we
> > need larger padding for larger timeouts.
> 
> The way the checker works, without strict_timing, we do all the
> checking
> work, and then we wait for a second. Obviously, the checking work
> takes
> time, so every nominal one second checker loop will in reality take
> longer than a second. The larger you have configured (no_path_retry *
> checkint), the further away the actual time when we disable queueing
> will be from (no_path_retry * checkint) seconds.
> 
> With strict_timing on, things work better. As long as the checking
> work
> doesn't take more than a second, you will stay fairly close to one
> second per loop, but even still, it can be longer. nanosleep() will
> sleep till at least the time specified. If multipathd is not running
> as
> a realtime process, it will usually sleep for longer.
> 
> Also, if the checker work takes longer than a second, then
> strict_timing
> will make sure that the loop takes (close to) exactly 2 (or whatever
> the
> next number is) seconds. However, retry_count_tick() still only ticks
> down by 1 for each loop. This means that the real time before
> queueing
> is disabled can diverge from (no_path_retry * checkint). And again,
> the
> longer you've configured it to wait, the more it diverges.
> 
> I thought about passing ticks to retry_count_tick(), so that it would
> correctly deal with these multi-tick loops, but this can cause it to
> diverge from the actual number of retries attempted, in a way that
> would
> mean we disable queueing before the desired number of retries have
> occurred, which is a bigger problem, IMHO.  No matter how long the
> checker work takes, you will only ever do one path check per loop. In
> the worse case, say that the checker work (or anything else in
> multipathd) gets stuck for a couple of minutes. You would only do one
> path check, but hundreds of ticks would be passed to the
> retry_count_tick(), meaning that you could disable queueing after
> only
> one retry. Now you could limit the number of ticks you passed to
> retry_count_tick(), but even still, if the a path was 1 tick away
> from
> being checked, and the checker work took 3 seconds, you would still
> pass
> 3 ticks to retry_count_tick(), which would cause it to start to
> diverge
> from the actual time it takes for the path to do the configured
> amount
> of retries.
> 
> We could improve retry_count_tick(), so that it actually tracks the
> number of retries you did, but that still leaves the problems when
> either strict_timing isn't on or multipathd isn't a realtime process.
> It seemed easier to just say, the longer you have configured
> multipathd
> to wait to disable queueing, the more the time when it actually
> disables
> queueing can diverge from the configured time.
> 
> The number I picked for the rate of divergence is just a guest, and
> if
> you don't like it, I'm fine with removing it. After all, the whole
> point
> of this patchset is to make sure that the worst thing that can happen
> if
> paths disappear while we are still queueing is that a multipath
> device
> isn't autoremoved.

I was just asking for a rationale. This is fine.

I think we can, and should, improve multipathd's timing. Unfortunately
this has been on my agenda for a long time already.

For 4/5, too:

Reviewed-by: Martin Wilck <mwilck@xxxxxxxx>