On Thu, May 02, 2024 at 05:14:56PM +0200, Martin Wilck wrote:
> On Thu, 2024-04-25 at 19:35 -0400, Benjamin Marzinski wrote:
> > Currently multipath makes sure that dev_loss_tmo is at least as long as
> > the configured no path queueing time. The goal is to make sure that path
> > devices aren't removed while the multipath device is still queueing in
> > hopes that they will become usable again.
> >
> > This is racy. Multipathd may take longer to check the paths than
> > configured. If strict_timing isn't set, it will definitely take longer.
> > To account for this, pad the minimum dev_loss_tmo value by five seconds
> > (one default checker interval) plus one second per minute of no path
> > queueing time.
>
> What's the rationale for the "one second per minute" part?
> It feels kind of arbitrary to me, and I don't find it obvious that we
> need larger padding for larger timeouts.

The way the checker works without strict_timing, we do all the checking
work and then wait for a second. Obviously, the checking work takes
time, so every nominal one-second checker loop will in reality take
longer than a second. The larger you have configured
(no_path_retry * checkint), the further away the actual time when we
disable queueing will be from (no_path_retry * checkint) seconds.

With strict_timing on, things work better. As long as the checking work
doesn't take more than a second, you will stay fairly close to one
second per loop, but even then it can be longer. nanosleep() will sleep
until at least the time specified; if multipathd is not running as a
realtime process, it will usually sleep for longer. Also, if the checker
work takes longer than a second, then strict_timing will make sure that
the loop takes (close to) exactly 2 (or whatever the next whole number
is) seconds. However, retry_count_tick() still only ticks down by 1 for
each loop. This means that the real time before queueing is disabled
can diverge from (no_path_retry * checkint), and again, the longer
you've configured it to wait, the more it diverges.

I thought about passing ticks to retry_count_tick(), so that it would
correctly deal with these multi-tick loops, but that can cause it to
diverge from the actual number of retries attempted, in a way that
would mean we disable queueing before the desired number of retries
have occurred, which is a bigger problem, IMHO. No matter how long the
checker work takes, you will only ever do one path check per loop. In
the worst case, say the checker work (or anything else in multipathd)
gets stuck for a couple of minutes. You would only do one path check,
but hundreds of ticks would be passed to retry_count_tick(), meaning
that you could disable queueing after only one retry. You could limit
the number of ticks passed to retry_count_tick(), but even then, if a
path was 1 tick away from being checked and the checker work took 3
seconds, you would still pass 3 ticks to retry_count_tick(), which
would cause it to start to diverge from the actual time it takes for
the path to do the configured number of retries.

We could improve retry_count_tick() so that it actually tracks the
number of retries done, but that still leaves the problems when either
strict_timing isn't on or multipathd isn't a realtime process. It
seemed easier to just say: the longer you have configured multipathd to
wait before disabling queueing, the more the time when it actually
disables queueing can diverge from the configured time.
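To make the arithmetic concrete, the padding described in the patch
description works out to roughly the following (just a sketch with
made-up names, not the actual patch code):

    /* Sketch only, not the real multipath-tools code. The queueing
     * time is no_path_retry * checkint; the patch pads the minimum
     * dev_loss_tmo by five seconds (one default checker interval)
     * plus one second per minute of queueing time.
     */
    static unsigned int min_dev_loss_tmo(unsigned int no_path_retry,
                                         unsigned int checkint)
    {
            unsigned int queue_time = no_path_retry * checkint;

            return queue_time + 5 + queue_time / 60;
    }

So for, say, no_path_retry = 60 and checkint = 5 (300 seconds of
queueing), the minimum dev_loss_tmo would be 300 + 5 + 5 = 310 seconds.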
The number I picked for the rate of divergence is just a guess, and if
you don't like it, I'm fine with removing it. After all, the whole
point of this patchset is to make sure that the worst thing that can
happen if paths disappear while we are still queueing is that a
multipath device isn't autoremoved.

-Ben

>
> Regards
> Martin