Re: [PATCH 04/15] libmultipath: remove pending wait code from libcheck_check calls

Martin Wilck <martin.wilck@xxxxxxxx> · Thu, 05 Sep 2024 09:53:52 +0200

On Wed, 2024-09-04 at 17:17 -0400, Benjamin Marzinski wrote:
> On Wed, Sep 04, 2024 at 10:05:30PM +0200, Martin Wilck wrote:
> > On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
> > > When the tur and directio checkers start an asynchronous checker,
> > > they
> > > now immediately return with the path in PATH_PENDING, instead of
> > > waiting in checker_check(). Instead the wait now happens in
> > > checker_get_state().
> > > 
> > > Additionally, the directio checker now waits for 1 ms, like the
> > > tur
> > > checker does. Also like the tur checker it now only waits once.
> > > If it
> > > is still pending after the first call to checker_get_state(). It
> > > will
> > > not wait at all on future calls, and will just process the
> > > already
> > > completed IOs.
> > > 
> > > Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx>
> > 
> > Now I understand what you're doing, but I find it somewhat
> > confusing. I
> > believe this can be simplified after the split between starting the
> > checkers and looking at their result.
> > 
> > Why not just fire off the checkers, wait 1ms for _all_ the just
> > started
> > checkers in checkerloop() [*], and then do the pending test like it
> > is
> > now (but just in polling mode, with no c->timeout)?
> 
> That's probably possible. You would also need to put that wait in
> pathinfo().

pathinfo() is tricky because it's written with the assumption that the
result is obtained synchronously. Maybe in pathinfo we should just fire
up the checker and set the status to PENDING. But we might as well not
run the checker at all, just set the checker interval to minimum, and
let the checkers do their work. (I like that idea, actually). 

Another idea would be to define an exception for pathinfo(), and
actually wait for completion there, not until c->timeout expires, but
somewhat longer than we currently do.

> The big reason I did it this way was so nothing outside of the
> checker
> itself needed to care about things like if the checker was set to
> async
> and whether or not the checker was actually async capable and if this
> was a newly started async check or if the checker had already been
> pending for a cycle, where waiting another 1ms was pointless.

Right, the sync property should be hidden. If sync checkers are in use,
none of the recent changes will have much of a positive effect. I'd
like to convert all checkers to async mode. Just need some time to do
it :-/ AFAICS, the only sync checker that still matters is rdac.

Below is some brainstorming from my side. 
Ignore it if it makes no sense.

We currently use the path state, in particular PATH_PENDING, to infer
the state of the checker (thread, aio request), but that doesn't give
us the full picture.

We should be able to distinguish the path state and the state of the
checker itself, which can be IDLE, RUNNING, or DONE. The checker
provides an API checker_state() (I can't think of a better name) to
retrieve its state. The path state can (only) be retrieved in DONE
checker state. When the path state is retrieved, the checker state
switches to IDLE. checker_start() switches from IDLE to RUNNING.
checker_wait(timeout) waits until the RUNNING checker reaches DONE, or
the passed timeout is reached. A timed-out checker
(wrt c->timeout) has state DONE with path state PATH_TIMEOUT.

Sync checkers would have a degenerated API where checker_start() and
checker_state() would be noops.

> This way, you just call checker_get_state() on all the paths were you
> ran the path checker, and if at least one of them needs to wait, then
> the first one that does will wait, and the rest won't. If none of
> them
> need to wait, then none of them will.

This makes sense. But the "wait" you're talking about is the 1ms delay,
after which the paths may well still be in PATH_PENDING. It's actually
not much different from not waiting at all, we've just decreased the
likelihood of PATH_PENDIN. 1ms is just an estimate, and, in my
experience, too short in many environments.

The real benefit of your patch set is that we now start all (async)
checkers in parallel. Which means that we can wait longer, increasing
the likelihood that the checkers have actually finished, while spending
less total time waiting. "Waiting longer" is a bit hand-waving; IMO it
should be handled further up in the stack rather than down in the
checker code.

We could achieve the behavior you describe in checkerloop, like this:

 - record start_timestamp
 - fire off all handlers
 - calculate end_time as start_timestamp + some_delay 
 - use that end_time for waiting for results of all fired-off checkers

Like in your example, we'd really wait for no more than 1 path.
We could do this for the entire set of paths, or for each multipath map
separately.

Regards
Martin