Re: [PATCH 05/15] multipath-tools tests: fix up directio tests

Benjamin Marzinski <bmarzins@xxxxxxxxxx> · Wed, 4 Sep 2024 18:53:36 -0400

On Wed, Sep 04, 2024 at 09:43:59PM +0200, Martin Wilck wrote:
> On Wed, 2024-09-04 at 21:36 +0200, Martin Wilck wrote:
> > On Wed, 2024-09-04 at 14:29 -0400, Benjamin Marzinski wrote:
> > > On Wed, Sep 04, 2024 at 06:12:37PM +0200, Martin Wilck wrote:
> > > > On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
> > > > > Make the directio tests work with libcheck_pending() being separate
> > > > > from libcheck_check
> > > > > 
> > > > > Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx>
> > > > 
> > > > There's still something wrong with this test. I'm seeing lots of CI
> > > > errors with your complete series applied.
> > > > 
> > > > https://github.com/openSUSE/multipath-tools/actions?query=branch%3Atip
> > > > https://github.com/openSUSE/multipath-tools/actions/runs/10704501258/job/29677643779
> > > 
> > > It looks like your "tip" branch is missing:
> > > [PATCH 04/15] libmultipath: remove pending wait code from
> > > libcheck_check calls
> > 
> > Yeah. That patch ended up in a different mail folder, and I didn't
> > notice. Weird. CI looks much better now.
> 
> But some issues remain, e.g.
> 
> https://github.com/openSUSE/multipath-tools/actions/runs/10708349169/job/29690448105

I'm pretty sure that, due to valgrind and virtual-machine-induced delays,
we end up waiting more than 1ms in test_check_state_async() between
starting the checker at

do_check_state(&c[256], 0, PATH_PENDING);

and calling libcheck_pending at

do_libcheck_pending(&c[256], PATH_UP);

This means that we will only call get_events() once, so we won't get
the IO for c[256], which the test only returns on the second call to
get_events(). This would cause the error from the GitHub CI runs (I
haven't been able to reproduce this myself locally, but I haven't tried
on an Ubuntu VM):

[ RUN      ] test_check_state_async
[  ERROR   ] --- 0x6 != 0x3
[   LINE   ] --- directio.c:237: error: Failure!
[  FAILED  ] test_check_state_async
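To make the timing argument concrete, here is a toy C model of the race.
It is a hypothetical sketch, not the real directio checker or test code:
struct model_checker, mock_get_events(), deadline_passed() and
model_pending() are made-up names. It assumes, as described above, that
the pending step polls for events at least once and only keeps polling
while the 1ms deadline has not yet passed. Like the test, the mock only
reports the c[256] IO on the second poll.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of the race, NOT the real directio checker. */
struct model_checker {
	struct timespec endtime; /* deadline: submission time + 1ms */
	int polls;               /* number of calls to the mock event poll */
};

/* Mock standing in for get_events(): like the test, it reports the
 * IO as complete only on the second call. */
static bool mock_get_events(struct model_checker *c)
{
	c->polls++;
	return c->polls >= 2;
}

/* True if *now is at or past *end. */
static bool deadline_passed(const struct timespec *end,
			    const struct timespec *now)
{
	if (now->tv_sec != end->tv_sec)
		return now->tv_sec > end->tv_sec;
	return now->tv_nsec >= end->tv_nsec;
}

/* Model of the pending step: poll once, and keep polling only while
 * the deadline has not passed. `now` is frozen so the outcome is
 * deterministic (this relies on the mock eventually completing).
 * Returns true ("up") if the IO completed, false if still pending. */
static bool model_pending(struct model_checker *c, const struct timespec *now)
{
	assert(c != NULL && now != NULL);
	do {
		if (mock_get_events(c))
			return true;
	} while (!deadline_passed(&c->endtime, now));
	return false;
}
```

With a deadline of 100s + 1ms, calling model_pending() at 100s + 0.5ms
polls twice and reports the IO as done. Calling it at 100s + 5ms (a slow
valgrind/VM run) polls only once and leaves the path pending instead of
PATH_UP, which is the shape of the failure above.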

Since the time the test program takes to run is out of our hands, and
the checker wait time isn't configurable, I'm not sure that we can
guarantee this test will always pass while still exercising this code
path, short of something a little hacky: manually bumping up
ct->endtime so that we're sure it hasn't already passed when we call
libcheck_pending().
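A rough sketch of what that "bump ct->endtime" hack could look like,
under stated assumptions: bump_endtime() and deadline_passed() are
hypothetical helpers, and the sketch assumes the checker's deadline is a
struct timespec. It moves the deadline to a fixed margin after the
current time right before the pending call, so the deadline cannot
already have expired, however slowly the test ran up to that point.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical helper: true if *now is at or past *end. */
static bool deadline_passed(const struct timespec *end,
			    const struct timespec *now)
{
	if (now->tv_sec != end->tv_sec)
		return now->tv_sec > end->tv_sec;
	return now->tv_nsec >= end->tv_nsec;
}

/* Hypothetical test-only hack: set the deadline to `margin_ms`
 * milliseconds after `now`, normalizing tv_nsec into [0, 1s). */
static void bump_endtime(struct timespec *endtime,
			 const struct timespec *now, long margin_ms)
{
	assert(margin_ms >= 0);
	endtime->tv_sec = now->tv_sec + margin_ms / 1000;
	endtime->tv_nsec = now->tv_nsec + (margin_ms % 1000) * 1000000L;
	if (endtime->tv_nsec >= NSEC_PER_SEC) {
		endtime->tv_sec++;
		endtime->tv_nsec -= NSEC_PER_SEC;
	}
}
```

In a real test, `now` would presumably be read with clock_gettime(),
using whichever clock the checker uses for its deadline (an assumption
here), just before calling libcheck_pending(). The cost is that the test
reaches into checker-private state, which is what makes this hacky.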

Obviously if we took your route and did the waiting outside of
libcheck_pending(), then this code path wouldn't exist and the problem
would go away. I'll think on this a bit.

-Ben

> 
> Martin