Re: [PATCH 05/15] multipath-tools tests: fix up directio tests

Martin Wilck <martin.wilck@xxxxxxxx> · Thu, 05 Sep 2024 09:57:20 +0200

On Wed, 2024-09-04 at 18:53 -0400, Benjamin Marzinski wrote:
> On Wed, Sep 04, 2024 at 09:43:59PM +0200, Martin Wilck wrote:
> > 
> > But some issues remain, e.g.
> > 
> > https://github.com/openSUSE/multipath-tools/actions/runs/10708349169/job/29690448105
> 
> I'm pretty sure that due to valgrind and virtual machine induced delays,
> we end up waiting more than 1ms in test_check_state_async() between
> starting the checker at
> 
> do_check_state(&c[256], 0, PATH_PENDING);
> 
> and calling libcheck_pending at
> 
> do_libcheck_pending(&c[256], PATH_UP);
> 
> This means that we will only call get_events() once, and we won't get
> the IO for the c[256] which the test returns on the second call to
> get_events(). This would cause the error from the github CI runs (I
> haven't been able to reproduce this myself locally, but I haven't tried
> on an Ubuntu VM):
> 
> [ RUN      ] test_check_state_async
> [  ERROR   ] --- 0x6 != 0x3
> [   LINE   ] --- directio.c:237: error: Failure!
> [  FAILED  ] test_check_state_async
> 
> Since the time it takes the test program to run is out of our hands and
> the checker wait time isn't configurable, I'm not sure that we can
> guarantee that this test will always run correctly while testing this
> code path without being a little hacky and manually bumping up
> ct->endtime so that we're sure it hasn't already passed when we call
> libcheck_pending().

Thanks for having a look. What you write makes sense.

Whatever we do, we will need to either disable the test or find a way
to fine-tune the timeout so that the CI succeeds (most of the time,
at least). A CI that constantly fails is no CI at all.
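
To illustrate the race described above, here is a minimal, self-contained
sketch. It is not multipath-tools code: struct fake_ctx, set_endtime() and
deadline_passed() are hypothetical stand-ins, and the 1 ms value only mirrors
the wait mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the checker context; only the deadline matters. */
struct fake_ctx {
	struct timespec endtime;
};

/* Set the deadline 'ms' milliseconds after 'now' (ms must not be negative). */
static void set_endtime(struct fake_ctx *ct, const struct timespec *now, long ms)
{
	assert(ms >= 0);
	ct->endtime.tv_sec = now->tv_sec + ms / 1000;
	ct->endtime.tv_nsec = now->tv_nsec + (ms % 1000) * 1000000L;
	/* Normalize so that tv_nsec stays below one second. */
	if (ct->endtime.tv_nsec >= 1000000000L) {
		ct->endtime.tv_sec++;
		ct->endtime.tv_nsec -= 1000000000L;
	}
}

/* Return true if 'now' is at or past the deadline stored in ct. */
static bool deadline_passed(const struct fake_ctx *ct, const struct timespec *now)
{
	if (now->tv_sec != ct->endtime.tv_sec)
		return now->tv_sec > ct->endtime.tv_sec;
	return now->tv_nsec >= ct->endtime.tv_nsec;
}
```

With a 1 ms deadline, a caller that shows up 2 ms late sees
deadline_passed() return true, which corresponds to the single get_events()
call. Re-arming the deadline from the current time right before the pending
call, as the hacky approach would do in the test, makes it return false again.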

> Obviously if we took your route and did the waiting outside of
> libcheck_pending(), then this code path wouldn't exist and the problem
> would go away. I'll think on this a bit.

Thanks!

Martin