On Wed, 2024-09-04 at 18:53 -0400, Benjamin Marzinski wrote:
> On Wed, Sep 04, 2024 at 09:43:59PM +0200, Martin Wilck wrote:
> > 
> > But some issues remain, e.g.
> > 
> > https://github.com/openSUSE/multipath-tools/actions/runs/10708349169/job/29690448105
> 
> I'm pretty sure that due to valgrind and virtual machine induced
> delays, we end up waiting more than 1ms in test_check_state_async()
> between starting the checker at
> 
> do_check_state(&c[256], 0, PATH_PENDING);
> 
> and calling libcheck_pending at
> 
> do_libcheck_pending(&c[256], PATH_UP);
> 
> This means that we will only call get_events() once, and we won't get
> the IO for the c[256] which the test returns on the second call to
> get_events(). This would cause the error from the github CI runs (I
> haven't been able to reproduce this myself locally, but I haven't
> tried on an Ubuntu VM):
> 
> [ RUN ] test_check_state_async
> [ ERROR ] --- 0x6 != 0x3
> [ LINE ] --- directio.c:237: error: Failure!
> [ FAILED ] test_check_state_async
> 
> Since the time it takes the test program to run is out of our hands
> and the checker wait time isn't configurable, I'm not sure that we can
> guarantee that this test will always run correctly while testing this
> code path without being a little hacky and manually bumping up
> ct->endtime so that we're sure it hasn't already passed when we call
> libcheck_pending().

Thanks for having a look. What you write makes sense.

Whatever we do, we will either need to disable the test, or find a way
to fine-tune the timeout such that the CI succeeds (most of the time, at
least). Constantly failing CI is no CI at all.

> Obviously if we took your route and did the waiting outside of
> libcheck_pending(), then this code path wouldn't exist and the problem
> would go away. I'll think on this a bit.

Thanks!
Martin
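To make the race concrete, here is a small, self-contained C model of it. It is only a sketch under stated assumptions: struct sim_checker, sim_get_events() and sim_check_pending() are hypothetical stand-ins, not the directio checker or test code, and the 1ms checker wait is modeled as an endtime_us deadline on a simulated clock. As in the test, the IO for c[256] only shows up on the second get_events()-style poll.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the timing problem discussed above. None of
 * these names exist in multipath-tools; they only mimic its shape:
 * a deadline ("endtime") plus an event poll that reports the IO on
 * a given call.
 */
struct sim_checker {
	long endtime_us;   /* deadline on a simulated clock, in microseconds */
	int  calls_needed; /* polls required before the IO is reported */
	int  calls_made;   /* polls performed so far */
};

/* Simulated get_events(): reports completion only on the calls_needed-th poll. */
bool sim_get_events(struct sim_checker *ct)
{
	ct->calls_made++;
	return ct->calls_made >= ct->calls_needed;
}

/*
 * Simulated pending check: poll once, then keep polling in steps of
 * step_us only while the simulated clock is still before the deadline.
 * Returns true if the IO completed (PATH_UP-like), false if the path
 * would still be reported as pending.
 */
bool sim_check_pending(struct sim_checker *ct, long now_us, long step_us)
{
	assert(step_us > 0); /* a non-positive step would loop forever */

	if (sim_get_events(ct))
		return true;
	while (now_us < ct->endtime_us) {
		now_us += step_us;
		if (sim_get_events(ct))
			return true;
	}
	return false;
}
```

With endtime_us = 1000, a call at now_us = 500 polls twice and sees the IO. A call at now_us = 2500 (as if valgrind or the VM had delayed the test) polls only once and stays pending, which matches the failure above. Setting endtime_us = now_us + 1000 right before the call, the "bump ct->endtime" workaround, makes it poll twice again.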