On Thu, 2024-10-10 at 15:45 -0400, Benjamin Marzinski wrote: > On Wed, Oct 09, 2024 at 05:00:55PM +0200, Martin Wilck wrote: > > On Tue, 2024-10-08 at 21:15 -0400, Benjamin Marzinski wrote: > > > > > > if (ct->running) { > > > if (ct->req->state != PATH_PENDING) { > > > @@ -345,31 +343,26 @@ check_state(int fd, struct directio_context > > > *ct, int sync, int timeout_secs) > > > LOG(3, "io_submit error %i", -rc); > > > return PATH_UNCHECKED; > > > } > > > + get_monotonic_time(&ct->endtime); > > > + ct->endtime.tv_nsec += 1000 * 1000; > > > + normalize_timespec(&ct->endtime); > > > } > > > ct->running++; > > > > Sorry, I have another concern here, despite my reviewed-by. This > > very > > old statement, combined with "if (ct->running > c->timeout)" below, > > somehow assumes that "running" counts seconds, which means that > > this > > code must be called once a second. I guess this has been > > approximately > > true in the past, when this code was originally written. It has > > never > > been clean, we should have compared time stamps instead. > > Yeah. That makes sense. > > > > > But now, it seems to be plain wrong, "running" will never have a > > higher > > value than 1 as it is only called once from libcheck_check(), and > > the > > (ct->running - c->timeout) statement is broken. > > > > Am I misreading this code? > > Either you are or I am. I am :-) > libcheck_check() is called by start_checker() > each time a path comes up for checking. libcheck_check() calls > check_state() with the number of seconds that the checker should wait > till it times out. If ct->running is non-zero, but the path is still > pending, check_state() increments ct->running and returns > PATH_PENDING. My misconception was that we call libcheck_check() once, and then call libcheck_pending() repeatedly until the checker either finishes or times out. The isn't true even with your patch set, both functions are called in every checkerloop, just that libcheck_check() doesn't restart the checker instance if it's already running (it just increments running, as you mentioned). (Why I had this "misconception"? I think it's kind of a vague idea that I had in my mind how it should work; it's not in your code, and I'm not asking your to change the code in this way). > libcheck_pending() is called by get_state() every time a path gets > checked and its state is PATH_PENDING. If ct->running is greater than > c->timeout then the checker has timed out (it assumes that ct- > >running is > counting seconds) and gets cancelled. > > Also, I only see (ct->running > c->timeout), not > (ct->running - c->timeout) are you talking about something else in > the > code that I'm missing? No, typo. > At any rate, I can change this to use actual timestamps instead of > our > ct->running assumptions. Yes, please. "counting seconds" this way is wrong and confusing. Regards, Martin PS: I'll be out of office for 2 weeks, so final review of your series will probably be delayed again. Sorry about that.