Re: [PATCH v4 04/22] libmultipath: remove pending wait code from libcheck_check calls

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2024-10-10 at 15:45 -0400, Benjamin Marzinski wrote:
> On Wed, Oct 09, 2024 at 05:00:55PM +0200, Martin Wilck wrote:
> > On Tue, 2024-10-08 at 21:15 -0400, Benjamin Marzinski wrote:
> > >  
> > >  	if (ct->running) {
> > >  		if (ct->req->state != PATH_PENDING) {
> > > @@ -345,31 +343,26 @@ check_state(int fd, struct directio_context
> > > *ct, int sync, int timeout_secs)
> > >  			LOG(3, "io_submit error %i", -rc);
> > >  			return PATH_UNCHECKED;
> > >  		}
> > > +		get_monotonic_time(&ct->endtime);
> > > +		ct->endtime.tv_nsec += 1000 * 1000;
> > > +		normalize_timespec(&ct->endtime);
> > >  	}
> > >  	ct->running++;
> > 
> > Sorry, I have another concern here, despite my reviewed-by. This
> > very
> > old statement, combined with "if (ct->running > c->timeout)" below,
> > somehow assumes that "running" counts seconds, which means that
> > this
> > code must be called once a second. I guess this has been
> > approximately
> > true in the past, when this code was originally written. It has
> > never
> > been clean, we should have compared time stamps instead. 
> 
> Yeah. That makes sense.
> 
> > 
> > But now, it seems to be plain wrong, "running" will never have a
> > higher
> > value than 1 as it is only called once from libcheck_check(), and
> > the
> > (ct->running - c->timeout) statement is broken.
> > 
> > Am I misreading this code?
> 
> Either you are or I am. 

I am :-)

> libcheck_check() is called by start_checker()
> each time a path comes up for checking. libcheck_check() calls
> check_state() with the number of seconds that the checker should wait
> till it times out. If ct->running is non-zero, but the path is still
> pending, check_state() increments ct->running and returns
> PATH_PENDING.

My misconception was that we call libcheck_check() once, and then call
libcheck_pending() repeatedly until the checker either finishes or
times out. The isn't true even with your patch set, both functions are
called in every checkerloop, just that libcheck_check() doesn't restart
the checker instance if it's already running (it just increments
running, as you mentioned).

(Why I had this "misconception"? I think it's kind of a vague idea that
I had in my mind how it should work; it's not in your code, and I'm not
asking your to change the code in this way).

> libcheck_pending() is called by get_state() every time a path gets
> checked and its state is PATH_PENDING. If ct->running is greater than
> c->timeout then the checker has timed out (it assumes that ct-
> >running is
> counting seconds) and gets cancelled.

> 
> Also, I only see (ct->running > c->timeout), not
> (ct->running - c->timeout) are you talking about something else in
> the
> code that I'm missing?

No, typo.

> At any rate, I can change this to use actual timestamps instead of
> our
> ct->running assumptions.

Yes, please. "counting seconds" this way is wrong and confusing.

Regards,
Martin

PS: I'll be out of office for 2 weeks, so final review of your series
will probably be delayed again. Sorry about that.






[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux