Re: [RFC PATCH 4/4] t/lib-httpd: sort log based on timestamp to avoid occasional failure

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jun 14, 2018 at 7:53 PM Jeff King <peff@xxxxxxxx> wrote:
>
> On Thu, Jun 14, 2018 at 02:31:07PM +0200, SZEDER Gábor wrote:
>
> > The last test of 't5561-http-backend.sh', 'server request log matches
> > test results' may fail occasionally, because the order of entries in
> > Apache's access log doesn't match the order of requests sent in the
> > previous tests, although all the right requests are there.

> I've occasionally run into these failures on my local box, too. I'm
> happy somebody is looking into it (I have before, but eventually threw
> up my hands in disgust).

Heh, well, I found it to be one of the more entertaining debugging
sessions... ;)
I'm glad that somebody else saw it, too, and outside of Travis CI.

> > Now, by default the timestamp of a log entry marks the beginning of
> > the request processing, not when the log entry gets written.  Since
> > our requests are sent sequentially, sorting the log entries based on
> > their timestamps would ensure that their order corresponds to the
> > order of our requests.
>
> That's a reasonably clever solution. One thing I wonder, though: are we
> always guaranteed that the log entries are written _at all_ before we
> look at them?
>
> I.e., could we have a situation where we make a request, the client
> finishes, and then we look at the logs, but nothing has been written by
> apache yet?

That's possible, I suppose.  Highly improbable, but possible.  Or the
log entries can be written only partially by the time we look at them.

FWIW, Apache has the 'BufferedLogs' directive, which "causes
mod_log_config to store several log entries in memory and write them
together to disk, rather than writing them after each request."  It
very sensibly defaults to off, and our test config doesn't set it.

I would think that stopping Apache should flush the logs, but I don't
know whether 'stop_httpd' blocks until the web server is indeed
stopped, or it's all async.  Perhaps a SIGHUP would suffice to flush
the logs, but that's definitely async.  Also note that while t5561
checks the access log at the end of the script, t5541 and t5551 do
check the logs in the middle of the script, so we would have to stop
Apache and then start again.

Anyway, I would prefer not to do anything about it just yet, because
I'm curious :) and I'd like to see whether this actually happens in
practice.


> >     I don't really like the fix in this patch.  I think an unfortunate
> >     clock skew during the test run could mess up the sorting added in this
> >     patch and cause test failure.  Or if DST or even a leap second happen
> >     while the test is running.  Do we care?  Anyway, this occasional test
> >     failure apparently happens more often than DST and leap seconds
> >     combined.
>
> We could probably eliminate DST issues by consistently using UTC for the
> timestamps. Leap seconds are probably infrequent enough not to worry
> about. More likely is something like clock adjustment due to ntp. Those
> adjustments are usually small enough not to matter, but if we're talking
> microseconds, it could trigger.

Ok, so admittedly I haven't quite thought through all these potential
clock-related issues that I mentioned.

Setting back the clock at the end of DST is not an issue, because
Apache uses the timezone it finds set in the environment, and our test
environment does set UTC, so as far as the tests and Apache's log
timestamps are concerned there is no such thing as DST.
Leap seconds don't go back in time, they are only added to compensate
for Earth's slowing rotation, so that's not an issue, either.
And I think NTP is supposed to be clever enough to not set back the
clock recklessly, but only advance it slower.

Of course, someone can still set back the clock manually while these
tests run, but I would put that in the "If it hurts, don't do it" box.


However, having said all this, ...

> What if the test script provides the in-order expectation, but we check
> only the unordered version (by sorting both actual and expected output
> on the fly)? That would give us a more relaxed check most of the time,
> but somebody digging in to a failure could run the ordered diff (or we
> could even show it automatically on failure instead of just using
> test_cmp).

this is the fix I'll go for in v2.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux