The last test of 't5561-http-backend.sh', 'server request log matches test results' may fail occasionally, because the order of entries in Apache's access log doesn't match the order of requests sent in the previous tests, although all the right requests are there. I saw it fail on Travis CI five times in the span of about half a year, when the order of two subsequent requests was flipped, and could trigger the failure with a modified Git. However, I was unable to trigger it with stock Git on my machine. Three tests in 't5541-http-push-smart.sh' and 't5551-http-fetch-smart.sh' check requests in the log the same way, so they might be prone to a similar occasional failure as well. When a test sends a HTTP request, it can continue execution after 'git-http-backend' fulfilled that request, but Apache writes the corresponding access log entry only after 'git-http-backend' exited. Some time inevitably passes between fulfilling the request and writing the log entry, and, under unfavourable circumstances, enough time might pass for the subsequent request to be sent and fulfilled, and then Apache writes access log entries racily. This effect can be exacerbated by adding a bit of variable delay after the request is fulfilled but before 'git-http-backend' exits, e.g. like this: diff --git a/http-backend.c b/http-backend.c index f3dc218b2..bbf4c125b 100644 --- a/http-backend.c +++ b/http-backend.c @@ -709,5 +709,7 @@ int cmd_main(int argc, const char **argv) max_request_buffer); cmd->imp(&hdr, cmd_arg); + if (getpid() % 2) + sleep(1); return 0; } This delay considerably increases the chances of log entries being written out of order, and in turn makes t5561's last test fail almost every time. Alas, it doesn't seem to be enough to trigger a similar failure in t5541 and t5551. Now, by default the timestamp of a log entry marks the beginning of the request processing, not when the log entry gets written. Since our requests are sent sequentially, sorting the log entries based on their timestamps would ensure that their order corresponds to the order of our requests. Therefore, change Apache's log format to use microseconds since epoch as timestamps, the finest resolution Apache has to offer. This will: - give enough resolution to make reasonably sure that no two requests have the same timestamp; the default timestamp format has only one second resolution, which is not nearly enough. - make the timestamps sortable; the default timestamp format looks like "[31/Jan/2018:16:33:52 +0000]", which would trip sorting if the test script happens to run over month boundaries. Modify the strip_access_log() helper function, used by all potentially affected tests, to sort the access log entries based on their timestamp before stripping the uninteresting bits from the entries. Unfortunately, this "microseconds since epoch" is an extended time format specifier which was only introduced in Apache 2.2.30 [1], and standard strftime() doesn't have any format specifiers with microsecond or finer precision, so we can't use the above solution with Apache versions older than 2.2.30. However, since Apache 2.2 is no longer maintained anyway, it's not that big a deal to leave test runs with those old versions prone to this occasional failure. So use microsecond timestamps and sort access log entries only if the test is run with an Apache version supporting it; otherwise stick to the default timestamp format and leave the order of access log entries unchanged. [1] http://httpd.apache.org/docs/2.2/mod/mod_log_config.html Signed-off-by: SZEDER Gábor <szeder.dev@xxxxxxxxx> --- Notes: The log of the latest of this test failures: https://travis-ci.org/szeder/git-cooking-topics-for-travis-ci/jobs/391499735#L3130 I don't really like the fix in this patch. I think an unfortunate clock skew during the test run could mess up the sorting added in this patch and cause test failure. Or if DST or even a leap second happen while the test is running. Do we care? Anyway, this occasional test failure apparently happens more often than DST and leap seconds combined. Furthermore, in the future we might have computers fast enough to fulfill more than one test requests in a single microsecond; but I guess by that time we'll have higher resolution time formats as well. An alternative I considered was that we could decide that the order of requests in the access log is not important as long as all the right requests are there. This would inherently eliminate the raciness issue, but if something were to go wrong, then it would become rather hard to find out e.g. which request from which test has gone missing, especially considering that several requests are sent in more than one test. We could address this by truncating the access log at the beginning and checking its contents at the end of each test sending requests. Unfortunately, this would raise additional difficulties, because all requests in t5561 come from tests included from 't556x-common', i.e. from tests shared with 't5560-http-backend-noserver.sh', which as its name suggests doesn't run Apache and doesn't have an access log at all. Another alternative is to simply accept that these tests are racy, and that the resulting test failures are rare enough that it isn't worth the complexity of the workaround, but adding a comment to the affected tests warning about the raciness is sufficient. (But I wrote this when I first saw and tracked down this failure; since then I observed it four more times... :) -- Apache doesn't maintain 2.2 anymore; the final maintenance release 2.2.34 was released in July 2017, almost a year ago. OTOH, our 't/lib-httpd/apache.conf' contains a couple of IfVersion directives dealing with versions <2.4, and one even with <2.1. How much do we actually care about these unmaintained Apache versions, and how much of this is just bitrotting? t/lib-httpd.sh | 13 ++++++++++++- t/lib-httpd/apache.conf | 5 +++++ 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh index 5915625631..1a7fe67141 100644 --- a/t/lib-httpd.sh +++ b/t/lib-httpd.sh @@ -289,10 +289,21 @@ expect_askpass() { } strip_access_log() { + case "$HTTPD_VERSION" in + 2.[01].*|2.2.[0-9]|2.2.[12][0-9]) + # Apache versions < 2.2.30 didn't have any log timestamp + # format with microseconds precision, so don't attempt to + # sort the access log when testing with such old versions. + cat "$HTTPD_ROOT_PATH"/access.log + ;; + *) + sort -n -k 4,4 "$HTTPD_ROOT_PATH"/access.log + ;; + esac | \ sed -e " s/^.* \"// s/\"// s/ [1-9][0-9]*\$// s/^GET /GET / - " "$HTTPD_ROOT_PATH"/access.log + " } diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf index 724d9ae462..8d6da6bdb2 100644 --- a/t/lib-httpd/apache.conf +++ b/t/lib-httpd/apache.conf @@ -1,7 +1,12 @@ ServerName dummy PidFile httpd.pid DocumentRoot www +<IfVersion >= 2.2.30> +LogFormat "%h %l %u %{usec}t \"%r\" %>s %b" common +</IfVersion> +<IfVersion < 2.2.30> LogFormat "%h %l %u %t \"%r\" %>s %b" common +</IfVersion> CustomLog access.log common ErrorLog error.log <IfModule !mod_log_config.c> -- 2.18.0.rc0.207.ga6211da864