Re: git 2.0.0 PROFILE=BUILD check-phase problems with ./t5561-http-backend.sh; GIT_TEST_HTTPD=false problems with t5537-fetch-shallow.sh

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 30 May 2014, Jeff King told this:

> On Thu, May 29, 2014 at 11:44:37PM +0100, Nix wrote:
>
>> I observe test failures with git 2.0.0 which are attributable to the
>> change to run network tests by default. I'm lumping them both together
>> into one report because I'm lazy and I've blown too much time on this
>> already.
>
> Weird. I also see a strange failure on t5310 when building with
> PROFILE=BUILD. We get a segfault when reading jgit-produced bitmaps.

I don't see that, but this may be a compiler difference, or
environmental difference, or something.

> Tracking it down, we're getting inexplicably bogus data from an mmap'd
> file (!). Compiling without PROFILE=BUILD, the test passes fine (even
> with valgrind).
>
> If I instrument it like this:
>
> diff --git a/ewah/ewah_io.c b/ewah/ewah_io.c
> index f7f700e..8cafacf 100644
> --- a/ewah/ewah_io.c
> +++ b/ewah/ewah_io.c
> @@ -119,6 +119,7 @@ int ewah_read_mmap(struct ewah_bitmap *self, void *map, size_t len)
>  	ptr += sizeof(uint32_t);
>  
>  	self->buffer_size = self->alloc_size = get_be32(ptr);
> +	warning("got buffer_size of %lu from %lu", self->buffer_size, *(uint32_t *)ptr);
>  	ptr += sizeof(uint32_t);
>  
>  	self->buffer = ewah_realloc(self->buffer,
>
> a regular test-run reads:
>
>   warning: got buffer_size of 2 from 33554432
>   warning: got buffer_size of 2 from 33554432
>   warning: got buffer_size of 3 from 50331648
>   warning: got buffer_size of 1 from 16777216
>   warning: got buffer_size of 2 from 33554432
>
> and a PROFILE=BUILD one reads:
>
>   warning: got buffer_size of 2 from 33554432
>   warning: got buffer_size of 2 from 33554432
>   warning: got buffer_size of 3 from 50331648
>   warning: got buffer_size of 1 from 16777216
>   warning: got buffer_size of 131072 from 512
>
> I'm willing to believe we're doing something weird that violates the C
> standard there, but I really can't see it.

It's doubly strange because you're not building with -flto at all, all
callers of ewah_read_mmap() are in a separate file, and there are no
conditionals or loops above the warning() locus. Maybe the error is in a
different file, and is causing a nonsense 'len' to get passed in?

But I think this *may* be a false alarm. *(uint32_t *)ptr on a uint8_t
(in your logging line) is an aliasing violation, and the nearly-
uninitialized '512' is pretty much what I'd expect GCC to do now and
then, given that. Just a guess.

(I'm surprised this doesn't cause a problem *always*, though.)

>> It appears that the Apache daemon is writing to the log slowly enough
>> that its log lines only get there after the testsuite has written its
>> separator, so a bunch of log lines appear to be attached to the wrong
>> test, and the comparison fails.
>
> Yeah, that looks to me like what is happening, too. If I put a 'sleep
> 1' into log_div, it passes. I would think apache would write the log
> before serving the file, but perhaps not.

It does a straight write() via APR, which should work, I'd have thought.
This is on a tmpfs, not a network filesystem or anything.

(Aside: the flushing semantics are quite tangled, but shouldn't matter,
given that we're just doing a write(). Apache calls apr_file_flush() on
the logfile, but that's just flushing apr's analogue of stdio buffers:
it does nothing if the file is unbuffered -- and the error log file is
unbuffered. It does not call apr_file_sync(), but you shouldn't need a
sync() for one process to see another process's output.)

>                                           And like you, I would expect
> gcov to make things slower, not faster. Could there be something in the
> environment that

I'm more inclined to blame the kernel or the compiler, but I *always*
blame the kernel or the compiler. :P

> I'm not sure what the best fix is. We could check the logfiles after
> each test instead of at the end, but that will just end up with the same
> race: we may check them before apache has written them.

Yeah, that seems less than useful.

-- 
NULL && (void)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]