On ma, 2014-10-20 at 16:29 +0200, Dennis Kaarsemaker wrote:
> For a few days now, one of our repos has been causing problems during
> git fetch: git fetch over http hangs during find_common. When using
> ssh, this does not happen.

<snip things that may not be relevant>

> And for the hanging git-upload-pack:
> #0  0x00007f7c8034b4d0 in __write_nocancel () from /lib64/libpthread.so.0
> #1  0x000000000043c9dc in xwrite (fd=1, buf=0x6c70e0, len=56) at wrapper.c:170
> #2  0x000000000043ca5b in write_in_full (fd=1, buf=<value optimized out>, count=56) at wrapper.c:220
> #3  0x000000000043d019 in write_or_die (fd=<value optimized out>, buf=<value optimized out>, count=<value optimized out>) at write_or_die.c:61
> #4  0x00000000004131fa in packet_write (fd=1, fmt=<value optimized out>) at pkt-line.c:93
> #5  0x000000000040538c in get_common_commits (argc=<value optimized out>, argv=0x7fff00000001) at upload-pack.c:420
> #6  upload_pack (argc=<value optimized out>, argv=0x7fff00000001) at upload-pack.c:778
> #7  main (argc=<value optimized out>, argv=0x7fff00000001) at upload-pack.c:846
>
> And the hanging git-http-backend:
> #0  0x00007f4c9553d4d0 in __write_nocancel () from /lib64/libpthread.so.0
> #1  0x000000000042d31c in xwrite (fd=4, buf=0x7fff394ea570, len=8192) at wrapper.c:170
> #2  0x000000000042d39b in write_in_full (fd=4, buf=<value optimized out>, count=8192) at wrapper.c:220
> #3  0x0000000000403e5d in inflate_request (prog_name=0x490d98 "upload-pack", out=4) at http-backend.c:305
> #4  0x000000000040405d in run_service (argv=0x7fff394ee6d0) at http-backend.c:355
> #5  0x00000000004041d2 in service_rpc (service_name=<value optimized out>) at http-backend.c:508
> #6  0x0000000000404b35 in main (argc=<value optimized out>, argv=<value optimized out>) at http-backend.c:631
>
> Both git-http-backend and git-upload-pack are trying to write at the
> same time. I'm *guessing* I've hit some buffer limit here, given that
> the have/ack exchanges are increasing in size and suddenly this one is
> misbehaving. However, I have no idea where to look next and would
> really appreciate some help.

I think the reasoning in 44d8dc54e73e8010c4bdf57a422fc8d5ce709029 is
incomplete: there is still a pipe involved in the case of gzip'ed
request bodies, and it's here that it hangs. (A toy illustration of
this mutual-write deadlock is sketched at the end of this mail.)

However, I now think that this is merely a symptom, because after
inspecting the output a bit further I noticed that all responses start
with the same lines:

got ack 3 a669f13aab3a2c192c15574ead70f92b303e8aee
got ack 3 360530ff695a4deb01575e85976060a083e17245
got ack 3 bab20d62a5a4c34885cf2acbf83aca91908f9af8

In fact, response N is the same as response N-1 plus acks for the
commits in the 'have' lines of the debug output for the next request.
So it looks like every request sends all common commits again, which
seems wrong but does explain the ever-growing request size.

After commenting out line 413 in fetch-pack.c (state_len = req_buf.len)
the requests and responses no longer grow, and the fetch completes,
though the received pack seems too large (the http response is 400MB),
which makes me think it's not actually ack'ing. Subsequent HTTP fetches
don't get a big pack in response though, so maybe the pack is the right
size after all: this is a *very* busy repo with thousands of commits
between the last successful fetch 5 days ago and the first successful
fetch after my hack.

In any case, I think there's a bug here, but I don't know nearly
enough about the protocol to judge whether my "fix" is even close to
correct. (A second sketch below models why the requests keep growing
and what my hack changes.)
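To make the deadlock concrete, here is a toy program (my own sketch,
not git code) in which two processes each keep writing into a pipe
that the other side never reads. Once both pipe buffers are full, both
sides block in write(2) forever, which is the same state the two
backtraces above show for git-http-backend and git-upload-pack:

/*
 * Toy demonstration of the hang (not git code): two processes joined
 * by a pair of pipes, each only writing, neither reading.  Once both
 * pipe buffers fill up, both processes block in write() forever --
 * like git-http-backend (writing the inflated request body) and
 * git-upload-pack (writing ACKs back) in the backtraces above.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	int to_child[2], to_parent[2];
	char buf[8192];
	pid_t pid;

	memset(buf, 'x', sizeof(buf));
	if (pipe(to_child) < 0 || pipe(to_parent) < 0) {
		perror("pipe");
		return 1;
	}

	pid = fork();
	if (pid < 0) {
		perror("fork");
		return 1;
	}

	if (!pid) {
		/* child: write-only, like upload-pack streaming ACKs */
		close(to_child[1]);
		close(to_parent[0]);
		for (;;)
			if (write(to_parent[1], buf, sizeof(buf)) < 0) {
				perror("child write");
				exit(1);
			}
	}

	/* parent: write-only, like http-backend inflating the request */
	close(to_child[0]);
	close(to_parent[1]);
	for (;;)
		if (write(to_child[1], buf, sizeof(buf)) < 0) {
			perror("parent write");
			return 1;
		}
}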
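And here is a toy model (also not git code; the window size and line
length are rough guesses) of why the requests keep growing: in
stateless-rpc mode, every 'have' the server ACKs as common gets folded
into the state that is replayed at the start of the next POST, so each
request is a superset of the previous one. The last line of the loop
is the moral equivalent of the "state_len = req_buf.len" I commented
out:

/*
 * Toy model of req_buf growth during stateless-rpc negotiation.
 * Each round sends a fixed window of new "have" lines (the real
 * window grows, but that doesn't matter here).  If the server ACKs
 * them all as common, they are all replayed on the next request.
 */
#include <stdio.h>

#define HAVE_LINE 50	/* 4-byte pkt-line header + "have " + 40 hex + LF */
#define WINDOW 32	/* new haves per round; kept fixed for simplicity */

int main(void)
{
	size_t state_len = 0;	/* bytes replayed at the start of each POST */
	int round;

	for (round = 1; round <= 10; round++) {
		size_t req_len = state_len + WINDOW * HAVE_LINE;

		printf("round %2d: request is %5zu bytes\n", round, req_len);
		/*
		 * Pretend the server ACKed every have as common; this
		 * is the equivalent of "state_len = req_buf.len" in
		 * fetch-pack.c.  Comment out the next line (my hack)
		 * and every request stays the same size.
		 */
		state_len = req_len;
	}
	return 0;
}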
I've also not tested my "fix" with any other protocol yet.

--
Dennis Kaarsemaker
http://www.kaarsemaker.net