This is a summary of an issue I've been looking at with a very large centralized Git repository. It gets approximately 100 commits per day, almost all of them to its master branch.

I think I've found why the issue I'm describing happens (not confirmed yet). I mainly wanted to write something to the list to have a record of this in case anyone runs into it in the future.

Last week we upgraded from Git 1.6.5 to 1.7.2.1 on the server housing our repository, and started getting errors like these from developers running variants of git-fetch:

    $ git pull --rebase
    remote: Counting objects: 2, done.
    remote: Compressing objects: 100% (2/2), done.
    remote: Total 2 (delta 0), reused 0 (delta 0)
    remote: aborting due to possible repository corruption on the remote side.
    error: waitpid for pack-objects failed: No child processes
    error: git upload-pack: git-pack-objects died with error.
    fatal: git upload-pack: aborting due to possible repository corruption on the remote side.
    Unpacking objects: 100% (2/2), done.
    fatal: error in sideband demultiplexer

That error is from https://github.com/git/git/commit/b1c71b72815cb82a8bad14020a047320b88a04eb by Junio from 2006; we're refusing to send an incomplete pack file on failure.

We've also been getting this error from git-fetch directly (from a wrapper script):

    # INFO : Checking working directory
    # ERROR: failed to git fetch --tags from 'origin' errorcode: 128
    # ERROR: git fetch --tags origin
    # ERROR: error: waitpid for pack-objects failed: No child processes
    # ERROR: error: git upload-pack: git-pack-objects died with error.
    # ERROR: fatal: git upload-pack: aborting due to possible repository corruption on the remote side.
    # ERROR: remote: aborting due to possible repository corruption on the remote side.
    # ERROR: fatal: error in sideband demultiplexer

And from git-remote-update(1):

    $ git remote update
    Fetching origin
    remote: Counting objects: 9, done.
    remote: Compressing objects: 100% (5/5), done.
    remote: Total 5 (delta 4), reused 0 (delta 0)
    error: waitpid for pack-objects failed: No child processes
    error: git upload-pack: git-pack-objects died with error.
    fatal: git upload-pack: aborting due to possible repository corruption on the remote side.
    remote: aborting due to possible repository corruption on the remote side.
    Unpacking objects: 100% (5/5), done.
    fatal: error in sideband demultiplexer
    error: Could not fetch origin

All of these except maybe the first one (I wasn't able to contact the dev in question) come from Git 1.7.2.1 clients talking to the 1.7.2.1 server.

Anyway, I think this issue is caused by this RHEL bug:

    https://bugzilla.redhat.com/show_bug.cgi?id=166669
    ([RHEL3 U5] waitpid() returns unexpected ECHILD)

which was fixed in this RHEL update:

    http://rhn.redhat.com/errata/RHSA-2006-0144.html

This is our Git server:

    $ cat /etc/redhat-release && uname -r
    CentOS release 4.1 (Final)
    2.6.9-11.ELsmp

And if I run:

    wget 'https://bugzilla.redhat.com/attachment.cgi?id=118759' -O killipf.c &&
    gcc -O2 -o killipf killipf.c -lpthread &&
    PASS=0; while ./killipf; do PASS=$((PASS + 1)); echo $PASS; done

It'll die within a minute with a message like this:

    PASS : received expected signal 9
    14605
    child pid:2563 waitpid failed!: No child processes

It does *not* die on these machines:

    $ cat /etc/redhat-release && uname -r
    CentOS release 4.6 (Final)
    2.6.9-67.0.7.ELsmp

    $ cat /etc/redhat-release && uname -r
    CentOS release 5.5 (Final)
    2.6.18-194.el5PAE

Or on my personal Debian box:

    $ cat /etc/debian_version && uname -r
    wheezy/sid
    2.6.32-5-amd64

I haven't been able to trigger this issue with Git itself.
I tried putting a copy of the repository in /tmp, then had one client running this in a loop:

    while true; do
        head -n 10 /dev/urandom >a_file &&
        git commit -m"more crap" a_file &&
        git push
    done

And another client running:

    while true; do
        git pull
    done

I never got this waitpid error message. I might have just been unlucky though, or perhaps it wasn't triggered in that case for some reason.

Given this information we're going to upgrade CentOS on the relevant machine. I'll follow up on the list in a couple of weeks indicating whether or not that worked. We have enough users that if I ask people to tell me when they get this error and I don't hear anything for two weeks, I can safely assume it went away.

What we might want to do in Git is to work around this broken waitpid behavior (if that's indeed the issue). I haven't dug into what the RHEL kernel patch is solving, so I don't know if we can inexpensively detect when this is happening and warn users about it. Then again it would be a lot of work to work around a specific kernel bug.

What I *mainly* wanted to do was to insert some note of this into the Git mailing list archive, which I've now done.
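
For anyone trying to reproduce this, here is a rough sketch of how the pull loop above could be bounded and made to count hits instead of relying on someone watching the terminal. It assumes a POSIX shell. The names has_waitpid_error and count_waitpid_errors are made up for illustration; they are not part of Git or of our wrapper script. The only thing taken from the real output is the error string itself.

```shell
#!/bin/sh
# Error signature copied verbatim from the fetch output quoted in this message.
SIGNATURE='waitpid for pack-objects failed: No child processes'

# Hypothetical helper: reads command output on stdin and succeeds
# if the waitpid/ECHILD signature appears anywhere in it.
has_waitpid_error() {
    grep -q -F -- "$SIGNATURE"
}

# Hypothetical helper: count_waitpid_errors N CMD [ARGS...]
# Runs CMD N times, scanning stdout and stderr of each run,
# and prints how many runs showed the signature.
count_waitpid_errors() {
    runs=$1
    shift
    hits=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        if "$@" 2>&1 | has_waitpid_error; then
            hits=$((hits + 1))
        fi
        i=$((i + 1))
    done
    echo "$hits"
}
```

Something like "count_waitpid_errors 100 git fetch --tags origin" would then run 100 fetches and print the number that hit the error. A result of 0 doesn't prove the bug is absent, since the failure appears to be timing dependent.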