On Fri, Jan 29, 2010 at 8:14 PM, Nicolas Pitre <nico@xxxxxxxxxxx> wrote:
> On Fri, 29 Jan 2010, Jon Nelson wrote:
...
>> 1. I see the 'git-repack' shell process scanning for .keep files. I
>> don't have any. Is there a shortcut to this?
>>
>> It's also hugely inefficient. In this case, the code to identify non
>> .keep packs takes *4 minutes, 45 seconds*, lots of disk I/O, and lots
>> of CPU (it pegs one CPU at 100% for the entire duration). With a wee
>> bit of awk, I have reduced that to 2.3 seconds with VASTLY reduced I/O
>> and CPU requirements. Patch attached.
>
> Your patch will pick any .pack file in the repo not only from the
> .git/objects/pack directory. There is no such thing as *.pack.keep
> either.

Ugh. Yep. Patch amended. Still fast. Still wrong?

>> 3. When git pack objects is running and counting up the number of
>> objects, it is stat'ing files that aren't in the working directory, and
>> should not be, according to the index. If I switch the repo to be a
>> "bare" repository, then it doesn't do that, however, why is it doing
>> that in the first place?
>
> A bare repository has no index. When the index is present though, it is
> necessary to also pack objects it references. Why working directory
> files would be stat()'d in that case I don't know.

Inquiring minds want to know.

>> 4. Should git-pack-objects be reading the pack.idx files for counting
>> objects instead of the .pack files themselves?
>
> No. The whole point when "counting objects" is to perform a walk of the
> history graph and capture the set of objects that are actually
> referenced from your branches/tags and leave the unreferenced objects
> behind. Also the order in which those objects are encountered during
> that history walk is very important for efficient object placement in
> the final pack. So this is much more involved than only listing the
> objects contained in every packs.

Ah.
For some reason I thought the .idx files contained not just a straight
listing but also the parent/child relationships as well.

> You could try:
>
>     git config core.packedGitLimit 256m
>     git config core.packedGitWindowSize 32m
>     git config pack.deltaCacheSize 1
>
> and try repacking again with 'git gc --prune=now'. After the repack
> succeeds, you should be able to remove the above configs from your
> .git/config file.

I have since thrown out the repo and started over on this particular
experiment, issuing a 'git gc' rather more often. The config options
above are now dutifully scribbled down. Thanks!

diff --git a/git-repack.sh b/git-repack.sh
index 1eb3bca..3cef57d 100755
--- a/git-repack.sh
+++ b/git-repack.sh
@@ -62,15 +62,7 @@ case ",$all_into_one," in
 ,t,)
 	args= existing=
 	if [ -d "$PACKDIR" ]; then
-		for e in `cd "$PACKDIR" && find . -type f -name '*.pack' \
-			| sed -e 's/^\.\///' -e 's/\.pack$//'`
-		do
-			if [ -e "$PACKDIR/$e.keep" ]; then
-				: keep
-			else
-				existing="$existing $e"
-			fi
-		done
+		existing=$( cd "$PACKDIR" && find . -type f -name '*.pack' -o -name '*.keep' | sed -e 's/^\.\///' | sort | awk '{ if ($0 ~ /\.keep$/) { N=substr($0, 0, length($0)-4) "pack"; K[N]=0; } else { if ($0 in K) { } else { K[$0]=1; } } } END { for (k in K) { if (K[k] == 1) { printf "%s ", k; } } } ' )
 		if test -n "$existing" -a -n "$unpack_unreachable" -a \
 			-n "$remove_redundant"
 		then

-- 
Jon
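
For comparison, the .keep check can also be written without awk. The
following is only a minimal sketch under stated assumptions. It is not
the patch above and not what git-repack.sh ships. The helper name
list_unkept_packs is hypothetical. It looks only at files directly
inside the given pack directory, treats pack-X.keep (not
pack-X.pack.keep) as the marker for pack-X.pack, and prints names
without the .pack suffix, which is what the removed loop collected in
$existing. No timing is claimed for it.

```shell
#!/bin/sh
# Hypothetical helper (not part of git): print the base names of the packs
# in directory $1 that have no matching .keep file, one name per line.
list_unkept_packs() {
	packdir=$1
	# Nothing to list if the pack directory does not exist.
	[ -d "$packdir" ] || return 0
	for p in "$packdir"/*.pack
	do
		# If the glob matched nothing, $p is the literal pattern; skip it.
		[ -f "$p" ] || continue
		base=${p##*/}        # strip the directory part
		base=${base%.pack}   # strip the .pack suffix
		# A pack is kept when pack-X.keep sits next to pack-X.pack.
		[ -e "$packdir/$base.keep" ] || printf '%s\n' "$base"
	done
}
```

Assuming $PACKDIR is set as in git-repack.sh, a caller could build the
list with something like
existing=$(list_unkept_packs "$PACKDIR" | tr '\n' ' ').
Because the glob is not recursive, .pack files elsewhere in the
repository are never picked up.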