When specifying --attic=<prefix>, the objects that would be lost when
calling repack with -d will be put into a packfile (or multiple
packfiles), using the file name prefix <prefix>.

Signed-off-by: Johannes Schindelin <johannes.schindelin@xxxxxx>
---

This implements the idea of Hannes. The plan for repo.or.cz is now to
invoke repack with "--attic=attic", copy attic-*.{idx,pack} to all the
forks' object stores, and then delete the original attic-*.{idx,pack}.
The beauty of that approach is that the order in which the repositories
are repacked is no longer important.

This patch is marked RFC since there is a severe bottleneck here: the
new packs' indices are sorted and made unique, and every SHA-1 is
printed twice; then the old packs' indices are sorted and made unique.
Then the combined result is sorted, and only the now-unique SHA-1s are
actually packed. Since every name from a new pack occurs at least
twice, uniq -u keeps exactly the objects that are only in old packs.
(The sort is not necessary if there is only _one_ pack. However, we
cannot guarantee that.)

Of course, this is quick 'n dirty, and the price to be paid is a
substantial performance hit: in my tests, linux-2.6.git needed half a
second to show its pack's index, but the

	sed 's/^.* //' | sort | uniq | sed p

mantra takes 19 seconds.

The obvious thing is to exploit the fact that the pack indices are
already sorted: I started to patch git-show-index so that it takes an
option --missing-objects, followed by the new pack index file names,
then --, then the old pack index file names. It would then traverse
all of them simultaneously, outputting only the SHA-1s of objects that
are in an old pack but not in any of the new packs.

There are two issues. First, there might be a whole lot of pack files
(Pasky told me today that in one instance there were 416 of them!),
which might well exceed the maximum number of open files. Second,
there are two different pack index formats, and the code is not easily
refactored AFAICT.
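The "exploit the fact that the indices are already sorted" idea can be tried in plain shell before touching C. Below is a minimal sketch, assuming each pack index has already been dumped to a text file with one SHA-1 per line in index order (which is sorted order); `missing_objects` is a hypothetical helper, not part of this patch. It merges the already-sorted lists with `sort -m` (merge only, no re-sort) and lets `comm -13` keep the names that appear only on the old side, replacing the `sed p` / `uniq -u` trick.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch: print the object names that
# are listed in some OLD file but in none of the NEW files.
# Every file must contain one SHA-1 per line, already sorted in byte order
# (the order in which a pack index stores its entries).
#
# Usage: missing_objects NEW-LIST... -- OLD-LIST...
# NEW-LIST names are word-split below, so they must not contain whitespace.
missing_objects () {
	tmp=$(mktemp -d) || return 1	# mktemp -d is common, but not POSIX

	new_lists=
	while test $# -gt 0 && test "$1" != --
	do
		new_lists="$new_lists $1"
		shift
	done
	test $# -gt 0 && shift	# drop the "--" separator

	# sort -m merges inputs that are already sorted instead of sorting
	# them again; -u collapses names that occur in several packs.  With
	# no file arguments, sort reads the (empty) /dev/null instead.
	LC_ALL=C sort -m -u $new_lists </dev/null >"$tmp/new"
	LC_ALL=C sort -m -u "$@" </dev/null >"$tmp/old"

	# comm -13 suppresses lines unique to the first file and lines common
	# to both, leaving only names that appear solely in the old packs.
	LC_ALL=C comm -13 "$tmp/new" "$tmp/old"
	status=$?
	rm -rf "$tmp"
	return $status
}
```

With real packs, each list could be produced with something like `git show-index < "$PACKDIR/pack-$name.idx" | awk '{ print $2 }'`, since the second field of each show-index line is the object name. GNU sort merges at most a batch of inputs at a time (see its --batch-size option), which would also sidestep the open-files concern for hundreds of packs.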
A better method would probably be not to read the files simultaneously,
but to fill a "struct decorate *" with the objects of the new packs
(they can be faked, as we do not really need to parse them), and then
use decorate_lookup() to determine, for each object in the old packs,
whether it is present in the new ones. This approach would also allow
a relatively easy refactoring of show-index.c: just provide a callback
for each entry.

However, I am way too tired today to do it. Besides, a completely
different idea just struck me: before repacking, .git/objects/pack/*
could be _hard linked_ into the forks' object stores. Then nothing in
git-repack's code needs to change.

Oh, well. I just wasted 1.5 hours.

 Documentation/git-repack.txt |    4 +++
 git-repack.sh                |   40 +++++++++++++++++++++++++++++++++----
 t/t5303-repack-attic.sh      |   44 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 83 insertions(+), 5 deletions(-)
 create mode 100644 t/t5303-repack-attic.sh

diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 12e2079..ec2c2bf 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -84,6 +84,10 @@ OPTIONS
 	If specified, multiple packfiles may be created.
 	The default is unlimited.
 
+--attic=<prefix>::
+	Put all objects that would/will be lost when running with `-d`
+	into their own packfile(s), with file name prefix `<prefix>`.
+
 Configuration
 -------------
 
diff --git a/git-repack.sh b/git-repack.sh
index e18eb3f..f83d6f0 100755
--- a/git-repack.sh
+++ b/git-repack.sh
@@ -18,12 +18,13 @@ window=         size of the window used for delta compression
 window-memory=  same as the above, but limit memory size instead of entries count
 depth=          limits the maximum delta depth
 max-pack-size=  maximum size of each packfile
+attic=          pack no-longer-packed used objects into an "attic" pack
 "
 SUBDIRECTORY_OK='Yes'
 . git-sh-setup
 
 no_update_info= all_into_one= remove_redundant= keep_unreachable=
-local= quiet= no_reuse= extra=
+local= quiet= no_reuse= extra= attic=
 while test $# != 0
 do
 	case "$1" in
@@ -37,6 +38,8 @@ do
 	-l)	local=--local ;;
 	--max-pack-size|--window|--window-memory|--depth)
 		extra="$extra $1=$2"; shift ;;
+	--attic)
+		attic="$2"; shift ;;
 	--) shift; break;;
 	*)	usage ;;
 	esac
@@ -119,6 +122,36 @@ for name in $names ; do
 	rm -f "$PACKDIR/old-pack-$name.pack" "$PACKDIR/old-pack-$name.idx"
 done
 
+new_existing=
+for e in $existing
+do
+	case " $fullbases " in
+	*" $e "*) ;;
+	*)
+		new_existing="$new_existing $e"
+		;;
+	esac
+done
+existing="$new_existing"
+
+if test ! -z "$attic"
+then
+	# Find the objects which were in the existing packs, but are no
+	# longer in the new ones.
+	#
+	# This could be much more efficient.
+	(for name in $names
+	do
+		git show-index < "$PACKDIR/pack-$name.idx"
+	done | sed 's/^.* //' | sort | uniq | sed p &&
+	for e in $existing
+	do
+		git show-index < "$PACKDIR/$e.idx"
+	done | sed 's/^.* //' | sort | uniq) | sort | uniq -u |
+	git pack-objects --non-empty "$attic" ||
+	die "Could not create attic '$attic'."
+fi
+
 if test "$remove_redundant" = t
 then
 	# We know $existing are all redundant.
@@ -128,10 +161,7 @@ then
 		( cd "$PACKDIR" &&
 		  for e in $existing
 		  do
-			case " $fullbases " in
-			*" $e "*) ;;
-			*)	rm -f "$e.pack" "$e.idx" "$e.keep" ;;
-			esac
+			rm -f "$e.pack" "$e.idx" "$e.keep"
 		  done
 		)
 	fi
diff --git a/t/t5303-repack-attic.sh b/t/t5303-repack-attic.sh
new file mode 100644
index 0000000..9777748
--- /dev/null
+++ b/t/t5303-repack-attic.sh
@@ -0,0 +1,44 @@
+#!/bin/sh
+#
+# Copyright (c) 2007 Johannes E. Schindelin
+#
+
+test_description='repack with an attic pack'
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+
+	echo "Ten weary, footsore wanderers," > file &&
+	git add file &&
+	test_tick &&
+	git commit -m initial &&
+	echo "all in a woeful plight," >> file &&
+	test_tick &&
+	git commit -m second file &&
+	echo "sought shelter in a wayside-inn" >> file &&
+	test_tick &&
+	git commit -m third file &&
+	echo "one dark and lonely night." >> file &&
+	test_tick &&
+	git commit -m fourth file &&
+	echo ">>Nine rooms, no more<<, the landlord said," >> file &&
+	test_tick && git commit -m fifth file &&
+	git repack -a -d &&
+	! ls .git/objects/??/*
+
+'
+
+test_expect_success 'create attic pack' '
+
+	LAST_VERSION=$(git rev-parse --verify HEAD:file) &&
+	git reset --hard HEAD^ &&
+	rm .git/logs/HEAD .git/logs/refs/heads/master &&
+	git cat-file blob $LAST_VERSION &&
+	git repack --attic=attic -a -d &&
+	! git cat-file blob $LAST_VERSION &&
+	test -f attic-*.idx &&
+	cat attic-*.idx | git show-index | grep $LAST_VERSION
+
+'
+
+test_done
-- 
1.5.3.6.2066.g09421
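The lookup-based alternative from the notes (record the new packs' objects in a table, then probe it for every object of the old packs) needs no sorting at all. The sketch below is only a rough shell analogue of that idea: an awk associative array stands in for the proposed "struct decorate" code, `missing_objects_hash` is a hypothetical name, and the same one-SHA-1-per-line list files are assumed.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch: print every name that occurs
# in one of the OLD-LIST files but not in NEW-LIST, each name at most once.
# NEW-LIST may hold the names of all new packs concatenated, in any order.
#
# Usage: missing_objects_hash NEW-LIST OLD-LIST...
missing_objects_hash () {
	# While reading the first file, only remember its names.  Comparing
	# FILENAME with ARGV[1] (rather than the common NR == FNR idiom)
	# keeps working when the first file is empty.
	awk '
		FILENAME == ARGV[1] { seen[$0] = 1; next }
		!($0 in seen)       { seen[$0] = 1; print }
	' "$@"
}
```

Unlike the sorted merge, this keeps every new-pack name in memory at once, and its output follows input order rather than sorted order; git pack-objects takes the object names on its standard input in any order, so that should not matter for building the attic.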