Earlier this year, John Cai sent two versions of a patch series to implement `git repack --filter=<filter-spec>`:

https://lore.kernel.org/git/pull.1206.git.git.1643248180.gitgitgadget@xxxxxxxxx/

We tried to "sell" it as a way to use partial clone on a Git server to offload large blobs to, for example, an HTTP server, while using multiple promisor remotes on the client side.

Even though that is still our end goal, it seems a bit far-fetched for now. It is also unnecessary as a justification, because `git repack --filter=<filter-spec>` could be useful on the client side too.

For example, one might clone with a filter so that large blobs don't take up too much space, and then realize after some time that a number of those large blobs have been downloaded anyway, because some old branches referencing them were checked out. In this case a filtering repack could remove some of those large blobs.

Some of the comments on the patch series that John sent were related to the possible data loss and repo corruption that a filtering repack could cause. It's indeed true that it could be very dangerous, so the first version of this patch series asked the user to confirm the command, either by answering 'Y' on the command line or by passing `--force`.

In the discussion with Junio following that first version, though, it appeared that asking for such confirmation might not be necessary, so v2 removed those checks.

Taylor then asked what would happen to the 'remote.<name>.promisor' and 'remote.<name>.partialclonefilter' config variables when a filtering repack is run. As it seemed to me that we should just check that a promisor remote has been configured, and fail if that's not the case, that check was implemented in the third version of this patch series.
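To make the v3 safety rule concrete, here is a minimal, hypothetical shell sketch of "refuse a filtering repack unless a promisor remote is configured". It is not code from this series, which does its check in C in builtin/repack.c. The helper name `has_promisor_remote` is made up, and the sketch assumes its input is "key=value" lines in the format printed by `git config --list`. It deliberately ignores other boolean spellings ("yes", "on", "1") and any other way Git can record a promisor remote.

```shell
#!/bin/sh
# has_promisor_remote: hypothetical helper, not part of Git or of this
# series. Reads "key=value" lines (the `git config --list` format) on
# stdin and succeeds if any remote.<name>.promisor is set to "true".
# Simplification: other boolean spellings ("yes", "on", "1", a bare
# key) and other ways of marking a promisor remote are not recognized.
has_promisor_remote() {
	grep -q '^remote\..*\.promisor=true$'
}

# Example input: the kind of config a partial clone leaves behind.
if printf '%s\n' \
	'remote.origin.url=https://example.com/repo.git' \
	'remote.origin.promisor=true' \
	'remote.origin.partialclonefilter=blob:none' | has_promisor_remote
then
	echo "promisor remote found: filtering repack allowed"
else
	echo "no promisor remote: refusing to run a filtering repack" >&2
fi
```

In a real repository the input could come from `git -C client config --list | has_promisor_remote`. A clone made with `--filter` sets 'remote.origin.promisor' to true by itself, which is why the client-side use case described above would pass such a check.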
In the discussions following the first, second and third versions, Junio commented that `git gc` was a better way for users to launch filtering repacks than `git repack`. So this v4 implements a new 'gc.repackFilter' config option that allows `git gc` to perform filtering repacks.

When this config option is set to a non-empty string, `git gc` will just add a `--filter=<filter-spec>` argument to the repack processes it launches, with '<filter-spec>' set to the value of 'gc.repackFilter'.

So the changes in this v4 compared to v3 are the following:

- rebased on top of 57e2c6ebbe (Start the 2.40 cycle, 2022-12-14) to avoid a simple conflict,

- simplified the test in patch 2/3 by using `grep -c ...` instead of `grep ... | wc -l`,

- added patch 3/3, which implements a new 'gc.repackFilter' config option so that `git gc` can perform filtering repacks.

Thanks to Junio and Taylor for discussing v1, v2 and v3, to John Cai, who worked on the previous versions, to Jonathan Nieder, Jonathan Tan and Taylor, who discussed this with me at the Git Merge and Contributor Summit, and to Stolee, Taylor, Robert Coup and Junio, who discussed the versions John sent.

Range diff with v3:

1:  1e64cac782 < -:  ---------- pack-objects: allow --filter without --stdout
-:  ---------- > 1:  c2dca82dee pack-objects: allow --filter without --stdout
2:  7216a7bc05 ! 2:  1dcdba4b1d repack: add --filter=<filter-spec> option
    @@ builtin/repack.c: int cmd_repack(int argc, const char **argv, const char *prefix
     +				write_promisor_file_1(line.buf);
      			item->util = populate_pack_exts(item->string);
      		}
    - 		fclose(out);
    + 		strbuf_release(&line);

     ## t/t7700-repack.sh ##
    @@ t/t7700-repack.sh: test_expect_success 'auto-bitmaps do not complain if unavailable' '
    @@ t/t7700-repack.sh: test_expect_success 'auto-bitmaps do not complain if unavaila
     +	git clone --bare --no-local server client &&
     +	git -C client config remote.origin.promisor true &&
     +	git -C client rev-list --objects --all --missing=print >objects &&
    -+	test $(grep "^?" objects | wc -l) = 0 &&
    ++	test $(grep -c "^?" objects) = 0 &&
     +	git -C client -c repack.writebitmaps=false repack -a -d --filter=blob:none &&
     +	git -C client rev-list --objects --all --missing=print >objects &&
    -+	test $(grep "^?" objects | wc -l) = 1
    ++	test $(grep -c "^?" objects) = 1
     +'
     +
      objdir=.git/objects
-:  ---------- > 3:  6bb98b4b00 gc: add gc.repackFilter config option

Christian Couder (3):
  pack-objects: allow --filter without --stdout
  repack: add --filter=<filter-spec> option
  gc: add gc.repackFilter config option

 Documentation/config/gc.txt  |  9 +++++++++
 Documentation/git-repack.txt |  8 ++++++++
 builtin/gc.c                 |  6 ++++++
 builtin/pack-objects.c       |  8 ++------
 builtin/repack.c             | 28 +++++++++++++++++++++-------
 t/t6500-gc.sh                | 19 +++++++++++++++++++
 t/t7700-repack.sh            | 15 +++++++++++++++
 7 files changed, 80 insertions(+), 13 deletions(-)

-- 
2.39.0.59.g395bcb85bc.dirty