Earlier this year, John Cai sent 2 versions of a patch series to implement `git repack --filter=<filter-spec>`: https://lore.kernel.org/git/pull.1206.git.git.1643248180.gitgitgadget@xxxxxxxxx/ We tried to "sell" it as a way to use partial clone on a Git server to offload large blobs to, for example, an http server, while using multiple promisor remotes on the client side. Even though it is still our end goal, it seems a bit far fetched for now and unnecessary as `git repack --filter=<filter-spec>` could be useful on the client side too. For example one might want to clone with a filter to avoid too many space to be taken by some large blobs, and one might realize after some time that a number of the large blobs have still be downloaded because some old branches referencing them were checked out. In this case a filtering repack could remove some of those large blobs. Some of the comments on the patch series that John sent were related to the possible data loss and repo corruption that a filtering repack could cause. It's indeed true that it could be very dangerous, so the first version of this patch series asked the user to confirm the command, either by answering 'Y' on the command line or by passing `--force`. In the discussion with Junio following that first version though, it appeared that asking for such confirmation might not be necessary, so the differences in this v2 compared to the v1 are: - the patch 3/3 that introduced the `--force` option has been removed, - the test that checked that the command couldn't be launched without a terminal and without --force has been removed, - the code that checked if the command was launched from a terminal and that asked for confirmation on the command line has been removed, - the documentation of --filter has been improved to make it clearer that objects created locally which haven't been pushed could be deleted. So there are only 2 patches now in this v2 series: - Patch 1/2 is a preparatory patch. - Patch 2/2 introduces the `--filter=<filter-spec>` option. In the discussion with Junio following the first version, Junio suggested adding `--filter=<filter-spec>` to `git gc` and I am still Ok with doing it, either later in a followup patch or in a v3. I haven't done it yet, as I think it's better for performance reasons if anyway the filtering is done underneath by `git repack` (even if for example the --filter option is undocumented in `git repack`), and I'd like the approach in this v2 series to be reviewed first. Thanks to Junio for discussing the v1, to John Cai, who worked on the previous versions, to Jonathan Nieder, Jonathan Tan and Taylor Blau, who discussed this with me at the Git Merge and Contributor Summit, and to Stolee, Taylor, Robert Coup and Junio who discussed the versions John sent. Not sure it's very useful as it's quite big, but here is the range-diff compared to v1: 1: 0ac7ebf48c = 1: d1c65ff1f5 pack-objects: allow --filter without --stdout 2: 144356a97e ! 2: ac21b4ec8f repack: add --filter=<filter-spec> option @@ Commit message This command could be dangerous to use though, as it might remove local objects that haven't been pushed which would lose data and - corrupt the repo. On a server this command could also corrupt a + corrupt the repo. On a server, this command could also corrupt a repo unless ALL the removed objects aren't already available in another remote that clients can access. - To avoid as much as possible data to be lost and repos to be - corrupted, let's warn users and ask them to confirm that they - really want to use this command. - - If no terminal is used, let's just die() for now. A follow-up - commit will introduce a --force option that will allow using - this option when no terminal is available. - - It will be easier to test that --filter is working correctly - in the follow-up commit that adds --force, so let's just test - that we die() if no terminal is used for now. - Signed-off-by: John Cai <johncai86@xxxxxxxxx> Signed-off-by: Christian Couder <chriscool@xxxxxxxxxxxxx> @@ Documentation/git-repack.txt: depth is 4095. +--filter=<filter-spec>:: + Omits certain objects (usually blobs) from the resulting + packfile. WARNING: this could easily corrupt the current repo -+ and lose data if ANY of the omitted objects hasn't been -+ already pushed to a remote. See linkgit:git-rev-list[1] for -+ valid `<filter-spec>` forms. ++ and lose data if ANY of the omitted objects hasn't been already ++ pushed to a remote. Be very careful about objects that might ++ have been created locally! See linkgit:git-rev-list[1] for valid ++ `<filter-spec>` forms. + -b:: --write-bitmap-index:: Write a reachability bitmap index as part of the repack. This ## builtin/repack.c ## -@@ - #include "pack.h" - #include "pack-bitmap.h" - #include "refs.h" -+#include "prompt.h" - - #define ALL_INTO_ONE 1 - #define LOOSEN_UNREACHABLE 2 @@ builtin/repack.c: struct pack_objects_args { const char *depth; const char *threads; @@ builtin/repack.c: int cmd_repack(int argc, const char **argv, const char *prefix OPT_BOOL(0, "pack-kept-objects", &pack_kept_objects, N_("repack objects in packs marked with .keep")), OPT_STRING_LIST(0, "keep-pack", &keep_pack_list, N_("name"), -@@ builtin/repack.c: int cmd_repack(int argc, const char **argv, const char *prefix) - die(_("options '%s' and '%s' cannot be used together"), "--cruft", "-k"); - } - -+ if (po_args.filter) { -+ const char *yesno; -+ -+ if (!isatty(STDIN_FILENO)) -+ die (_("Repacking with a filter is not allowed " -+ "yet unless a terminal is used!")); -+ -+ /* -+ * TRANSLATORS: Make sure to include [y] and [N] in your translation. -+ * The program will only accept English input at this point. -+ */ -+ yesno = git_prompt(_("Repacking with a filter will lose data and corrupt the repo\n" -+ "if ANY of the filtered out object hasn't been already pushed!\n" -+ "Repack with a filter anyway [y/N]? "), PROMPT_ECHO); -+ if (tolower(*yesno) != 'y') -+ exit(1); -+ } -+ - if (write_bitmaps < 0) { - if (!write_midx && - (!(pack_everything & ALL_INTO_ONE) || !is_bare_repository())) @@ builtin/repack.c: int cmd_repack(int argc, const char **argv, const char *prefix) if (line.len != the_hash_algo->hexsz) die(_("repack: Expecting full hex object ID lines only from pack-objects.")); @@ t/t7700-repack.sh: test_expect_success 'auto-bitmaps do not complain if unavaila test_must_be_empty actual ' -+test_expect_success 'repacking with a filter is not allowed' ' -+ test_must_fail git repack -a -d --filter=blob:none 2>actual && -+ test_i18ngrep "Repacking with a filter is not allowed" actual ++test_expect_success 'repacking with a filter works' ' ++ test_when_finished "rm -rf server client" && ++ test_create_repo server && ++ git -C server config uploadpack.allowFilter true && ++ git -C server config uploadpack.allowAnySHA1InWant true && ++ test_commit -C server 1 && ++ git clone --bare --no-local server client && ++ git -C client config remote.origin.promisor true && ++ git -C client rev-list --objects --all --missing=print >objects && ++ test $(grep "^?" objects | wc -l) = 0 && ++ git -C client -c repack.writebitmaps=false repack -a -d --filter=blob:none && ++ git -C client rev-list --objects --all --missing=print >objects && ++ test $(grep "^?" objects | wc -l) = 1 +' + objdir=.git/objects 3: a23e19796e < -: ---------- repack: introduce --force to force filtering Christian Couder (2): pack-objects: allow --filter without --stdout repack: add --filter=<filter-spec> option Documentation/git-repack.txt | 8 ++++++++ builtin/pack-objects.c | 6 +----- builtin/repack.c | 22 +++++++++++++++------- t/t7700-repack.sh | 15 +++++++++++++++ 4 files changed, 39 insertions(+), 12 deletions(-) -- 2.38.1.145.g80cce38e46.dirty