This patch series makes partial clone more useful by making it possible to
run repack to remove objects from a repository (replacing them with promisor
objects). This is useful when we want to offload large blobs from a git
server onto another git server, or even use an http server through a remote
helper.

In [A], a --refilter option on fetch and fetch-pack is being discussed where
either a less restrictive or more restrictive filter can be used. In the more
restrictive case, the objects that already exist will not be deleted. But one
can imagine that users might want the ability to delete objects when they
apply a more restrictive filter in order to save space, and this patch series
would also allow that.

There are a couple of things we need to adjust to make this possible. This
patch series has four parts.

 1. Allow --filter in pack-objects without --stdout
 2. Add a --filter flag for repack
 3. Allow missing promisor objects in upload-pack
 4. Tests that demonstrate the ability to offload objects onto an http remote

cc: Christian Couder christian.couder@xxxxxxxxx
cc: Derrick Stolee stolee@xxxxxxxxx
cc: Robert Coup robert@xxxxxxxxxxx

A.
https://lore.kernel.org/git/pull.1138.git.1643730593.gitgitgadget@xxxxxxxxx/

John Cai (4):
  pack-objects: allow --filter without --stdout
  repack: add --filter=<filter-spec> option
  upload-pack: allow missing promisor objects
  tests for repack --filter mode

 Documentation/git-repack.txt   |   5 +
 builtin/pack-objects.c         |   2 -
 builtin/repack.c               |  22 +++--
 t/lib-httpd.sh                 |   2 +
 t/lib-httpd/apache.conf        |   8 ++
 t/lib-httpd/list.sh            |  43 +++++++++
 t/lib-httpd/upload.sh          |  46 +++++++++
 t/t0410-partial-clone.sh       |  81 ++++++++++++++++
 t/t0410/git-remote-testhttpgit | 170 +++++++++++++++++++++++++++++++++
 t/t7700-repack.sh              |  20 ++++
 upload-pack.c                  |   5 +
 11 files changed, 395 insertions(+), 9 deletions(-)
 create mode 100644 t/lib-httpd/list.sh
 create mode 100644 t/lib-httpd/upload.sh
 create mode 100755 t/t0410/git-remote-testhttpgit


base-commit: 38062e73e009f27ea192d50481fcb5e7b0e9d6eb
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1206%2Fjohn-cai%2Fjc-repack-filter-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1206/john-cai/jc-repack-filter-v2
Pull-Request: https://github.com/git/git/pull/1206

Range-diff vs v1:

 1:  0eec9b117da = 1:  f43b76ca650 pack-objects: allow --filter without --stdout
 -:  ----------- > 2:  6e7c8410b8d repack: add --filter=<filter-spec> option
 -:  ----------- > 3:  40612b9663b upload-pack: allow missing promisor objects
 2:  a3166381572 ! 4:  d76faa1f16e repack: add --filter=<filter-spec> option
    @@ Metadata
     Author: John Cai <johncai86@xxxxxxxxx>

      ## Commit message ##
    -    repack: add --filter=<filter-spec> option
    +    tests for repack --filter mode

    -    Currently, repack does not work with partial clones. When repack is run
    -    on a partially cloned repository, it grabs all missing objects from
    -    promisor remotes. This also means that when gc is run for repository
    -    maintenance on a partially cloned repository, it will end up getting
    -    missing objects, which is not what we want.
    -
    -    In order to make repack work with partial clone, teach repack a new
    -    option --filter, which takes a <filter-spec> argument. repack will skip
    -    any objects that are matched by <filter-spec> similar to how the clone
    -    command will skip fetching certain objects.
    -
    -    The final goal of this feature, is to be able to store objects on a
    -    server other than the regular git server itself.
    +    This patch adds tests to test both repack --filter functionality in
    +    isolation (in t7700-repack.sh) as well as how it can be used to offload
    +    large blobs (in t0410-partial-clone.sh)

         There are several scripts added so we can test the process of using a
    -    remote helper to upload blobs to an http server:
    +    remote helper to upload blobs to an http server.

         - t/lib-httpd/list.sh lists blobs uploaded to the http server.
         - t/lib-httpd/upload.sh uploads blobs to the http server.
    @@ Commit message
         Based-on-patch-by: Christian Couder <chriscool@xxxxxxxxxxxxx>
         Signed-off-by: John Cai <johncai86@xxxxxxxxx>

    - ## Documentation/git-repack.txt ##
    -@@ Documentation/git-repack.txt: depth is 4095.
    - 	a larger and slower repository; see the discussion in
    - 	`pack.packSizeLimit`.
    -
    -+--filter=<filter-spec>::
    -+	Omits certain objects (usually blobs) from the resulting
    -+	packfile. See linkgit:git-rev-list[1] for valid
    -+	`<filter-spec>` forms.
    -+
    - -b::
    - --write-bitmap-index::
    - 	Write a reachability bitmap index as part of the repack. This
    -
    - ## builtin/repack.c ##
    -@@ builtin/repack.c: struct pack_objects_args {
    - 	const char *depth;
    - 	const char *threads;
    - 	const char *max_pack_size;
    -+	const char *filter;
    - 	int no_reuse_delta;
    - 	int no_reuse_object;
    - 	int quiet;
    -@@ builtin/repack.c: static void prepare_pack_objects(struct child_process *cmd,
    - 		strvec_pushf(&cmd->args, "--threads=%s", args->threads);
    - 	if (args->max_pack_size)
    - 		strvec_pushf(&cmd->args, "--max-pack-size=%s", args->max_pack_size);
    -+	if (args->filter)
    -+		strvec_pushf(&cmd->args, "--filter=%s", args->filter);
    - 	if (args->no_reuse_delta)
    - 		strvec_pushf(&cmd->args, "--no-reuse-delta");
    - 	if (args->no_reuse_object)
    -@@ builtin/repack.c: int cmd_repack(int argc, const char **argv, const char *prefix)
    - 				N_("limits the maximum number of threads")),
    - 		OPT_STRING(0, "max-pack-size", &po_args.max_pack_size, N_("bytes"),
    - 				N_("maximum size of each packfile")),
    -+		OPT_STRING(0, "filter", &po_args.filter, N_("args"),
    -+				N_("object filtering")),
    - 		OPT_BOOL(0, "pack-kept-objects", &pack_kept_objects,
    - 				N_("repack objects in packs marked with .keep")),
    - 		OPT_STRING_LIST(0, "keep-pack", &keep_pack_list, N_("name"),
    -@@ builtin/repack.c: int cmd_repack(int argc, const char **argv, const char *prefix)
    - 		if (line.len != the_hash_algo->hexsz)
    - 			die(_("repack: Expecting full hex object ID lines only from pack-objects."));
    - 		string_list_append(&names, line.buf);
    -+		if (po_args.filter) {
    -+			char *promisor_name = mkpathdup("%s-%s.promisor", packtmp,
    -+							line.buf);
    -+			write_promisor_file(promisor_name, NULL, 0);
    -+		}
    - 	}
    - 	fclose(out);
    - 	ret = finish_command(&cmd);
    -
      ## t/lib-httpd.sh ##
     @@ t/lib-httpd.sh: prepare_httpd() {
      	install_script error-smart-http.sh
    @@ t/t0410-partial-clone.sh: test_expect_success 'fetching of missing objects from
     +	git -C server rev-list --objects --all --missing=print >objects &&
     +	grep "$sha" objects
     +'
    ++
    ++test_expect_success 'fetch does not cause server to fetch missing objects' '
    ++	rm -rf origin server client &&
    ++	test_create_repo origin &&
    ++	dd if=/dev/zero of=origin/file1 bs=801k count=1 &&
    ++	git -C origin add file1 &&
    ++	git -C origin commit -m "large blob" &&
    ++	sha="$(git -C origin rev-parse :file1)" &&
    ++	expected="?$(git -C origin rev-parse :file1)" &&
    ++	git clone --bare --no-local origin server &&
    ++	git -C server remote add httpremote "testhttpgit::${PWD}/server" &&
    ++	git -C server config remote.httpremote.promisor true &&
    ++	git -C server config --remove-section remote.origin &&
    ++	git -C server rev-list --all --objects --filter-print-omitted \
    ++		--filter=blob:limit=800k | perl -ne "print if s/^[~]//" \
    ++		>large_blobs.txt &&
    ++	upload_blobs_from_stdin server <large_blobs.txt &&
    ++	git -C server -c repack.writebitmaps=false repack -a -d \
    ++		--filter=blob:limit=800k &&
    ++	git -C server config uploadpack.allowmissingpromisor true &&
    ++	git clone -c remote.httpremote.url="testhttpgit::${PWD}/server" \
    ++		-c remote.httpremote.fetch='+refs/heads/*:refs/remotes/httpremote/*' \
    ++		-c remote.httpremote.promisor=true --bare --no-local \
    ++		--filter=blob:limit=800k server client &&
    ++	git -C client rev-list --objects --all --missing=print >client_objects &&
    ++	grep "$expected" client_objects &&
    ++	git -C server rev-list --objects --all --missing=print >server_objects &&
    ++	grep "$expected" server_objects
    ++'
     +
      # DO NOT add non-httpd-specific tests here, because the last part of this
      # test script is only executed when httpd is available and enabled.

-- 
gitgitgadget