Re: [PATCH v2 0/4] [RFC] repack: add --filter=

Hi Johannes

I'm not sure where I went wrong on GGG. Somehow the cc list didn't get translated into
cc fields. Here's the PR: https://github.com/git/git/pull/1206. Thanks!

Cc'ing the folks I meant to cc on this patch series.

On 8 Feb 2022, at 21:10, John Cai via GitGitGadget wrote:

> This patch series makes partial clone more useful by making it possible to
> run repack to remove objects from a repository (replacing them with promisor
> objects). This is useful when we want to offload large blobs from a git
> server onto another git server, or even use an http server through a remote
> helper.
>
> In [A], a --refilter option on fetch and fetch-pack is being discussed where
> either a less restrictive or more restrictive filter can be used. In the
> more restrictive case, the objects that already exist will not be deleted.
> But, one can imagine that users might want the ability to delete objects
> when they apply a more restrictive filter in order to save space, and this
> patch series would also allow that.
>
> There are a couple of things we need to adjust to make this possible. This
> patch series has four parts.
>
>  1. Allow --filter in pack-objects without --stdout
>  2. Add a --filter flag for repack
>  3. Allow missing promisor objects in upload-pack
>  4. Tests that demonstrate the ability to offload objects onto an http
>     remote
>
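For anyone who wants to see the building block behind part 1 with a stock git: pack-objects already accepts --filter when it is combined with --stdout, so a filtered pack can be produced and inspected as in the sketch below. This is only an illustration, not code from the series. The scratch directory, the file names and the commit identity are assumptions made up for the example, and the 800k limit is borrowed from the tests in patch 4.

```shell
#!/bin/sh
# Sketch: write a pack that omits blobs of 800k or more, using the
# existing `pack-objects --stdout --filter` combination.
set -e

work=$(mktemp -d)            # scratch directory (illustrative)
git init -q "$work/origin"
cd "$work/origin"

# One small and one large file; sizes chosen on either side of the limit.
echo "small" >small.txt
dd if=/dev/zero of=large.bin bs=1024 count=900 2>/dev/null
git add small.txt large.bin
git -c user.name=Example -c user.email=example@example.com \
	commit -q -m "small and large blob"

large_oid=$(git rev-parse :large.bin)
small_oid=$(git rev-parse :small.txt)

# --revs reads rev-list arguments from stdin; --filter drops the large blob.
echo HEAD | git pack-objects --revs --stdout --filter=blob:limit=800k -q \
	>"$work/filtered.pack"

# Index the pack so the object IDs it contains can be listed.
git index-pack -o "$work/filtered.idx" "$work/filtered.pack" >/dev/null
git show-index <"$work/filtered.idx" | cut -d' ' -f2 >"$work/packed_oids"
```

After running it, "$work/packed_oids" should list the small blob but not the large one. The series aims to make the same filtering available when pack-objects writes packs directly into the repository, which is what repack needs.
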
> cc: Christian Couder christian.couder@xxxxxxxxx
> cc: Derrick Stolee stolee@xxxxxxxxx
> cc: Robert Coup robert@xxxxxxxxxxx
>
> A.
> https://lore.kernel.org/git/pull.1138.git.1643730593.gitgitgadget@xxxxxxxxx/
>
> John Cai (4):
>   pack-objects: allow --filter without --stdout
>   repack: add --filter=<filter-spec> option
>   upload-pack: allow missing promisor objects
>   tests for repack --filter mode
>
>  Documentation/git-repack.txt   |   5 +
>  builtin/pack-objects.c         |   2 -
>  builtin/repack.c               |  22 +++--
>  t/lib-httpd.sh                 |   2 +
>  t/lib-httpd/apache.conf        |   8 ++
>  t/lib-httpd/list.sh            |  43 +++++++++
>  t/lib-httpd/upload.sh          |  46 +++++++++
>  t/t0410-partial-clone.sh       |  81 ++++++++++++++++
>  t/t0410/git-remote-testhttpgit | 170 +++++++++++++++++++++++++++++++++
>  t/t7700-repack.sh              |  20 ++++
>  upload-pack.c                  |   5 +
>  11 files changed, 395 insertions(+), 9 deletions(-)
>  create mode 100644 t/lib-httpd/list.sh
>  create mode 100644 t/lib-httpd/upload.sh
>  create mode 100755 t/t0410/git-remote-testhttpgit
>
>
> base-commit: 38062e73e009f27ea192d50481fcb5e7b0e9d6eb
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1206%2Fjohn-cai%2Fjc-repack-filter-v2
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1206/john-cai/jc-repack-filter-v2
> Pull-Request: https://github.com/git/git/pull/1206
>
> Range-diff vs v1:
>
>  1:  0eec9b117da = 1:  f43b76ca650 pack-objects: allow --filter without --stdout
>  -:  ----------- > 2:  6e7c8410b8d repack: add --filter=<filter-spec> option
>  -:  ----------- > 3:  40612b9663b upload-pack: allow missing promisor objects
>  2:  a3166381572 ! 4:  d76faa1f16e repack: add --filter=<filter-spec> option
>      @@ Metadata
>       Author: John Cai <johncai86@xxxxxxxxx>
>
>        ## Commit message ##
>      -    repack: add --filter=<filter-spec> option
>      +    tests for repack --filter mode
>
>      -    Currently, repack does not work with partial clones. When repack is run
>      -    on a partially cloned repository, it grabs all missing objects from
>      -    promisor remotes. This also means that when gc is run for repository
>      -    maintenance on a partially cloned repository, it will end up getting
>      -    missing objects, which is not what we want.
>      -
>      -    In order to make repack work with partial clone, teach repack a new
>      -    option --filter, which takes a <filter-spec> argument. repack will skip
>      -    any objects that are matched by <filter-spec> similar to how the clone
>      -    command will skip fetching certain objects.
>      -
>      -    The final goal of this feature, is to be able to store objects on a
>      -    server other than the regular git server itself.
>      +    This patch adds tests to test both repack --filter functionality in
>      +    isolation (in t7700-repack.sh) as well as how it can be used to offload
>      +    large blobs (in t0410-partial-clone.sh)
>
>           There are several scripts added so we can test the process of using a
>      -    remote helper to upload blobs to an http server:
>      +    remote helper to upload blobs to an http server.
>
>           - t/lib-httpd/list.sh lists blobs uploaded to the http server.
>           - t/lib-httpd/upload.sh uploads blobs to the http server.
>      @@ Commit message
>           Based-on-patch-by: Christian Couder <chriscool@xxxxxxxxxxxxx>
>           Signed-off-by: John Cai <johncai86@xxxxxxxxx>
>
>      - ## Documentation/git-repack.txt ##
>      -@@ Documentation/git-repack.txt: depth is 4095.
>      - 	a larger and slower repository; see the discussion in
>      - 	`pack.packSizeLimit`.
>      -
>      -+--filter=<filter-spec>::
>      -+	Omits certain objects (usually blobs) from the resulting
>      -+	packfile. See linkgit:git-rev-list[1] for valid
>      -+	`<filter-spec>` forms.
>      -+
>      - -b::
>      - --write-bitmap-index::
>      - 	Write a reachability bitmap index as part of the repack. This
>      -
>      - ## builtin/repack.c ##
>      -@@ builtin/repack.c: struct pack_objects_args {
>      - 	const char *depth;
>      - 	const char *threads;
>      - 	const char *max_pack_size;
>      -+	const char *filter;
>      - 	int no_reuse_delta;
>      - 	int no_reuse_object;
>      - 	int quiet;
>      -@@ builtin/repack.c: static void prepare_pack_objects(struct child_process *cmd,
>      - 		strvec_pushf(&cmd->args, "--threads=%s", args->threads);
>      - 	if (args->max_pack_size)
>      - 		strvec_pushf(&cmd->args, "--max-pack-size=%s", args->max_pack_size);
>      -+	if (args->filter)
>      -+		strvec_pushf(&cmd->args, "--filter=%s", args->filter);
>      - 	if (args->no_reuse_delta)
>      - 		strvec_pushf(&cmd->args, "--no-reuse-delta");
>      - 	if (args->no_reuse_object)
>      -@@ builtin/repack.c: int cmd_repack(int argc, const char **argv, const char *prefix)
>      - 				N_("limits the maximum number of threads")),
>      - 		OPT_STRING(0, "max-pack-size", &po_args.max_pack_size, N_("bytes"),
>      - 				N_("maximum size of each packfile")),
>      -+		OPT_STRING(0, "filter", &po_args.filter, N_("args"),
>      -+				N_("object filtering")),
>      - 		OPT_BOOL(0, "pack-kept-objects", &pack_kept_objects,
>      - 				N_("repack objects in packs marked with .keep")),
>      - 		OPT_STRING_LIST(0, "keep-pack", &keep_pack_list, N_("name"),
>      -@@ builtin/repack.c: int cmd_repack(int argc, const char **argv, const char *prefix)
>      - 		if (line.len != the_hash_algo->hexsz)
>      - 			die(_("repack: Expecting full hex object ID lines only from pack-objects."));
>      - 		string_list_append(&names, line.buf);
>      -+		if (po_args.filter) {
>      -+			char *promisor_name = mkpathdup("%s-%s.promisor", packtmp,
>      -+							line.buf);
>      -+			write_promisor_file(promisor_name, NULL, 0);
>      -+		}
>      - 	}
>      - 	fclose(out);
>      - 	ret = finish_command(&cmd);
>      -
>        ## t/lib-httpd.sh ##
>       @@ t/lib-httpd.sh: prepare_httpd() {
>        	install_script error-smart-http.sh
>      @@ t/t0410-partial-clone.sh: test_expect_success 'fetching of missing objects from
>       +	git -C server rev-list --objects --all --missing=print >objects &&
>       +	grep "$sha" objects
>       +'
>      ++
>      ++test_expect_success 'fetch does not cause server to fetch missing objects' '
>      ++	rm -rf origin server client &&
>      ++	test_create_repo origin &&
>      ++	dd if=/dev/zero of=origin/file1 bs=801k count=1 &&
>      ++	git -C origin add file1 &&
>      ++	git -C origin commit -m "large blob" &&
>      ++	sha="$(git -C origin rev-parse :file1)" &&
>      ++	expected="?$(git -C origin rev-parse :file1)" &&
>      ++	git clone --bare --no-local origin server &&
>      ++	git -C server remote add httpremote "testhttpgit::${PWD}/server" &&
>      ++	git -C server config remote.httpremote.promisor true &&
>      ++	git -C server config --remove-section remote.origin &&
>      ++	git -C server rev-list --all --objects --filter-print-omitted \
>      ++		--filter=blob:limit=800k | perl -ne "print if s/^[~]//" \
>      ++		>large_blobs.txt &&
>      ++	upload_blobs_from_stdin server <large_blobs.txt &&
>      ++	git -C server -c repack.writebitmaps=false repack -a -d \
>      ++		--filter=blob:limit=800k &&
>      ++	git -C server config uploadpack.allowmissingpromisor true &&
>      ++	git clone -c remote.httpremote.url="testhttpgit::${PWD}/server" \
>      ++	-c remote.httpremote.fetch='+refs/heads/*:refs/remotes/httpremote/*' \
>      ++	-c remote.httpremote.promisor=true --bare --no-local \
>      ++	--filter=blob:limit=800k server client &&
>      ++	git -C client rev-list --objects --all --missing=print >client_objects &&
>      ++	grep "$expected" client_objects &&
>      ++	git -C server rev-list --objects --all --missing=print >server_objects &&
>      ++	grep "$expected" server_objects
>      ++'
>       +
>        # DO NOT add non-httpd-specific tests here, because the last part of this
>        # test script is only executed when httpd is available and enabled.
>
> -- 
> gitgitgadget
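
A note on the new t0410 test in the range-diff: it collects the large blobs by stripping the "~" prefix that rev-list --filter-print-omitted puts in front of every omitted object ID. The test uses perl for that; the same step can be sketched in plain shell as below. The helper name omitted_oids is hypothetical and does not exist in the series or in the test library.

```shell
#!/bin/sh
# Hypothetical helper: read `git rev-list --filter-print-omitted` output on
# stdin and print only the omitted object IDs, with the leading "~" removed.
omitted_oids () {
	sed -n 's/^~//p'
}

# Intended use inside a repository (mirrors the pipeline in the test):
#   git rev-list --all --objects --filter-print-omitted \
#       --filter=blob:limit=800k | omitted_oids >large_blobs.txt
```

Lines without the "~" prefix are the objects that passed the filter, so they are dropped here and only the IDs of the blobs to upload remain.
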



