Re: [PATCH v3 3/3] object-info: add option for retrieving object info

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 29 Mar 2022 15:54:16 -0700

Calvin Wan <calvinwan@xxxxxxxxxx> writes:

> Add ‘--object-info’ option to fetch. This option allows the client to
> make an object-info command request to a server that supports protocol
> v2. If the server is v2, but does not allow for the object-info
> command request, fetch the objects as if it were a standard fetch
> (however without changing any refs). Therefore, hook `fetch
> object-info` into transport_fetch_refs() to easily fallback if needed.

Sorry, but I do not see anything common between "git fetch" that
fetches history and associated objects and "retrieving only object
info".  Other than "the way I happened to have chosen to implement,
what is used to implement fetch has the most commonality".

If we were to add more "server offloading", say "run blame on the
server side", are we going to piggy back over fetch-pack, too?

It is not necessarily bad way to experiment, to reuse the code to
establish connection, check if the other side is capable to serve
us, throw args at them and then retrieve the result, but exposing
that implemention detail makes a HORRIBLE UX to force users to say
"git fetch --blame='master Makefile' origin".

Shouldn't we be making the part that we truly need to reuse into a
separate API out of fetch-pack and either (1) allow new commands be
easily written reusing the code to create "git remote-object-info"
and "git remote-blame", or (2) come up with a single "do things on
remote" command with various subcommands, i.e. "git over-there
object-info" and "git over-there blame"?

> A previous patch added the `transfer.advertiseObjectInfo` config
> option, of which this patch utilizes to test the fallback.
>
> ---

Missing sign-off.

> diff --git a/Documentation/git-fetch.txt b/Documentation/git-fetch.txt
> index 550c16ca61..a13d2ba7ad 100644
> --- a/Documentation/git-fetch.txt
> +++ b/Documentation/git-fetch.txt
> @@ -13,6 +13,7 @@ SYNOPSIS
>  'git fetch' [<options>] <group>
>  'git fetch' --multiple [<options>] [(<repository> | <group>)...]
>  'git fetch' --all [<options>]
> +'git fetch' --object-info=[<arguments>] <remote> [<object-ids>]

This is a start of slippery slope of making "fetch" a kitchen sink
that does not have much to do with "git fetch".  Let's not go this
route.

> @@ -2087,6 +2094,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>  	struct remote *remote = NULL;
>  	int result = 0;
>  	int prune_tags_ok = 1;
> +	struct oid_array object_info_oids = OID_ARRAY_INIT;
>  
>  	packet_trace_identity("fetch");
>  
> @@ -2168,7 +2176,19 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>  	if (dry_run)
>  		write_fetch_head = 0;
>  
> -	if (all) {
> +	if (object_info_options.nr > 0) {
> +		if (argc == 0 || argc == 1) {
> +			die(_("must supply remote and object ids when using --object-info"));
> +		} else {
> +			struct object_id oid;
> +			remote = remote_get(argv[0]);
> +			for (i = 1; i < argc; i++) {
> +				if (get_oid(argv[i], &oid))
> +					return error(_("malformed object id '%s'"), argv[i]);
> +				oid_array_append(&object_info_oids, &oid);
> +			}
> +		}
> +	} else if (all) {
>  		if (argc == 1)
>  			die(_("fetch --all does not take a repository argument"));
>  		else if (argc > 1)

Yuck.  Let's read this again, realize how little commonality we have
in processing args and barf.  That's all because we tried to
piggy-back on "fetch" because the patch wanted to reuse the parts
that are not shown in the patch text.  The right way to do so would
be to factor that "part that do not appear in the patch, because
they are common" out into callable from new code.  That way, we do
not have to contaminate this if/else conditional, both arms of which
is still about running "fetch" to retrieve history and connected
objects, and has nothing to do with "fetch object info".  The way
this patch changes this part of the code is an unnecessary risk to
break the normal "git fetch", and it does not have to be.

I like the idea to give client side a way to ask server to do random
things.  I do not think I want to see unrelated things piggy-backing
on existing client programs only because that is an expedient way to
implement it.