Re: [PATCH v2 4/4] promisor-remote: teach lazy-fetch in any repo

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 08 Jun 2021 13:33:32 +0900

Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes:

>  		/* Check if it is a missing object */
> -		if (fetch_if_missing && has_promisor_remote() &&
> -		    !already_retried && r == the_repository &&
> +		if (fetch_if_missing && repo_has_promisor_remote(r) &&
> +		    !already_retried &&

Turning has_promisor_remote() into repo_has_promisor_remote(r) does
make tons of sense.  Is this part of the code ready to lose the "'r'
must be the_repository because has_promisor_remote() only works on
the primary in-core repository" restriction we had before?

> @@ -21,6 +22,11 @@ static int fetch_objects(const char *remote_name,
>  
>  	child.git_cmd = 1;
>  	child.in = -1;
> +	if (repo != the_repository) {
> +		prepare_other_repo_env(&child.env_array);
> +		strvec_pushf(&child.env_array, "%s=%s", GIT_DIR_ENVIRONMENT,
> +			     repo->gitdir);
> +	}

This is what prepare_submodule_repo_env_in_gitdir() does; it makes
me wonder if it (i.e. set up the environment for that other
repository, including GIT_DIR and possibly other per-repository
environment variable overrides) should be the primary API callers
would want, instead of a more limited prepare_other_repo_env() that
does not even take a 'repo' parameter.  Doesn't it feel somewhat
strange for a function that is supposed to help prepare a child
process to run in a repository different from ours (the "other repo"
part of its name), by filling its environ[] array appropriately, not
to want to even know which repository the "other" repo is?
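
To illustrate the shape of API I have in mind, here is a rough,
self-contained sketch.  It is not git code: struct repo_sketch and
repo_env_entry_sketch() are made-up names, and the sketch only
produces the GIT_DIR entry, assuming a real helper would also clear
the repo-local variables inherited from the parent and append to
child.env_array.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for git's struct repository; only one field. */
struct repo_sketch {
	const char *gitdir;
};

/*
 * Hypothetical helper: write the "GIT_DIR=<path>" environment entry for
 * a child process that should run in 'repo'.  The caller passes the
 * repository, so the helper itself knows which "other" repo is meant.
 * Returns 0 on success, -1 if 'out' is too small.
 */
static int repo_env_entry_sketch(const struct repo_sketch *repo,
				 char *out, size_t outlen)
{
	int n;

	assert(repo && repo->gitdir && out);
	n = snprintf(out, outlen, "GIT_DIR=%s", repo->gitdir);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With something along these lines, the caller would not have to push
GIT_DIR by hand after calling the helper.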

> diff --git a/t/helper/test-partial-clone.c b/t/helper/test-partial-clone.c
> new file mode 100644
> index 0000000000..3f102cfddd
> --- /dev/null
> +++ b/t/helper/test-partial-clone.c
> @@ -0,0 +1,43 @@
> +#include "cache.h"
> +#include "test-tool.h"
> +#include "repository.h"
> +#include "object-store.h"
> +
> +/*
> + * Prints the size of the object corresponding to the given hash in a specific
> + * gitdir. This is similar to "git -C gitdir cat-file -s", except that this
> + * exercises the code that accesses the object of an arbitrary repository that
> + * is not the_repository. ("git -C gitdir" makes it so that the_repository is
> + * the one in gitdir.)
> + */

Is the reason why this only gives the size that the helper will
eventually become unnecessary once the main code starts running
things in a submodule repository properly (i.e. without doing the
alternate odb thing), so a more elaborate check is not worth your
engineering effort?  Object type and object size are something that
you can safely express in plain text, would be handy for testing, and
would not require too much extra code, I'd imagine.

> +static void object_info(const char *gitdir, const char *oid_hex)
> +{
> +	struct repository r;
> +	struct object_id oid;
> +	unsigned long size;
> +	struct object_info oi = {.sizep = &size};
> +	const char *p;
> +
> +	if (repo_init(&r, gitdir, NULL))
> +		die("could not init repo");
> +	if (parse_oid_hex(oid_hex, &oid, &p))
> +		die("could not parse oid");
> +	if (oid_object_info_extended(&r, &oid, &oi, 0))
> +		die("could not obtain object info");
> +	printf("%d\n", (int) size);

Mimicking what builtin/cat-file.c::cat_one_file() does, for example, and
using

	printf("%"PRIuMAX"\n", (uintmax_t)size);

might be better (I was wondering if we can extract reusable helpers,
but I do not think that is worth doing, if this is meant to be a
temporary stop-gap measure).
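
The point of the cast is that an unsigned long size above INT_MAX
does not survive "(int) size", while widening to uintmax_t keeps the
full value.  A minimal standalone sketch (format_size_sketch() is a
made-up name, not something in our tree):

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format an object size without truncating it.
 * Widening to uintmax_t and printing with PRIuMAX is correct whatever
 * the width of unsigned long is on the platform.
 */
static int format_size_sketch(char *buf, size_t buflen, unsigned long size)
{
	assert(buf && buflen > 0);
	return snprintf(buf, buflen, "%"PRIuMAX, (uintmax_t)size);
}
```

A size such as 3000000000 is formatted as-is this way, whereas it
cannot be represented after a cast to a 32-bit int.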

Thanks.