Re: [PATCH] fetch: remove fetch_if_missing=0

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

Jonathan Tan wrote:

> In fetch_pack() (and all functions it calls), pass
> OBJECT_INFO_SKIP_FETCH_OBJECT whenever we query an object that could be
> a tree or blob that we do not want to be lazy-fetched even if it is
> absent. Thus, the only lazy-fetches occurring for trees and blobs are
> when resolving deltas.
>
> Thus, we can remove fetch_if_missing=0 from builtin/fetch.c. Remove
> this, and also add a test ensuring that such objects are not
> lazy-fetched. (We might be able to remove fetch_if_missing=0 from other
> places too, but I have limited myself to builtin/fetch.c in this commit
> because I have not written tests for the other commands yet.)

Hooray!  Thanks much, this looks easier to maintain.

> Note that commits and tags may still be lazy-fetched. I limited myself
> to objects that could be trees or blobs here because Git does not
> support creating such commit- and tag-excluding clones yet, and even if
> such a clone were manually created, Git does not have good support for
> fetching a single commit (when fetching a commit, it and all its
> ancestors would be sent).

Is there a place we could put a NEEDSWORK comment to avoid confusion
when debugging if this gets introduced later?

Even if not, this seems like a sensible choice.

> Signed-off-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx>
> ---
> I've verified that this also solves the bug explained in:
> https://public-inbox.org/git/20191007181825.13463-1-jonathantanmy@xxxxxxxxxx/

Might be worth mentioning the example from there in the commit message
as well, to help explain the context behind the change.

I would still be in favor of applying that more conservative change to
"master", even this late in the -rc cycle.

[...]
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -1074,7 +1074,8 @@ static int check_exist_and_connected(struct ref *ref_map)
>  	 * we need all direct targets to exist.
>  	 */
>  	for (r = rm; r; r = r->next) {
> -		if (!has_object_file(&r->old_oid))
> +		if (!has_object_file_with_flags(&r->old_oid,
> +						OBJECT_INFO_SKIP_FETCH_OBJECT))

Yes.

[...]
> @@ -1755,8 +1756,6 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>  
>  	packet_trace_identity("fetch");
>  
> -	fetch_if_missing = 0;
> -

This is the scary part, but in an "uncomfortably exciting" sense rather
than a worrying one.  Thanks for adding a test.

[...]
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -673,7 +673,8 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
>  		struct object *o;
>  
>  		if (!has_object_file_with_flags(&ref->old_oid,
> -						OBJECT_INFO_QUICK))
> +						OBJECT_INFO_QUICK |
> +							OBJECT_INFO_SKIP_FETCH_OBJECT))

Should we make OBJECT_INFO_QUICK always imply
OBJECT_INFO_SKIP_FETCH_OBJECT?  I would suspect that if we are willing to
avoid checking thoroughly locally, checking remotely would be even more
undesirable.

[...]
> --- a/t/t5616-partial-clone.sh
> +++ b/t/t5616-partial-clone.sh
> @@ -296,6 +296,75 @@ test_expect_success 'partial clone with unresolvable sparse filter fails cleanly
>  	test_i18ngrep "unable to parse sparse filter data in" err
>  '
>  
> +setup_triangle () {
> +	rm -rf big-blob.txt server client promisor-remote &&
> +
> +	touch big-blob.txt &&

Tests seem to prefer spelling this as

	>big-blob.txt &&

because that specifes the content of the file.

> +	for i in $(seq 1 100)
> +	do
> +		echo line $i >>big-blob.txt
> +	done &&

Should this use test_seq for better portability?

nit: can avoid a subshell:

	test_seq 1 100 | sed -e 's/^/line /' >big-blob.txt

[...]
> +test_expect_success 'fetch lazy-fetches only to resolve deltas' '
> +	setup_triangle &&
> +
> +	# Exercise to make sure it works. Git will not fetch anything from the
> +	# promisor remote other than for the big blob (because it needs to
> +	# resolve the delta).
> +	GIT_TRACE_PACKET="$(pwd)/trace" git -C client \
> +		fetch "file://$(pwd)/server" master &&
> +
> +	# Verify the assumption that the client needed to fetch the delta base
> +	# to resolve the delta.
> +	git hash-object big-blob.txt >hash &&
> +	grep "want $(cat hash)" trace

nit: can avoid using cat:

	hash=$(git hash-object big-blob.txt) &&
	grep "want $hash" trace

Thanks and hope that helps,
Jonathan



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux