Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes:

> When fetching (as opposed to cloning) from a repository with packfile
> URIs enabled, an error like this may occur:
>
>  fatal: pack has bad object at offset 12: unknown object type 5
>  fatal: finish_http_pack_request gave result -1
>  fatal: fetch-pack: expected keep then TAB at start of http-fetch output
>
> This bug was introduced in b664e9ffa1 ("fetch-pack: with packfile URIs,
> use index-pack arg", 2021-02-22), when the index-pack args used when
> processing the inline packfile of a fetch response and when processing
> packfile URIs were unified.

> This bug happens because fetch, by default, partially reads (and
> consumes) the header of the inline packfile to determine if it should
> store the downloaded objects as a packfile or loose objects, and thus
> passes --pack_header=<...> to index-pack to inform it that some bytes
> are missing. 

... and what the values in them are.

> However, when it subsequently fetches the additional
> packfiles linked by URIs, it reuses the same index-pack arguments, thus
> wrongly passing --index-pack-arg=--pack_header=<...> when no bytes are
> missing.
>
> This does not happen when cloning because "git clone" always passes
> do_keep, which instructs the fetch mechanism to always retain the
> packfile, eliminating the need to read the header.
>
> There are a few ways to fix this, including filtering out pack_header
> arguments when downloading the additional packfiles, but ...

Avoiding the condition that exhibits the breakage is possible, and I
think it is what is done here, but I actually think that the only
right fix is to pass correct argument to commands we invoke in the
first place.  Why are we reusing the same argument array to begin
with?

    ... goes back and reads the offending commit ...

commit b664e9ffa153189dae9b88f32d1c5fedcf85056a
Author: Jonathan Tan <jonathantanmy@xxxxxxxxxx>
Date:   Mon Feb 22 11:20:08 2021 -0800

    fetch-pack: with packfile URIs, use index-pack arg
    
    Unify the index-pack arguments used when processing the inline pack and
    when downloading packfiles referenced by URIs. This is done by teaching
    get_pack() to also store the index-pack arguments whenever at least one
    packfile URI is given, and then when processing the packfile URI(s),
    using the stored arguments.

THis makes it sound like the entire idea of this offending commit
was wrong, and before it, the codepath that processed the packfile
fetched from the packfile URI were using the index-pack correctly
by using index-pack arguments that are independent from the one that
is used to process the packfile given in-stream.  Why isn't the fix
just a straight revert of the commit???

> This is on jt/transfer-fsck-across-packs.

Ouch.  This definitely is an -rc material.


> -	if (!args->keep_pack && unpack_limit) {
> +	if (!args->keep_pack && unpack_limit && !index_pack_args) {

This one makes sense as an "avoid conditions that reveals how badly
the code is broken" band-aid.  When we have index-pack related
arguments, we cannot use the unpack-objects codepath even if we are
being fed a tiny pack, so there is no point peeking at the beginning
of the pack stream to find out how many objects it has.  OK.

> @@ -885,7 +885,7 @@ static int get_pack(struct fetch_pack_args *args,
>  			strvec_push(&cmd.args, "-v");
>  		if (args->use_thin_pack)
>  			strvec_push(&cmd.args, "--fix-thin");
> -		if (do_keep && (args->lock_pack || unpack_limit)) {
> +		if ((do_keep || index_pack_args) && (args->lock_pack || unpack_limit)) {
>  			char hostname[HOST_NAME_MAX + 1];
>  			if (xgethostname(hostname, sizeof(hostname)))
>  				xsnprintf(hostname, sizeof(hostname), "localhost");

I do not quite get what this hunk is doing.  Care to explain?

Thanks.

> diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
> index b1bc73a9a9..9df1ec82ca 100755
> --- a/t/t5702-protocol-v2.sh
> +++ b/t/t5702-protocol-v2.sh
> @@ -853,6 +853,27 @@ test_expect_success 'part of packfile response provided as URI' '
>  	test_line_count = 6 filelist
>  '
>  
> +test_expect_success 'packfile URIs with fetch instead of clone' '
> +	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
> +	rm -rf "$P" http_child log &&
> +
> +	git init "$P" &&
> +	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
> +
> +	echo my-blob >"$P/my-blob" &&
> +	git -C "$P" add my-blob &&
> +	git -C "$P" commit -m x &&
> +
> +	configure_exclusion "$P" my-blob >h &&
> +
> +	git init http_child &&
> +
> +	GIT_TEST_SIDEBAND_ALL=1 \
> +	git -C http_child -c protocol.version=2 \
> +		-c fetch.uriprotocols=http,https \
> +		fetch "$HTTPD_URL/smart/http_parent"
> +'
> +
>  test_expect_success 'fetching with valid packfile URI but invalid hash fails' '
>  	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
>  	rm -rf "$P" http_child log &&



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux