Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes:
> 
> > When fetching (as opposed to cloning) from a repository with packfile
> > URIs enabled, an error like this may occur:
> >
> >  fatal: pack has bad object at offset 12: unknown object type 5
> >  fatal: finish_http_pack_request gave result -1
> >  fatal: fetch-pack: expected keep then TAB at start of http-fetch output
> >
> > This bug was introduced in b664e9ffa1 ("fetch-pack: with packfile URIs,
> > use index-pack arg", 2021-02-22), when the index-pack args used when
> > processing the inline packfile of a fetch response and when processing
> > packfile URIs were unified.
> 
> > This bug happens because fetch, by default, partially reads (and
> > consumes) the header of the inline packfile to determine if it should
> > store the downloaded objects as a packfile or loose objects, and thus
> > passes --pack_header=<...> to index-pack to inform it that some bytes
> > are missing. 
> 
> ... and what the values in them are.

Ah, that's true.

> > However, when it subsequently fetches the additional
> > packfiles linked by URIs, it reuses the same index-pack arguments, thus
> > wrongly passing --index-pack-arg=--pack_header=<...> when no bytes are
> > missing.
> >
> > This does not happen when cloning because "git clone" always passes
> > do_keep, which instructs the fetch mechanism to always retain the
> > packfile, eliminating the need to read the header.
> >
> > There are a few ways to fix this, including filtering out pack_header
> > arguments when downloading the additional packfiles, but ...
> 
> Avoiding the condition that exhibits the breakage is possible, and I
> think it is what is done here, but I actually think that the only
> right fix is to pass correct argument to commands we invoke in the
> first place.  Why are we reusing the same argument array to begin
> with?
> 
>     ... goes back and reads the offending commit ...
> 
> commit b664e9ffa153189dae9b88f32d1c5fedcf85056a
> Author: Jonathan Tan <jonathantanmy@xxxxxxxxxx>
> Date:   Mon Feb 22 11:20:08 2021 -0800
> 
>     fetch-pack: with packfile URIs, use index-pack arg
>     
>     Unify the index-pack arguments used when processing the inline pack and
>     when downloading packfiles referenced by URIs. This is done by teaching
>     get_pack() to also store the index-pack arguments whenever at least one
>     packfile URI is given, and then when processing the packfile URI(s),
>     using the stored arguments.
> 
> THis makes it sound like the entire idea of this offending commit
> was wrong, and before it, the codepath that processed the packfile
> fetched from the packfile URI were using the index-pack correctly
> by using index-pack arguments that are independent from the one that
> is used to process the packfile given in-stream.  Why isn't the fix
> just a straight revert of the commit???

I should probably have written more in the commit message to justify the
unification, but it is also part of a bug fix (in particular,
--fsck-objects wasn't being passed to the index-pack that indexed the
packfiles linked by URI) and for code health purposes (to prevent future
bugs by eliminating the divergence). So reverting that commit would
reintroduce another bug.

> > @@ -885,7 +885,7 @@ static int get_pack(struct fetch_pack_args *args,
> >  			strvec_push(&cmd.args, "-v");
> >  		if (args->use_thin_pack)
> >  			strvec_push(&cmd.args, "--fix-thin");
> > -		if (do_keep && (args->lock_pack || unpack_limit)) {
> > +		if ((do_keep || index_pack_args) && (args->lock_pack || unpack_limit)) {
> >  			char hostname[HOST_NAME_MAX + 1];
> >  			if (xgethostname(hostname, sizeof(hostname)))
> >  				xsnprintf(hostname, sizeof(hostname), "localhost");
> 
> I do not quite get what this hunk is doing.  Care to explain?

The "do_keep" part was unnecessarily restrictive and I used a band-aid
solution to loosen it. I think this started from 88e2f9ed8e ("introduce
fetch-object: fetch one promisor object", 2017-12-05) where I might have
misunderstood what do_keep was meant to do, and taught fetch-pack to use
"index-pack" if do_keep is true or args->from_promisor is true. What I
should have done is to set do_keep to true if args->from_promisor is
true. Future commits continued to do that with fsck_objects and
index_pack_args.

Maybe what I can do is to refactor get_pack() so that do_keep retains
its original meaning of whether to use "index-pack" or "unpack-objects",
and then we wouldn't need this line. What do you think (code-wise and
whether this fits in with the release schedule, if we want to get this
in before release)?



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux