Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> I was looking at the index_pack_args vs pass_header codepath in
> fetch-pack.c again after finishing the -rc2 stuff, and noticed
> something curious.
> 
> Before running the command to process in-stream packdata, we have
> this bit:
> 
> 	if (index_pack_args) {
> 		int i;
> 
> 		for (i = 0; i < cmd.args.nr; i++)
> 			strvec_push(index_pack_args, cmd.args.v[i]);
> 	}
> 
> where cmd.args is what the original code (before the "we need to
> prepare the index pack arguments for the offline HTTP transfer"
> logic was bolted onto this codepath), so it could of course have
> things like "--fix-thin", "--promisor", when we are processing an
> in-stream packfile that has sufficiently large number of objects and
> choose "index-pack" to process it.  None of them should be given to
> the "index-pack" that processes the offline packfile that is given
> via the packfile URI mechanism.

Thanks for continuing to take a look at this. My thinking is that all
packfiles (inline or through URI) should be processed in as similar a
manner as possible. Looking at the potential arguments passed to
index-pack:

 1.  --shallow-file (before "index-pack", that is, an argument passed to
     "git" itself and not the subcommand)
 2.  index-pack
 3.  --stdin
 4.  -v
 5.  --fix-thin
 6.  --keep
 7.  [--check-self-contained-and-connected is guarded by !index_pack_args
     so we won't be passing it]
 8.  --promisor
 9.  --pack_header
 10. --fsck_objects
 11. [--strict appears in an "else" block opposite index_pack_args so we
     won't be passing it]

You mentioned --fix-thin (5) and --promisor (8). Why do you think that
none of these should be given to the "index-pack" that processes the
packfiles given by URI? Perhaps it could be argued that these extra
packfiles don't need --fix-thin (but I would say that I think servers
should be allowed to serve thin packfiles through URI too), but I think
that --promisor is necessary (so that a server could, for example,
offload all trees and commits to a packfile in a CDN, and offload all
blobs to a separate packfile in a CDN).

Looking at this list, I think that all the arguments (except 9, which
has been fixed) are necessary (or at least useful) for indexing a
packfile given by URI.

> Also, because this loop copies everything in cmd.args, if our
> in-stream packdata is small, cmd.args.v[0] would be "unpack-objects",
> and we end up asking the command to explode the (presumably large
> enough to be worth pre-generating and serving via CDN) packfile that
> is given via the packfile URI mechanism.

I specifically guard against this through the "if (do_keep ||
args->from_promisor || index_pack_args || fsck_objects) {" line (which
is a complicated line, unfortunately).

> What I think I am seeing in the code is that there are many things
> other than "pass_header" that fundamentally cannot be reused between
> the processing of the in-stream packdata and the offline packfile
> given by the packfile URI (e.g. the in-stream one may want to use
> "unpack-objects" to avoid accumulating too many tiny packs, so there
> is nothing to be shared with "index-pack" that will always be used
> for the offline one), and any attempt to "reuse" cmd.args while
> "filtering out" inappropriate bits is fragile and unfruitful.
>
> Instead, I think we should not touch index_pack in the earlier part
> of the function at all (both reading, writing, or even checking for
> NULL-ness), and use the "if (index_pack_args)" block we already have
> (i.e. the one before we call start_command() to process the
> in-stream packdata) to decide what the command line to process the
> offline pack should look like.  That way, we won't ever risk such a
> confusion like running "unpack-objects" instead of "index-pack" (but
> we can choose to do so deliberately, of course---the important point
> is to recognise that the in-stream pack and the offline one are
> independant and we should decide how to cook them separately).

We could do that, although I'm concerned that we would be repeating
logic a lot (deciding whether or not to pass an argument). One other
approach is for each "strvec.push?(&cmd.args" to also have another line
that pushes to index_pack_args if it's relevant. But as I said earlier,
I think that all or nearly all arguments will be relevant to both.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux