> I was looking at the index_pack_args vs pass_header codepath in > fetch-pack.c again after finishing the -rc2 stuff, and noticed > something curious. > > Before running the command to process in-stream packdata, we have > this bit: > > if (index_pack_args) { > int i; > > for (i = 0; i < cmd.args.nr; i++) > strvec_push(index_pack_args, cmd.args.v[i]); > } > > where cmd.args is what the original code (before the "we need to > prepare the index pack arguments for the offline HTTP transfer" > logic was bolted onto this codepath), so it could of course have > things like "--fix-thin", "--promisor", when we are processing an > in-stream packfile that has sufficiently large number of objects and > choose "index-pack" to process it. None of them should be given to > the "index-pack" that processes the offline packfile that is given > via the packfile URI mechanism. Thanks for continuing to take a look at this. My thinking is that all packfiles (inline or through URI) should be processed in as similar a manner as possible. Looking at the potential arguments passed to index-pack: 1. --shallow-file (before "index-pack", that is, an argument passed to "git" itself and not the subcommand) 2. index-pack 3. --stdin 4. -v 5. --fix-thin 6. --keep 7. [--check-self-contained-and-connected is guarded by !index_pack_args so we won't be passing it] 8. --promisor 9. --pack_header 10. --fsck_objects 11. [--strict appears in an "else" block opposite index_pack_args so we won't be passing it] You mentioned --fix-thin (5) and --promisor (8). Why do you think that none of these should be given to the "index-pack" that processes the packfiles given by URI? Perhaps it could be argued that these extra packfiles don't need --fix-thin (but I would say that I think servers should be allowed to serve thin packfiles through URI too), but I think that --promisor is necessary (so that a server could, for example, offload all trees and commits to a packfile in a CDN, and offload all blobs to a separate packfile in a CDN). Looking at this list, I think that all the arguments (except 9, which has been fixed) are necessary (or at least useful) for indexing a packfile given by URI. > Also, because this loop copies everything in cmd.args, if our > in-stream packdata is small, cmd.args.v[0] would be "unpack-objects", > and we end up asking the command to explode the (presumably large > enough to be worth pre-generating and serving via CDN) packfile that > is given via the packfile URI mechanism. I specifically guard against this through the "if (do_keep || args->from_promisor || index_pack_args || fsck_objects) {" line (which is a complicated line, unfortunately). > What I think I am seeing in the code is that there are many things > other than "pass_header" that fundamentally cannot be reused between > the processing of the in-stream packdata and the offline packfile > given by the packfile URI (e.g. the in-stream one may want to use > "unpack-objects" to avoid accumulating too many tiny packs, so there > is nothing to be shared with "index-pack" that will always be used > for the offline one), and any attempt to "reuse" cmd.args while > "filtering out" inappropriate bits is fragile and unfruitful. > > Instead, I think we should not touch index_pack in the earlier part > of the function at all (both reading, writing, or even checking for > NULL-ness), and use the "if (index_pack_args)" block we already have > (i.e. the one before we call start_command() to process the > in-stream packdata) to decide what the command line to process the > offline pack should look like. That way, we won't ever risk such a > confusion like running "unpack-objects" instead of "index-pack" (but > we can choose to do so deliberately, of course---the important point > is to recognise that the in-stream pack and the offline one are > independant and we should decide how to cook them separately). We could do that, although I'm concerned that we would be repeating logic a lot (deciding whether or not to pass an argument). One other approach is for each "strvec.push?(&cmd.args" to also have another line that pushes to index_pack_args if it's relevant. But as I said earlier, I think that all or nearly all arguments will be relevant to both.