Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes:

> +'git http-fetch' [-c] [-t] [-a] [-d] [-v] [-w filename] [--recover] [--stdin | --packfile | <commit>] <url>
> ...
> +--packfile::
> +	Instead of a commit id on the command line (which is not expected in
> +	this case), 'git http-fetch' fetches the packfile directly at the given
> +	URL and uses index-pack to generate corresponding .idx and .keep files.
> +	The output of index-pack is printed to stdout.

This makes sense as an external interface, I guess.

How should this interact with the --stdin option, which is more of a
batch mode operation, i.e. "instead of getting a single <dest
filename, object name> pair from the command line, handle many pairs
read from the standard input"?

Would it be beneficial to allow an unbounded number of packfiles, not
just a single one, to be fetched and indexed by a single invocation
of the command?  I suspect that, given the relatively large size of a
single request for fetching a packfile, one invocation of the command
per packfile won't be too heavy an overhead, so the lack of such
orthogonality may only hurt conceptual cleanliness, not performance.
OK.

> -	if (argc != arg + 2 - commits_on_stdin)
> +	if (argc != arg + 2 - (commits_on_stdin || packfile))
>  		usage(http_fetch_usage);
>  	if (commits_on_stdin) {
>  		commits = walker_targets_stdin(&commit_id, &write_ref);
> +	} else if (packfile) {
> +		/* URL will be set later */

I'd prefer to see an empty statement spelled out more explicitly,
like this:

		; /* URL will be set later */

Otherwise the reader would be left wondering if a line was (or lines
were) accidentally lost after this comment.

>  	} else {
>  		commit_id = (char **) &argv[arg++];
>  		commits = 1;
>  	}
>
> +	if (packfile) {
> +		url = xstrdup(argv[arg]);
> +	} else {
> +		if (argv[arg])
> +			str_end_url_with_slash(argv[arg], &url);
> +	}
>
>  	setup_git_directory();
>
>  	git_config(git_default_config, NULL);
>
>  	http_init(NULL, url, 0);
> +	if (packfile) {
> +		struct http_pack_request *preq;
> +		struct slot_results results;
> +		int ret;
> +
> +		preq = new_http_pack_request(NULL, url);
> +		if (preq == NULL)
> +			die("couldn't create http pack request");
> +		preq->slot->results = &results;
> +		preq->generate_keep = 1;
> +
> +		if (start_active_slot(preq->slot)) {
> +			run_active_slot(preq->slot);
> +			if (results.curl_result != CURLE_OK) {
> +				die("Unable to get pack file %s\n%s", preq->url,
> +				    curl_errorstr);
> +			}
> +		} else {
> +			die("Unable to start request");
> +		}
> +
> +		if ((ret = finish_http_pack_request(preq)))
> +			die("finish_http_pack_request gave result %d", ret);
> +		release_http_pack_request(preq);
> +		rc = 0;

The above probably wants to be a single helper function, and the
other side of the if/else may also become another helper function.
That way, the flow of control would become clearer; after all, these
two branches do not share all that much, only http-init and
http-cleanup and nothing else.

For that matter, even before introducing this new mode of operation,
a preparatory patch that moves the original logic in this function
into a helper function called from the "else" side may make it
easier to see what is going on.
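Something along these lines, as a rough and untested illustration
(the helper names fetch_single_packfile() and fetch_with_walker() are
made up for this message, as is the latter's parameter list; neither
is anything the patch has to use):

	/* the new --packfile mode, i.e. the code quoted above */
	static int fetch_single_packfile(const char *url)
	{
		struct http_pack_request *preq;
		struct slot_results results;
		int ret;

		preq = new_http_pack_request(NULL, url);
		if (!preq)
			die("couldn't create http pack request");
		preq->slot->results = &results;
		preq->generate_keep = 1;

		if (start_active_slot(preq->slot)) {
			run_active_slot(preq->slot);
			if (results.curl_result != CURLE_OK)
				die("Unable to get pack file %s\n%s",
				    preq->url, curl_errorstr);
		} else {
			die("Unable to start request");
		}

		if ((ret = finish_http_pack_request(preq)))
			die("finish_http_pack_request gave result %d", ret);
		release_http_pack_request(preq);
		return 0;
	}

with the tail of cmd_main() reduced to something like:

		http_init(NULL, url, 0);
		if (packfile)
			rc = fetch_single_packfile(url);
		else
			rc = fetch_with_walker(url, commit_id, commits,
					       write_ref);
		http_cleanup();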
> diff --git a/http.c b/http.c
> index 130e9d6259..ac66215ee6 100644
> --- a/http.c
> +++ b/http.c
> @@ -2280,15 +2280,18 @@ int finish_http_pack_request(struct http_pack_request *preq)
>  	int tmpfile_fd;
>  	int ret = 0;
>
> -	close_pack_index(p);
> +	if (p)
> +		close_pack_index(p);
>
>  	fclose(preq->packfile);
>  	preq->packfile = NULL;
>
> -	lst = preq->lst;
> -	while (*lst != p)
> -		lst = &((*lst)->next);
> -	*lst = (*lst)->next;
> +	if (p) {
> +		lst = preq->lst;
> +		while (*lst != p)
> +			lst = &((*lst)->next);
> +		*lst = (*lst)->next;
> +	}

This is quite ugly.  What is the original meaning of the target field
of the pack_request structure again?  A packed_git structure that
will be filled when we are done fetching the packfile from the other
side and installed in our repository?  And when we are (ab)using the
http_fetch code to fetch a single packfile, we do not install the
packfile into the running process, because we are only (re)using the
existing machinery as a poor-man's "curl | git index-pack --stdin"?

I do not think it is a bad idea to roll our own
"curl | git index-pack --stdin", but I do find this an ugly way to do
so.  Perhaps a set of lower-level helper functions can be isolated
out of the existing code before this new feature is added, and then
the new "just fetch and pipe it to the index-pack" feature could be
written using these helpers, but with a separate set of entry points
(a sketch of what I mean appears at the end of this message).  Would
that be a good way to make the resulting code cleaner than this patch
does?  I dunno.

> diff --git a/http.h b/http.h
> index a5b082f3ae..709dfa4c19 100644
> --- a/http.h
> +++ b/http.h
> @@ -223,12 +223,21 @@ struct http_pack_request {
>  	struct active_request_slot *slot;
>
>  	/*
> -	 * After calling new_http_pack_request(), point lst to the head of the
> +	 * After calling new_http_pack_request(), if fetching a pack that
> +	 * http_get_info_packs() told us about, point lst to the head of the
>  	 * pack list that target is in. finish_http_pack_request() will remove
>  	 * target from lst and call install_packed_git() on target.
>  	 */
>  	struct packed_git **lst;
>
> +	/*
> +	 * If this is true, finish_http_pack_request() will pass "--keep" to
> +	 * index-pack, resulting in the creation of a keep file, and will not
> +	 * suppress its stdout (that is, the "keep\t<hash>\n" line will be
> +	 * printed to stdout).
> +	 */
> +	unsigned generate_keep : 1;
> +

I suspect that this is a sign that this single patch is trying to do
too many things at the same time.

 - Whether we are fetching a single packfile from a URL, or walking
   to fetch all the packfiles in the repository at a given URL,

 - Whether packfiles taken from outer space are marked with the
   "keep" bit, and

 - Whether the obtained packfile(s) are internally "installed" to
   the running process

are conceptually independent choices, but somehow they seem to have
become mixed up.

Thanks.
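To illustrate the "separate set of entry points" idea, here is a
rough and completely untested sketch.  The names http_run_index_pack()
and finish_http_standalone_pack_request() are made up, the actual
invocation of index-pack is hand-waved away, and I am assuming
install_packed_git() keeps the role the quoted http.h comment
describes:

	/* shared low-level helper: run index-pack on the fetched file */
	static int http_run_index_pack(struct http_pack_request *preq,
				       int gen_keep)
	{
		/*
		 * Close preq->packfile and run "git index-pack --stdin"
		 * on the temporary file; pass --keep and let index-pack's
		 * stdout through only when gen_keep is set.  This is the
		 * part the two entry points below would share.
		 */
		return 0;
	}

	/* existing entry point, used by the walker: installs the pack */
	int finish_http_pack_request(struct http_pack_request *preq)
	{
		struct packed_git *p = preq->target;
		struct packed_git **lst = preq->lst;

		close_pack_index(p);
		while (*lst != p)
			lst = &((*lst)->next);
		*lst = (*lst)->next;

		if (http_run_index_pack(preq, 0))
			return -1;
		install_packed_git(the_repository, p);
		return 0;
	}

	/*
	 * New entry point for "just fetch and pipe it to index-pack";
	 * it never touches preq->target or preq->lst, so the "if (p)"
	 * conditionals quoted above become unnecessary.
	 */
	int finish_http_standalone_pack_request(struct http_pack_request *preq)
	{
		return http_run_index_pack(preq, 1);
	}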