A reroll is coming soon, but there is an interesting discussion point here so I'll reply to this e-mail first. On Thu, 15 Jun 2017 11:34:45 -0700 Junio C Hamano <gitster@xxxxxxxxx> wrote: > Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes: > > > +struct missing_blob_manifest { > > + struct missing_blob_manifest *next; > > + const char *data; > > +}; > > +struct missing_blob_manifest *missing_blobs; > > +int missing_blobs_initialized; > > I do not think you meant to make these non-static. The type of the > former is not even visible to the outside world, and the latter is > something that could be made into static to prepare_missing_blobs() > function (unless and until you start allowing the missing-blobs > manifest to be re-initialized). Your ensure_configured() below > seems to do the "static" right, on the other hand ;-). Good catch - done. > Do we expect that we will have only a handful of these missing blob > manifests? Each manifest seems to be efficiently looked-up with a > binary search, but it makes me wonder if it is a good idea to > consolidate these manifests into a single list of object names to > eliminate the outer loop in has_missing_blob(). Unlike pack .idx > files that must stay one-to-one with .pack files, it appears to me > that there is no reason why we need to keep multiple ones separate > for extended period of time (e.g. whenever we learn that we receieved > an incomplete pack from the other side with a list of newly missing > blobs, we could incorporate that into existing missing blob list). There is indeed no reason why we need to keep multiple ones separate for an extended period of time - my thinking was to let fetch/clone be fast by not needing to scan through the entire existing manifest (in order to create the new one), letting GC take care of consolidating them (since it would have to check individual entries to delete those corresponding to objects that have entered the repo through other means). But this is at the expense of making the individual object lookups a bit slower. For now, I'll leave the possibility of multiple files open while I try to create a set of patches that can implement missing blob support from fetch to day-to-day usage. But I am not opposed to changing it to a single-file manifest. > > +int has_missing_blob(const unsigned char *sha1, unsigned long *size) > > +{ > > This function that answers "is it expected to be missing?" is > confusingly named. Is it missing, or does it exist? Renamed to in_missing_blob_manifest(). > > @@ -2981,11 +3050,55 @@ static int sha1_loose_object_info(const unsigned char *sha1, > > return (status < 0) ? status : 0; > > } > > > > +static char *missing_blob_command; > > +static int sha1_file_config(const char *conf_key, const char *value, void *cb) > > +{ > > + if (!strcmp(conf_key, "core.missingblobcommand")) { > > + missing_blob_command = xstrdup(value); > > + } > > + return 0; > > +} > > + > > +static int configured; > > +static void ensure_configured(void) > > +{ > > + if (configured) > > + return; > > Do not be selfish and pretend that this is the _only_ kind of > configuration that needs to be done inside sha1_file.c. Call the > function ensure_<something>_is_configured() and rename the run-once > guard to match. My thinking was that any additional configuration could be added to this function, but individual configuration for each feature is fine too. I have renamed things according to your suggestion. > The run-once guard can be made static to the "ensure" function, and > if you do so, then its name can stay to be "configured", as at that > point it is clear what it is guarding. Done. > > +pack() { > > Style: "pack () {" Done. > > > + perl -e '$/ = undef; $input = <>; print pack("H*", $input)' > > high-nybble first to match ntohll() done in has_missing_blob()? OK. Actually it's to match the printf behavior below that prints the high nybble first (like in English).