Christian Couder <christian.couder@xxxxxxxxx> writes: > +/* > + * Record the offsets needed in our reused packfile chunks due to > + * "gaps" where we omitted some objects. > + */ The meaning of 'start' and 'offset' is unclear from the first reading. Is it "starting offset" and "for how many bytes the region lasts"? If so, 'offset', which is usually a location (unless you always measure from the beginning, in which case you could say it names the byte-size of a region), may be a misnomer (side note: I'll pretend that I haven't realized the 'offset' may be the end offset of a region for now---this is a good illustration why a better comment should be here anyway ;-). > +static struct reused_chunk { > + off_t start; > + off_t offset; > +} *reused_chunks; > +static int reused_chunks_nr; > +static int reused_chunks_alloc; > + > +static void record_reused_object(off_t where, off_t offset) And here, 'start' is called 'where'; either is a good word for a location; we'd want to pick either one to be consistent, perhaps? > { > - unsigned char buffer[8192]; > - off_t to_write, total; > - int fd; > + if (reused_chunks_nr && reused_chunks[reused_chunks_nr-1].offset == offset) > + return; The reason why the above code works is because this function will always be called in the ascending order of the 'offset'? Hmmm, perhaps 'offset' is not a region-size after all. Is it the end offset, as opposed to 'start' which is the starting offset, and the two offsets sandwitch a region? > - if (!is_pack_valid(reuse_packfile)) > - die(_("packfile is invalid: %s"), reuse_packfile->pack_name); > + ALLOC_GROW(reused_chunks, reused_chunks_nr + 1, > + reused_chunks_alloc); > + reused_chunks[reused_chunks_nr].start = where; > + reused_chunks[reused_chunks_nr].offset = offset; > + reused_chunks_nr++; > +} > > - fd = git_open(reuse_packfile->pack_name); > - if (fd < 0) > - die_errno(_("unable to open packfile for reuse: %s"), > - reuse_packfile->pack_name); > +/* > + * Binary search to find the chunk that "where" is in. Note > + * that we're not looking for an exact match, just the first > + * chunk that contains it (which implicitly ends at the start > + * of the next chunk. > + */ > +static off_t find_reused_offset(off_t where) > +{ > + int lo = 0, hi = reused_chunks_nr; > + while (lo < hi) { > + int mi = lo + ((hi - lo) / 2); > + if (where == reused_chunks[mi].start) > + return reused_chunks[mi].offset; > + if (where < reused_chunks[mi].start) > + hi = mi; > + else > + lo = mi + 1; > + } > > - if (lseek(fd, sizeof(struct pack_header), SEEK_SET) == -1) > - die_errno(_("unable to seek in reused packfile")); > + /* > + * The first chunk starts at zero, so we can't have gone below > + * there. > + */ > + assert(lo); > + return reused_chunks[lo-1].offset; > +} This comment has nothing to do with the change, but the way the patch is presented is quite hard to follow, in that the preimage or the common context lines do not help understand what the new code is doing at all ;-) I'll come back to the remainder of the patch later. Thanks.