> > This makes sense - offsets may be different when we omit objects from
> > the packfile. I think this can be computed by calculating the number of
> > zero bits between the current object's index and the nth object prior
> > (where n is the offset) in the bitmap resulting from
> > reuse_partial_packfile_from_bitmap() above, thus eliminating the need
> > for this array, but I haven't tested it.
>
> You need to know not just the number of zero bits, but the accumulated
> offset due to those missing objects. So you'd end up having to walk over
> the revindex for that set of objects. This array is basically caching
> those accumulated offsets (for the parts we _do_ include) so we don't
> have to compute them repeatedly.

Ah...yes. For some reason I thought that the offset was a number of
objects, but it is actually a number of bytes. The patch makes sense now.

> There's also a more subtle issue with entry sizes; see below.

Good point.

> > > @@ -1002,6 +1132,10 @@ static int have_duplicate_entry(const struct object_id *oid,
> > >  {
> > >  	struct object_entry *entry;
> > >
> > > +	if (reuse_packfile_bitmap &&
> > > +	    bitmap_walk_contains(bitmap_git, reuse_packfile_bitmap, oid))
> > > +		return 1;
> >
> > Hmm...why did we previously not need to check the reuse information, but
> > we do now? I gave the code a cursory glance but couldn't find the
> > answer.
>
> I think the original code may simply have been buggy and nobody noticed.
> Here's what I wrote when this line was added in our fork:
> [snip explanation]

Thanks - I'll also take a look if I have time.

> Thanks for looking at it. I still have to take a careful pass over the
> whole split, but I've tried to at least answer your questions in the
> meantime.

Thanks for your responses. Also thanks to Christian for splitting it in
the first place, making it easier to review.
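
P.S. To double-check my corrected understanding of the byte-offset
caching, here is a toy standalone program (not Git code - the names,
sizes, and keep flags below are all made up for illustration). It caches
the accumulated size in bytes of the omitted objects preceding each
entry, so fixing up an offset becomes a single subtraction instead of a
revindex walk on every lookup:

#include <stdio.h>

#define NR_OBJECTS 8

int main(void)
{
        /* On-disk size in bytes of each object, as the revindex would
         * tell us. */
        unsigned size[NR_OBJECTS] = { 100, 50, 200, 30, 400, 25, 60, 80 };

        /* 1 = object is reused verbatim, 0 = omitted from the new pack. */
        int keep[NR_OBJECTS] = { 1, 0, 1, 1, 0, 1, 1, 0 };

        /*
         * Cache the accumulated bytes of omitted objects before each
         * entry; computing this once up front is what avoids walking
         * the revindex repeatedly.
         */
        unsigned skipped[NR_OBJECTS];
        unsigned acc = 0;
        for (int i = 0; i < NR_OBJECTS; i++) {
                skipped[i] = acc;
                if (!keep[i])
                        acc += size[i];
        }

        /* A kept object at old offset X lands at X - skipped[i]. */
        unsigned old_offset = 0;
        for (int i = 0; i < NR_OBJECTS; i++) {
                if (keep[i])
                        printf("object %d: old offset %u -> new offset %u\n",
                               i, old_offset, old_offset - skipped[i]);
                old_offset += size[i];
        }
        return 0;
}

(This deliberately ignores the more subtle entry-size issue you
mentioned, which a plain subtraction obviously cannot capture.)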
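
And for the have_duplicate_entry() hunk, here is how I now read the new
check (again a standalone toy; the plain bit array and names below are
stand-ins, not the real bitmap API): anything covered by the reuse
bitmap will be copied verbatim from the old pack, so it has to be
reported as a duplicate before the normal packlist lookup, or it would
be written twice.

#include <stdio.h>

#define NR_OBJECTS 8

/* Toy stand-in for bitmap_walk_contains(): test one bit. */
static int reuse_contains(const unsigned char *bits, unsigned pos)
{
        return (bits[pos / 8] >> (pos % 8)) & 1;
}

static int have_duplicate(const unsigned char *reuse_bitmap, unsigned pos,
                          const int *in_packlist)
{
        /* Objects in the reused chunk are already "in" the output pack. */
        if (reuse_bitmap && reuse_contains(reuse_bitmap, pos))
                return 1;
        /* Stand-in for the pre-existing duplicate check. */
        return in_packlist[pos];
}

int main(void)
{
        unsigned char reuse_bitmap[1] = { 0x0b }; /* objects 0, 1, 3 */
        int in_packlist[NR_OBJECTS] = { 0, 0, 1, 0, 0, 0, 0, 0 };

        for (unsigned i = 0; i < NR_OBJECTS; i++)
                printf("object %u: duplicate=%d\n", i,
                       have_duplicate(reuse_bitmap, i, in_packlist));
        return 0;
}

If that reading is right, it would also explain why the bitmap check has
to come first: the reused objects never make it into the packlist at all.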