> > This makes sense - offsets may be different when we omit objects from
> > the packfile. I think this can be computed by calculating the number of
> > zero bits between the current object's index and the nth object prior
> > (where n is the offset) in the bitmap resulting from
> > reuse_partial_packfile_from_bitmap() above, thus eliminating the need
> > for this array, but I haven't tested it.
>
> You need to know not just the number of zero bits, but the accumulated
> offset due to those missing objects. So you'd end up having to walk over
> the revindex for that set of objects. This array is basically caching
> those accumulated offsets (for the parts we _do_ include) so we don't
> have to compute them repeatedly.

Ah...yes. For some reason I thought that the offset was a number of
objects, but it is actually a number of bytes. The patch makes sense now.

> There's also a more subtle issue with entry sizes; see below.

Good point.

> > > @@ -1002,6 +1132,10 @@ static int have_duplicate_entry(const struct object_id *oid,
> > >  {
> > >  	struct object_entry *entry;
> > >
> > > +	if (reuse_packfile_bitmap &&
> > > +	    bitmap_walk_contains(bitmap_git, reuse_packfile_bitmap, oid))
> > > +		return 1;
> >
> > Hmm...why did we previously not need to check the reuse information, but
> > we do now? I gave the code a cursory glance but couldn't find the
> > answer.
>
> I think the original code may simply have been buggy and nobody noticed.
> Here's what I wrote when this line was added in our fork:
> [snip explanation]

Thanks - I'll also take a look if I have time.

> Thanks for looking at it. I still have to take a careful pass over the
> whole split, but I've tried to at least answer your questions in the
> meantime.

Thanks for your responses. Also thanks to Christian for splitting it in
the first place, making it easier to review.
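
P.S. To double-check my corrected understanding of the byte-offset
caching, here is a toy standalone program (not Git code - the names,
sizes, and keep flags below are all made up for illustration). It caches
the accumulated size in bytes of the omitted objects preceding each
entry, so fixing up an offset becomes a single subtraction instead of a
revindex walk on every lookup:

#include <stdio.h>

#define NR_OBJECTS 8

int main(void)
{
        /* On-disk size in bytes of each object, as the revindex would
         * tell us. */
        unsigned size[NR_OBJECTS] = { 100, 50, 200, 30, 400, 25, 60, 80 };

        /* 1 = object is reused verbatim, 0 = omitted from the new pack. */
        int keep[NR_OBJECTS] = { 1, 0, 1, 1, 0, 1, 1, 0 };

        /*
         * Cache the accumulated bytes of omitted objects before each
         * entry; computing this once up front is what avoids walking
         * the revindex repeatedly.
         */
        unsigned skipped[NR_OBJECTS];
        unsigned acc = 0;
        for (int i = 0; i < NR_OBJECTS; i++) {
                skipped[i] = acc;
                if (!keep[i])
                        acc += size[i];
        }

        /* A kept object at old offset X lands at X - skipped[i]. */
        unsigned old_offset = 0;
        for (int i = 0; i < NR_OBJECTS; i++) {
                if (keep[i])
                        printf("object %d: old offset %u -> new offset %u\n",
                               i, old_offset, old_offset - skipped[i]);
                old_offset += size[i];
        }
        return 0;
}

(This deliberately ignores the more subtle entry-size issue you
mentioned, which a plain subtraction obviously cannot capture.)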
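
And for the have_duplicate_entry() hunk, here is how I now read the new
check (again a standalone toy; the plain bit array and names below are
stand-ins, not the real bitmap API): anything covered by the reuse
bitmap will be copied verbatim from the old pack, so it has to be
reported as a duplicate before the normal packlist lookup, or it would
be written twice.

#include <stdio.h>

#define NR_OBJECTS 8

/* Toy stand-in for bitmap_walk_contains(): test one bit. */
static int reuse_contains(const unsigned char *bits, unsigned pos)
{
        return (bits[pos / 8] >> (pos % 8)) & 1;
}

static int have_duplicate(const unsigned char *reuse_bitmap, unsigned pos,
                          const int *in_packlist)
{
        /* Objects in the reused chunk are already "in" the output pack. */
        if (reuse_bitmap && reuse_contains(reuse_bitmap, pos))
                return 1;
        /* Stand-in for the pre-existing duplicate check. */
        return in_packlist[pos];
}

int main(void)
{
        unsigned char reuse_bitmap[1] = { 0x0b }; /* objects 0, 1, 3 */
        int in_packlist[NR_OBJECTS] = { 0, 0, 1, 0, 0, 0, 0, 0 };

        for (unsigned i = 0; i < NR_OBJECTS; i++)
                printf("object %u: duplicate=%d\n", i,
                       have_duplicate(reuse_bitmap, i, in_packlist));
        return 0;
}

If that reading is right, it would also explain why the bitmap check has
to come first: the reused objects never make it into the packlist at all.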