On Tue, Nov 20, 2018 at 05:37:18PM +0100, Duy Nguyen wrote:

> > But in (b), we use the number of stored objects, _not_ the allocated
> > size of the objects array. So we can run into a situation like this:
> >
> >   1. packlist_alloc() needs to store the Nth object, so it grows the
> >      objects array to M, where M > N.
> >
> >   2. oe_set_tree_depth() wants to store a depth, so it allocates an
> >      array of length N. Now we've violated our invariant.
> >
> >   3. packlist_alloc() needs to store the N+1th object. But it
> >      _doesn't_ grow the objects array, since N <= M still holds. We
> >      try to assign to tree_depth[N+1], which is out of bounds.
>
> Do you think this splitting of data into packing_data is too fragile,
> and that we should just scrap the whole thing and move all the data
> back to object_entry[]? We would use more memory, of course, but
> higher memory usage is still better than more bugs (if these are
> likely to show up again).

Certainly that thought crossed my mind while working on these patches. :)
Especially given the difficulties it introduced into the recent
bitmap-reuse topic, and the size fixes we had to deal with in v2.19.

Overall, though, I dunno. This fix, while subtle, turned out not to be
too complicated. And the memory savings are real. I consider 100M
objects to be on the large end of what's feasible for stock Git these
days, and I think we are talking about on the order of 4GB of memory
savings there (roughly 40 bytes per object). You need a big machine to
handle a repository of that size, but 4GB is still appreciable.

So I guess at this point, with all (known) bugs fixed, we should stick
with it for now. If it becomes a problem for development of a future
feature, we can re-evaluate then.
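For reference, here is the buggy pattern from the quoted scenario,
boiled down. This is a sketch, not Git's actual code: the names loosely
follow pack-objects.h, the payload and growth factor are made up, and
error checking is omitted.

  #include <stdlib.h>
  #include <stdint.h>

  struct object_entry { uint32_t hash; };	/* stand-in payload */

  struct packing_data {
  	struct object_entry *objects;
  	uint32_t nr_objects;	/* N: objects actually stored */
  	uint32_t nr_alloc;	/* M: allocated size of objects[] */
  	uint32_t *tree_depth;	/* side array split out of object_entry */
  };

  /* Grow objects[] as needed and hand back the next free slot. */
  static struct object_entry *packlist_alloc(struct packing_data *pdata)
  {
  	if (pdata->nr_objects >= pdata->nr_alloc) {
  		pdata->nr_alloc = (pdata->nr_alloc + 1024) * 3 / 2;
  		pdata->objects = realloc(pdata->objects,
  				pdata->nr_alloc * sizeof(*pdata->objects));
  	}
  	return &pdata->objects[pdata->nr_objects++];
  }

  /*
   * BUGGY: sizes the side array by nr_objects (N), not nr_alloc (M).
   * After step 1 above leaves M > N, this allocates only N slots;
   * step 3 then stores another object without growing objects[], and
   * the write below lands past the end of tree_depth[].
   */
  static void oe_set_tree_depth(struct packing_data *pdata,
  				struct object_entry *e, uint32_t depth)
  {
  	if (!pdata->tree_depth)
  		pdata->tree_depth = calloc(pdata->nr_objects,
  					   sizeof(*pdata->tree_depth));
  	pdata->tree_depth[e - pdata->objects] = depth;
  }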
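And this is roughly the shape of the fix, in the same simplified terms
(the real code uses git's REALLOC_ARRAY/CALLOC_ARRAY helpers and has
another side array to keep in sync). The invariant becomes: each side
array is either NULL or exactly nr_alloc entries long.

  /* Fixed packlist_alloc(): side arrays grow in lock-step with objects[]. */
  static struct object_entry *packlist_alloc(struct packing_data *pdata)
  {
  	if (pdata->nr_objects >= pdata->nr_alloc) {
  		pdata->nr_alloc = (pdata->nr_alloc + 1024) * 3 / 2;
  		pdata->objects = realloc(pdata->objects,
  				pdata->nr_alloc * sizeof(*pdata->objects));
  		if (pdata->tree_depth)
  			pdata->tree_depth = realloc(pdata->tree_depth,
  				pdata->nr_alloc * sizeof(*pdata->tree_depth));
  	}
  	return &pdata->objects[pdata->nr_objects++];
  }

  /* Fixed oe_set_tree_depth(): a late-created side array is sized to M. */
  static void oe_set_tree_depth(struct packing_data *pdata,
  				struct object_entry *e, uint32_t depth)
  {
  	if (!pdata->tree_depth)
  		pdata->tree_depth = calloc(pdata->nr_alloc,
  					   sizeof(*pdata->tree_depth));
  	pdata->tree_depth[e - pdata->objects] = depth;
  }

-Peff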