Re: [PATCH] fast-export: avoid NULL pointer arithmetic

Jeff King <peff@xxxxxxxx> · Fri, 11 May 2018 04:56:34 -0400

On Fri, May 11, 2018 at 08:19:59AM +0200, Duy Nguyen wrote:

> On Fri, May 11, 2018 at 6:49 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> > Junio C Hamano <gitster@xxxxxxxxx> writes:
> >
> >> René Scharfe <l.s.r@xxxxxx> writes:
> >>
> >>>> But it somehow feels backwards in spirit to me, as the reason why we
> >>>> use "void *" there in the decoration field is because we expect that
> >>>> we'd have a pointer to some struture most of the time, and we have
> >>>> to occasionally store a small integer there.
> >>>
> >>> Yes, fast-export seems to be the only place that stores an integer as
> >>> a decoration.
> >>
> >> With the decoration subsystem that might be the case, but I think
> >> we have other codepaths where "void * .util" field in the structure
> >> is used to store (void *)1, expecting that a normal allocation will
> >> never yield a pointer that is indistinguishable from that value.
> >
> > I was looking at a different topic and noticed that bisect.c uses
> > commit->util (which is of type "void *") to always store an int (it
> > never stores a pointer and there is no mixing).  This one is equally
> > unportable as fast-export after your fix.
> >
> 
> In both cases we should be able to use commit-slab instead of
> commit->util. We could go even further and kill "util" pointer but
> that's more work.

I would love it if we could kill the util pointer in favor of
commit-slab. Unfortunately the fast-export case is decorating non-commit
objects, too.

I'm not sure how possible/easy it would be to have an "object slab".
Some complications are:

  1. It would increase the size of "struct tree" and "struct blob" by at
     least 4 bytes (possibly 8, due to alignment) to store the slab
     index. Commits would stay the same, but my experience is that most
     repositories have 5-10 times as many trees/blobs as commits.

     Some of that memory might be reclaimable. E.g., if we moved
     tree->buffer and tree->size into a slab, callers which don't use
     them would actually come out ahead (and more callers than you might
     expect could do this, since many only need to look at the tree
     contents inside a single function).

  2. If we allocate the indices for all types as a single sequence, then
     callers who only care about a specific type will have very sparse
     slabs (so they'll over-allocate, and it will have poor cache
     behavior).

     It might be possible to do something more clever. E.g., give each
     object type its own index counter, but then for a full object-slab,
     store per-type slabs and dereference obj->type to know which slab
     to look in.

     There'd be some complication with any_object. We'd probably need an
     OBJ_NONE slab, and then to allocate a type-specific index when the
     type is assigned. There's already a central point for this in
     object_as_type(). But we'd also have to migrate any
     previously-stored slab data (stored when it was OBJ_NONE), which
     implies having a global list of slabs.

So I dunno. It seems do-able but complicated.

-Peff