Re: [PATCH 03/12] shallow.c: use commit-slab for commit depth instead of commit->util

Jeff King <peff@xxxxxxxx> · Sat, 12 May 2018 15:12:05 -0400

On Sat, May 12, 2018 at 02:09:18PM +0200, Duy Nguyen wrote:

> > That wastes an extra 4 bytes per slot over storing an int directly, but
> > it's the same as storing an 8-byte pointer, plus you avoid the storage
> > and runtime overhead of malloc.
> 
> Or we could have a way to let the user decide initial values. If the
> initial value here is -1 (which can't possibly be used in the current
> code), it could be the sentinel value.

That's actually a little tricky. Right now the sentinels are assigned by
xcalloc(), so they're all-bits zero. We _could_ walk any newly allocated
slab and assign a value to each element, but I'd worry that is wasteful
for cases where might only use a small part of the slab (and my
assumption is that the OS can give us zero'd pages pretty cheaply, and
maybe even avoid allocating a page until we touch them, but I don't know
how true that is).

> Did you notice the for loop in the end to free "int *"? I don't like
> peeking inside a slab that way and would prefer passing a "free"
> function pointer to clear_commit_depth(), or just assign a "free"
> function to some new field in struct commit_depth and
> clear_commit_depth() will call it. If we have a new field for "free"
> callback in the struct, it makes sense to have an "init" callback to
> do extra initialization on top of xcalloc.

You might find that a free() is tricky with the type system. This is all
implemented with macros, so you'd end up calling free() on an int (even
if it's a function that nobody ever calls). I suppose the value could be
cast to void to shut up the compiler, but then that function is like a
ticking time bomb. ;)

So I guess you'd need a variant of define_commit_slab() that defines the
clear function and one that doesn't.

> > I actually wonder if we could wrap commit_slab with a variant that
> > stores the sentinel data itself, to make this pattern easier. I.e.,
> > something like:
> >
> >   #define define_commit_slab_sentinel(name, type) \
> >         struct name##_value { \
> >                 unsigned valid:1; \
> >                 type value; \
> >         }; \
> >         define_commit_slab(name, struct name##_value)
> >
> > and some matching "peek" and "at" functions to manipulate value
> > directly.
> 
> If you keep this valid bit in a separate slab, you can pack these bits
> very tight and not worry about wasting memory. Lookup in commit-slab
> is cheap enough that doing double lookups (one for valid field, one
> for value) is not a problem, I think.

True, though your cache behavior gets worse if your have two separate
arrays. I'm not sure pinching bytes matters all that much. linux.git has
~600k commits, so even there a few bytes each is not so bad (although
again, we get into cache issues).

> Yeah I think I will stick with a faithful conversion for now. The
> conversion shows room for improvement which could be the next
> microproject (I thought of adding this removing 'util' task as a 2019
> microproject but it was too tricky for newcomers to do).

Makes sense to me.

-Peff