Re: [PATCH 2/2] log: do not shorten decoration names too early

Jeff King <peff@xxxxxxxx> · Thu, 14 May 2015 13:49:45 -0400

On Thu, May 14, 2015 at 10:37:39AM -0700, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
> 
> > While we are on the subject of the name_decoration code, I had
> > considered at one point replacing the use of the decorate.[ch] hash
> > table with a commit_slab (you can't do it in the general case, because
> > decorate.[ch] handles arbitrary objects, but the name_decoration code
> > only does commits). It would in theory be faster, though I don't know if
> > the time we spend on the hash table is actually measurable (we make a
> > lot of queries on it, but it doesn't actually get that big in the first
> > place).
> 
> Hmmm, but I do not know if commit_slab is a good fit for the usage
> pattern.  I expect commit objects to be fairly sparsely decorated
> (e.g. a tag or ref pointing at say 1-2% of commits or fewer).
> Wouldn't the hashtable used by decorate.[ch] with the max load
> factor capped to 66% a better economy?

Good point. A slab would be less memory efficient, but cheaper to look
up (it is a direct memory reference, with no probing and no hashcmp()).
But cache effects matter, so it could even be slower.

On the other hand, the slab makes it easy to actually store the real
type (struct name_decoration), whereas the decorate hash stores only
pointers. So we'd save an extra malloc/pointer in each case.

So with your slab_peek() below, I'd guess that the slab would actually
be faster. But I'd also be unsurprised if it makes no appreciable
difference to the overall runtime of "git log --decorate". I think we'd
have to build it and profile (and please feel free to say "eh, not worth
the time to think about further").

> I notice that there is no API into commit_slab to ask "Does this
> commit have data in the slab?"  *slabname##_at() may be the closest
> thing, but that would allocate the space and then says "This is the
> slot for that commit; go check if there is data there already."

Yes. I think it's not a big deal if your slab is reasonably full (you'll
extend it to the full size of your commit space eventually either way).
But for a sparse slab, it does make that query much more expensive than
it needs to be.

> Perhaps we want a new function *slabname##_peek() with the same
> signature as *slabname##_at() that returns NULL when commit->index
> is larger than the last existing element in the slab?  Then the
> above would become more like:
> 
>     const void *get_cached_commit_buffer(const struct commit *commit, unsigned long *sizep)
>     {
>             struct commit_buffer *v = buffer_slab_peek(&buffer_slab, commit);
>             if (!v)
>                     return NULL;
>             if (sizep)
>                     *sizep = v->size;
>             return v->buffer;
>     }

Yeah, I agree that is a good interface. It is basically a read-only
version of slab_at(). And it should be similarly agnostic as to the type
we are storing, because the pointer is into the slab array.

I'm not sure that the memory overhead is that big a deal (even in the
kernel, we are only talking about a few megabytes). But it is wasteful,
and the interface above should be trivial to write, so it might be worth
doing anyway.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html