Re: [PATCH 2/2] log: do not shorten decoration names too early

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 14 May 2015 10:37:39 -0700

Jeff King <peff@xxxxxxxx> writes:

> While we are on the subject of the name_decoration code, I had
> considered at one point replacing the use of the decorate.[ch] hash
> table with a commit_slab (you can't do it in the general case, because
> decorate.[ch] handles arbitrary objects, but the name_decoration code
> only does commits). It would in theory be faster, though I don't know if
> the time we spend on the hash table is actually measurable (we make a
> lot of queries on it, but it doesn't actually get that big in the first
> place).

Hmmm, but I do not know if commit_slab is a good fit for the usage
pattern.  I expect commit objects to be fairly sparsely decorated
(e.g. a tag or ref pointing at say 1-2% of commits or fewer).
Wouldn't the hashtable used by decorate.[ch] with the max load
factor capped to 66% a better economy?

I notice that there is no API into commit_slab to ask "Does this
commit have data in the slab?"  *slabname##_at() may be the closest
thing, but that would allocate the space and then says "This is the
slot for that commit; go check if there is data there already."

In the context of using commit_slab in log-tree.c for decoration, it
would mean that we assign low slab indices to commits at the tips by
first calling "for_each_ref(add_ref_decoration)" and populate the
slab fairly densely at the beginning.  But when we check if a commit
that we encountered during a traversal is decorated or not, we would
ask *slabname##_at() and that ends up enlarging the slab, even at
that point the only thing we are interested in is if the commit is
decorated and we are not adding a new decoration for it.

For example, we have this in commit.c:

    const void *get_cached_commit_buffer(const struct commit *commit, unsigned long *sizep)
    {
            struct commit_buffer *v = buffer_slab_at(&buffer_slab, commit);
            if (sizep)
                    *sizep = v->size;
            return v->buffer;
    }

But if we do not have the "buffer" data cached for that commit (via
an earlier call to set_commit_buffer()), we don't have to enlarge
the slab, as we are not adding anything to the slab system with this
call.

Perhaps we want a new function *slabname##_peek() with the same
signature as *slabname##_at() that returns NULL when commit->index
is larger than the last existing element in the slab?  Then the
above would become more like:

    const void *get_cached_commit_buffer(const struct commit *commit, unsigned long *sizep)
    {
            struct commit_buffer *v = buffer_slab_peek(&buffer_slab, commit);
            if (!v)
                    return NULL;
            if (sizep)
                    *sizep = v->size;
            return v->buffer;
    }

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html