Re: Commit cache to speed up rev-list and merge

Shawn Pearce <spearce@xxxxxxxxxxx> · Thu, 27 Sep 2012 10:45:32 -0700

On Thu, Sep 27, 2012 at 10:39 AM, Jeff King <peff@xxxxxxxx> wrote:
> On Thu, Sep 27, 2012 at 08:51:51AM -0700, Shawn O. Pearce wrote:
>> On Thu, Sep 27, 2012 at 5:17 AM, Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> wrote:
>> > I'd like to see some sort of extension mechanism like in
>> > $GIT_DIR/index, so that we don't have to increase pack index version
>> > often. What I have in mind is optional commit cache to speed up
>> > rev-list and merge, which could be stored in pack index too.
>>
>> Can you share some of your ideas?
>
> Some of it is here:
>
>   http://article.gmane.org/gmane.comp.version-control.git/203308

Quoting from that patch:

On 2012-08-12 Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> wrote:
> Long term we might gain a slight lookup speedup if we know the object
> type, as the search region is made smaller. But for that to happen, we
> need to propagate an object type hint down to find_pack_entry_one()
> and friends. Possible thing to do, I think.

I'm not sure reclustering the index by object type is going to make a
worthwhile difference. Of the 2.2m objects in the Linux tree, 320k are
commits. The difference between doing the binary search through all
objects vs. just commits is only about 2 to 3 more iterations of binary
search (log2(2.2m / 320k) is roughly 2.8), if we assume the per-type
ranges have their own fan-out tables.
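
As a rough sanity check of that arithmetic, here is a small sketch. The
256-bucket fan-out and the helper name bsearch_steps() are assumptions
made for illustration; this is not git code, and it assumes objects are
spread evenly across buckets.

```python
import math

# Object counts quoted above for the Linux tree.
ALL_OBJECTS = 2_200_000
COMMITS = 320_000


def bsearch_steps(n_objects, fanout_buckets=256):
    """Approximate binary search probes inside one fan-out bucket.

    Assumes (hypothetically) that objects are spread evenly across
    `fanout_buckets` buckets keyed by the first byte of the SHA-1.
    """
    per_bucket = max(1.0, n_objects / fanout_buckets)
    return math.log2(per_bucket)


# Extra probes needed when searching all objects instead of only commits.
extra = bsearch_steps(ALL_OBJECTS) - bsearch_steps(COMMITS)
print(f"all objects: {bsearch_steps(ALL_OBJECTS):.2f} probes")
print(f"commits only: {bsearch_steps(COMMITS):.2f} probes")
print(f"difference: {extra:.2f} probes")
```

Because both ranges use the same fan-out, the bucket count cancels out
and the difference depends only on the ratio of the two object counts.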

> The main reason to group objects by type is to make it possible to
> create another sha1->something mapping for a particular object type,
> without wasting space for storing sha-1 keys again. For example, we
> can store commit caches, tree caches... at the end of the index as
> extensions.

Using ordinal position in the pack also works, and doesn't require
clustering objects by type.
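
To make that concrete, here is a minimal sketch of one way a side table
could be keyed by ordinal position rather than by SHA-1. Everything in
it (ToyPackIndex, OrdinalCache, lookup(), the toy object names and the
parent-ordinal payload) is hypothetical and only illustrates the idea;
it is not an existing or proposed git format.

```python
from bisect import bisect_left


class ToyPackIndex:
    """Toy stand-in for a pack index: maps object name to pack ordinal."""

    def __init__(self, names_in_pack_order):
        # ordinal = 0-based position of the object within the pack
        self.ordinal_of = {name: i for i, name in enumerate(names_in_pack_order)}


class OrdinalCache:
    """Side table keyed by pack ordinal, so SHA-1 keys are not stored again."""

    def __init__(self, entries):
        # entries: iterable of (ordinal, payload); sort by ordinal only
        pairs = sorted(entries, key=lambda e: e[0])
        self.ordinals = [o for o, _ in pairs]
        self.payloads = [p for _, p in pairs]

    def get(self, ordinal):
        # Binary search over the (small) list of cached ordinals.
        i = bisect_left(self.ordinals, ordinal)
        if i < len(self.ordinals) and self.ordinals[i] == ordinal:
            return self.payloads[i]
        return None


def lookup(index, cache, name):
    """Resolve a name to its ordinal, then fetch the cached payload."""
    ordinal = index.ordinal_of.get(name)
    return None if ordinal is None else cache.get(ordinal)


# Objects of mixed types, in pack order; no clustering by type needed.
index = ToyPackIndex(["commit-a", "tree-a", "blob-a", "commit-b", "tree-b"])
# Hypothetical payload: ordinals of each commit's parents.
cache = OrdinalCache([
    (index.ordinal_of["commit-b"], (index.ordinal_of["commit-a"],)),
    (index.ordinal_of["commit-a"], ()),
])
```

Only commits get cache entries here, yet trees and blobs sit between
them in the pack, so the cache does not depend on type clustering.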