Re: Git is not scalable with too many refs/*

On Tue, Jun 14, 2011 at 10:02, Jeff King <peff@xxxxxxxx> wrote:
> I also think we can do something a little more lightweight. The user has
> already created and is maintaining a mapping in one direction via the
> notes. We just need the inverse mapping, which we can generate
> programmatically. So it can be a straight cache, with the sha1 of the
> notes tree determining the cache validity (i.e., if the forward mapping
> in the notes tree changes, you regenerate the cache from scratch).
>
> We would want to store the cache in an on-disk format that could be
> searched easily. Possibly something like the packed-refs format would be
> sufficient, if we mmap'd and binary searched it. It would be dirt simple
> if we used an existing key/value store like gdbm or tokyocabinet, but we
> usually try to avoid extra dependencies.

Yeah, not a bad idea. Use a series of SSTable-like things, like Hadoop
uses. It doesn't need to be as complex as the Hadoop SSTable concept.
A simple, immutable, sorted string-to-string mapping file would do,
with edits applied by creating an overlay file that contains the
new/updated entries.
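
To make the overlay idea concrete, here is a minimal sketch in Python.
It is only an illustration, not a proposed file format: the tables live
in memory instead of on disk, and the names SortedTable and lookup, as
well as the use of None as a "deleted" marker, are my own assumptions.

```python
import bisect


class SortedTable:
    """Immutable, sorted key -> value table (in-memory stand-in for a file)."""

    def __init__(self, entries):
        items = sorted(dict(entries).items(), key=lambda kv: kv[0])
        self._keys = [k for k, _ in items]
        self._values = [v for _, v in items]

    def get(self, key):
        """Binary search for key; return (found, value)."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return True, self._values[i]
        return False, None


def lookup(tables, key):
    """Search tables newest first; `tables` is ordered oldest to newest.

    A value of None is treated as a tombstone (assumption), so a key
    deleted in a newer overlay hides the entry in an older table.
    """
    for table in reversed(tables):
        found, value = table.get(key)
        if found:
            return value
    return None
```

A lookup costs one binary search per table, which is why the number of
overlay files needs to stay small.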

As you point out, we can use the notes tree to tell us the validity of
the cache, and do incremental updates. If the current cache doesn't
match the notes ref, compute the tree diff between the current cache's
source tree and the new tree, and create a new SSTable-like thing that
has the relevant updates as an overlay of the existing tables. After
some time you will have many of these little overlay files, and a GC
can just merge them down to a single file.
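
Roughly, the update and GC steps could look like the Python sketch
below. Everything here is hypothetical naming (ReverseCache, refresh,
compact), and load_mapping is an assumed callback that returns the
reverse mapping for a given notes tree id. A real implementation would
walk the tree diff instead of loading both mappings in full.

```python
class ReverseCache:
    """Stack of overlay tables plus the notes tree id they reflect."""

    def __init__(self):
        self.source_tree = None  # notes tree id the cache was built from
        self.tables = []         # oldest first; each is a dict key -> value

    def lookup(self, key):
        for table in reversed(self.tables):
            if key in table:
                return table[key]  # None is a tombstone: key was removed
        return None


def diff_mappings(old, new):
    """Entries that turn `old` into `new`; None marks a deletion."""
    overlay = {k: v for k, v in new.items() if old.get(k) != v}
    overlay.update({k: None for k in old if k not in new})
    return overlay


def refresh(cache, tree_id, load_mapping):
    """Add an overlay if the notes tree changed; return False if still valid."""
    if cache.source_tree == tree_id:
        return False
    old = load_mapping(cache.source_tree) if cache.source_tree else {}
    overlay = diff_mappings(old, load_mapping(tree_id))
    if overlay:
        cache.tables.append(overlay)
    cache.source_tree = tree_id
    return True


def compact(cache):
    """GC step: merge all overlays into one table and drop tombstones."""
    merged = {}
    for table in cache.tables:  # newer tables overwrite older entries
        merged.update(table)
    live = [(k, v) for k, v in merged.items() if v is not None]
    cache.tables = [dict(sorted(live, key=lambda kv: kv[0]))]
```

The tombstones only matter until the next compact(), since the merged
table has nothing older beneath it to hide.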

The only problem is, you probably want this "reverse notes index" to
be indexing a portion of the note blob text, not all of it. That is,
we want the SVN note text to say something like "SVN Revision: r1828"
so `git log --notes=svn` shows us something more useful than just
"r1828". But in the reverse index, we may only want the key to be
"r1828". So you need some sort of small mapping function to decide
what to put into that reverse index.
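
As a sketch of what such a mapping function might look like, assuming
the note format from the example above ("SVN Revision: r1828" on a line
of its own), something like this Python would do. The function names
and the regular expression are illustrative assumptions, not anything
that exists in Git today.

```python
import re

# Assumed note format: a line reading exactly "SVN Revision: r<digits>".
SVN_REVISION = re.compile(r"^SVN Revision: (r\d+)$", re.MULTILINE)


def reverse_index_key(note_text):
    """Return the key to index for a note, or None if it has none."""
    match = SVN_REVISION.search(note_text)
    return match.group(1) if match else None


def build_reverse_index(notes):
    """Invert a commit -> note text mapping into key -> commit."""
    index = {}
    for commit, text in notes.items():
        key = reverse_index_key(text)
        if key is not None:
            index[key] = commit
    return index
```

Notes that don't match are simply left out of the reverse index, so the
forward notes can carry whatever extra text is useful for display.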

-- 
Shawn.