Re: RFC: Flat directory for notes, or fan-out? Both!

"Shawn O. Pearce" <spearce@xxxxxxxxxxx> writes:

> Johannes Schindelin <Johannes.Schindelin@xxxxxx> wrote:
>> On Tue, 10 Feb 2009, Junio C Hamano wrote:
>> > 
>> > I could do a revert on 'master' if it is really needed, but I found that
>> > the above reasoning is a bit troublesome.  The thing is, if a tree to hold
>> > the notes would be so huge as to be unmanageable, then it would still be
>> > unmanageably huge if you split it into 256 pieces.
>> 
>> The thing is, a tree object of 17 megabytes is unmanageably large if you
>> have to read it whenever you access even a single node.  Having 256 trees
>> instead, each of which is about 68 kilobytes, is much nicer.
>
> See my other email on this thread; we'd probably need to unpack
> all 256 subtrees *anyway* due to the distribution of SHA-1 names
> for commits.

I wonder if we can solve this by introducing a local cache that is a flat
file that looks like:

    magic number for /usr/bin/file
    tree object SHA-1 the file caches
    Number of entries in this file
    256 fan-out offsets into this file
    N entries of <SHA-1, SHA-1>, sorted
    Checksum of the file itself

and use it when available (otherwise optionally create it upon the first
lookup).  The file can be used by mmapping it and then doing a
Newton-Raphson or binary search, similar to the way patch-ids.c does.
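
As a rough sketch of the lookup side only: the code below assumes the
entries are raw 20-byte SHA-1 pairs sorted by key, and that the fan-out
table stores cumulative entry counts per first key byte (the convention
pack index files use) rather than byte offsets.  The names
notes_cache, cache_build_fanout and cache_lookup are made up for
illustration, and it does a plain binary search inside the fan-out
bucket, not Newton-Raphson.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_LEN 20               /* raw SHA-1 length */
#define ENTRY_LEN (2 * HASH_LEN)  /* one <key SHA-1, value SHA-1> pair */

/* Hypothetical in-memory view of the mmap'ed cache file. */
struct notes_cache {
	uint32_t nr;                  /* number of entries */
	uint32_t fanout[256];         /* fanout[b]: entries whose first key byte is <= b */
	const unsigned char *entries; /* nr * ENTRY_LEN bytes, sorted by key */
};

/* Fill the fan-out table from the already sorted entries. */
static void cache_build_fanout(struct notes_cache *c)
{
	uint32_t i = 0;
	int b;

	for (b = 0; b < 256; b++) {
		while (i < c->nr && c->entries[(size_t)i * ENTRY_LEN] <= b)
			i++;
		c->fanout[b] = i;
	}
}

/* Copy the value for key into val and return 0, or return -1 if absent. */
static int cache_lookup(const struct notes_cache *c,
			const unsigned char *key, unsigned char *val)
{
	uint32_t lo, hi;

	assert(c && key && val);
	/* The fan-out table narrows the search to one first-byte bucket. */
	lo = key[0] ? c->fanout[key[0] - 1] : 0;
	hi = c->fanout[key[0]];
	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		const unsigned char *e = c->entries + (size_t)mid * ENTRY_LEN;
		int cmp = memcmp(key, e, HASH_LEN);

		if (!cmp) {
			memcpy(val, e + HASH_LEN, HASH_LEN);
			return 0;
		}
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -1;
}
```

With an mmap'ed file, entries would simply point past the header and
fan-out table, so a lookup touches only the pages holding one bucket.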

The top-level API for such a hash-map would perhaps look like:

    /*
     * take the object name of a tree object that is a hash map,
     * return an opaque struct.
     */
    struct hashmap *hashmap_open(const unsigned char *);

    /*
     * find the value given the key and return 0, or return negative
     * if not found.
     */
    int hashmap_lookup(struct hashmap *map, const unsigned char *key,
    		       unsigned char *val);

    /* discard the thing */
    void hashmap_close(struct hashmap *map);
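
To show how a caller might drive these three functions, here is a
minimal sketch with a stand-in implementation.  Everything behind the
signatures is an assumption: the struct is a tiny unsorted in-memory
table instead of the cache file, hashmap_demo_add() exists only so the
example has data to look up, and note_for_commit() is a hypothetical
caller, not the actual hook.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HASH_LEN 20   /* raw SHA-1 length */
#define DEMO_MAX 8    /* capacity of the stand-in table */

/* Stand-in for the opaque struct; a real one would wrap the cache file. */
struct hashmap {
	unsigned char tree[HASH_LEN];              /* tree object being cached */
	size_t nr;                                 /* pairs in use */
	unsigned char pairs[DEMO_MAX][2][HASH_LEN];
};

struct hashmap *hashmap_open(const unsigned char *tree_sha1)
{
	struct hashmap *map = calloc(1, sizeof(*map));

	if (map)
		memcpy(map->tree, tree_sha1, HASH_LEN);
	return map;
}

/* Demo-only helper, not part of the proposed API. */
int hashmap_demo_add(struct hashmap *map, const unsigned char *key,
		     const unsigned char *val)
{
	if (map->nr >= DEMO_MAX)
		return -1;
	memcpy(map->pairs[map->nr][0], key, HASH_LEN);
	memcpy(map->pairs[map->nr][1], val, HASH_LEN);
	map->nr++;
	return 0;
}

/* Linear scan here; the cache file would allow a sorted search instead. */
int hashmap_lookup(struct hashmap *map, const unsigned char *key,
		   unsigned char *val)
{
	size_t i;

	assert(map && key && val);
	for (i = 0; i < map->nr; i++) {
		if (!memcmp(map->pairs[i][0], key, HASH_LEN)) {
			memcpy(val, map->pairs[i][1], HASH_LEN);
			return 0;
		}
	}
	return -1;
}

void hashmap_close(struct hashmap *map)
{
	free(map);
}

/* Hypothetical caller: returns 1 and fills note_sha1 if the commit has a note. */
int note_for_commit(struct hashmap *map, const unsigned char *commit_sha1,
		    unsigned char *note_sha1)
{
	return hashmap_lookup(map, commit_sha1, note_sha1) == 0;
}
```

The point of keeping the struct opaque is that callers like this one
would not change if the stand-in table were replaced by the mmap'ed
file.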

We should be able to use these in "git log" and friends where Dscho added
the hook in his git-notes topic.

I am hoping that I could eventually rewrite rerere to use something like
this, so that rerere database can be shared, just like the way notes can
be shared, across repositories.

