Re: [PATCH 00/15] git-note: A mechanisim for providing free-form after-the-fact annotations on commits

Johan Herland <johan@xxxxxxxxxxx> · Tue, 29 May 2007 11:23:54 +0200

On Tuesday 29 May 2007, Jeff King wrote:
> On Tue, May 29, 2007 at 09:06:29AM +0200, Johan Herland wrote:
> 
> > 1. Keep a file, ".git/reverse_tagmap_sorted" with one entry of the form
> > "pointee pointer" per line. The file is sorted on "pointee", so we can
> > easily do the radix-256-fan-out-followed-by-binary-search trick that
> > Linus mentioned in another thread. This should hopefully make lookup
> > fairly cheap. BTW, if there is a similar "pointee pointer"-type format
> > already being used in git, I'd be happy to use that instead. I looked
> 
> I did a similar thing (though rather than having "lines" at all, they
> were fixed-length pairs of binary sha1 hashes) to implement a negative
> delta cache (which turned out to be a stupid idea, but the
> implementation worked):
> 
> http://www.gelato.unsw.edu.au/archives/git/0606/23229.html

Cool. It won't give a big-Oh improvement, but it might give a worthwile
boost anyway. I guess it's the usual tradeoff between speed and
maintainability/transparency/readability.

> > 2. Keep another file, ".git/reverse_tagmap_unsorted" in front of (1).
> > This file has exactly the same format, minus the sorting. It exists just
> > to make insertion cheap. Once this file reaches a certain size (i.e.
> > when trawling it on lookup becomes slightly painful), we shuffle the
> > entries into the sorted file (this would happen automatically on
> > insertion of an entry, and should _not_ have to be triggered by 'git-gc'
> > etc.).
> 
> The implementation I mentioned above collects several "to be inserted"
> entries (in memory) and then merge sorts the two lists into a new file.
> So it was fast in terms of comparisons, but it involved writing O(n)
> entries, which is probably bad for creating a single note.
> 
> > Of course, if we think insertion directly into (1) will never be too
> > expensive, we can drop (2) altogether.
> 
> You can always find the right spot, memmove everything down one slot,
> and insert. But that means:
>   - the cost of insertion will be proportional to the number of items,
>     whereas using an unsorted journal you get to amortize that cost over
>     many insertions
>   - if you update in place, you have to lock the db for reading while
>     you are moving around. If you are always either appending to the
>     journal or merging into a new file, you can avoid this.

Yep, I'm thinking (2) will be worth it in the end. O(#notes) per note
insertion doesn't sound appealing at all, especially not if we're
expecting a handful of notes per commit.

Have fun!

...Johan

-- 
Johan Herland, <johan@xxxxxxxxxxx>
www.herland.net
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html