On Tuesday 29 May 2007, Jeff King wrote: > On Tue, May 29, 2007 at 09:06:29AM +0200, Johan Herland wrote: > > > 1. Keep a file, ".git/reverse_tagmap_sorted" with one entry of the form > > "pointee pointer" per line. The file is sorted on "pointee", so we can > > easily do the radix-256-fan-out-followed-by-binary-search trick that > > Linus mentioned in another thread. This should hopefully make lookup > > fairly cheap. BTW, if there is a similar "pointee pointer"-type format > > already being used in git, I'd be happy to use that instead. I looked > > I did a similar thing (though rather than having "lines" at all, they > were fixed-length pairs of binary sha1 hashes) to implement a negative > delta cache (which turned out to be a stupid idea, but the > implementation worked): > > http://www.gelato.unsw.edu.au/archives/git/0606/23229.html Cool. It won't give a big-Oh improvement, but it might give a worthwile boost anyway. I guess it's the usual tradeoff between speed and maintainability/transparency/readability. > > 2. Keep another file, ".git/reverse_tagmap_unsorted" in front of (1). > > This file has exactly the same format, minus the sorting. It exists just > > to make insertion cheap. Once this file reaches a certain size (i.e. > > when trawling it on lookup becomes slightly painful), we shuffle the > > entries into the sorted file (this would happen automatically on > > insertion of an entry, and should _not_ have to be triggered by 'git-gc' > > etc.). > > The implementation I mentioned above collects several "to be inserted" > entries (in memory) and then merge sorts the two lists into a new file. > So it was fast in terms of comparisons, but it involved writing O(n) > entries, which is probably bad for creating a single note. > > > Of course, if we think insertion directly into (1) will never be too > > expensive, we can drop (2) altogether. > > You can always find the right spot, memmove everything down one slot, > and insert. But that means: > - the cost of insertion will be proportional to the number of items, > whereas using an unsorted journal you get to amortize that cost over > many insertions > - if you update in place, you have to lock the db for reading while > you are moving around. If you are always either appending to the > journal or merging into a new file, you can avoid this. Yep, I'm thinking (2) will be worth it in the end. O(#notes) per note insertion doesn't sound appealing at all, especially not if we're expecting a handful of notes per commit. Have fun! ...Johan -- Johan Herland, <johan@xxxxxxxxxxx> www.herland.net - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html