On Monday 28 May 2007, Junio C Hamano wrote: > Johan Herland <johan@xxxxxxxxxxx> writes: > > On Monday 28 May 2007, Linus Torvalds wrote: > >> On Mon, 28 May 2007, Johan Herland wrote: > >> > I still don't see what makes note objects inherently more expensive than > >> > commit objects. Except for the refs, of course, but we're getting rid > >> > of those (at least replacing them with a more efficient reverse mapping). > >> > >> It's exactly the refs that I worry about. > >> > >> Anything that needs to read in all notes at startup is going to be _slow_. > >> > >> In contrast, commits we read when (and only when) we need them. > > > > Ok. But the reverse mapping will help with this, won't it? > > We'll look up the interesting commits and find their associated > > note objects directly. > > The issue Linus brought up worries me, too. > > The "efficient reverse mapping" is still handwaving at this > stage. What it needs to do is an equivalent to your > implementation with "refs/notes/<a dir per commit>/<note>". The > "efficient" one might do a flat file that says "notee note" per > line sorted by notee, or it might use BDB or sqlite, but the > amount of the data and complexity of the look-up is really the > same. A handful notes per each commit in the history (I think > Linus's "Acked-by after the fact" example a very sensible thing > to want from this subsystem). > > I am not saying that it is impossible to make the set-up cost > for the "efficient lookup" almost zero, and to make it lazy and > on-demand. The concern above just adds one design constraints > to that "efficient reverse mapping" code yet to come. Ok, here's what I'm thinking so far on that reverse mapping: 1. Keep a file, ".git/reverse_tagmap_sorted" with one entry of the form "pointee pointer" per line. The file is sorted on "pointee", so we can easily do the radix-256-fan-out-followed-by-binary-search trick that Linus mentioned in another thread. This should hopefully make lookup fairly cheap. BTW, if there is a similar "pointee pointer"-type format already being used in git, I'd be happy to use that instead. I looked at the "peeled" format being used by packed-refs, but using that directly doesn't sound like a good idea, since the refname causes the entries to be of variable length, and the refnames are not interesting to me at all. 2. Keep another file, ".git/reverse_tagmap_unsorted" in front of (1). This file has exactly the same format, minus the sorting. It exists just to make insertion cheap. Once this file reaches a certain size (i.e. when trawling it on lookup becomes slightly painful), we shuffle the entries into the sorted file (this would happen automatically on insertion of an entry, and should _not_ have to be triggered by 'git-gc' etc.). Of course, if we think insertion directly into (1) will never be too expensive, we can drop (2) altogether. I don't know enough about packing to have a good idea on how to pack these reverse tagmaps, but Shawn's thoughts about keeping associated tags/notes and objects close together makes a lot of sense. I'm just not sure yet where these reverse tagmaps fit into the whole picture. Currently, AFAICS, the packed-refs file is never propagated into the packs, but stays separate for the lifetime of the repo, but then it seems we're designing these reverse tagmaps for managing a handful of notes per commit, i.e. to hold a couple of orders of magnitude more entries than the packed-refs file. Maybe each pack should keep the reverse tagmap for all the object->note relationships internal to that pack? Everything else (unpacked notes, and object->note relationships spanning packs) would be kept in (1). Of course, when repacking, we'd try to keep objects and their notes together as much as possible, to maximize the in-pack reverse tagmap, and minimize the number of entries left behind in (1). Have fun! ...Johan -- Johan Herland, <johan@xxxxxxxxxxx> www.herland.net - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html