Re: How it was at GitTogether'08 ?

Jeff King <peff@xxxxxxxx> · Sat, 8 Nov 2008 09:17:27 -0500

On Sat, Nov 08, 2008 at 04:41:25AM +0100, Johan Herland wrote:

> > * Discussion on notes
> 
> Can someone elaborate on this? AFAIK, notes have popped up on this list 
> often enough that I'm convinced it would be a _really_ useful feature. The 
> only drawback I was aware of, was the lack of an efficient implementation, 
> but then Jeff comes out of the blue and posts some interesting numbers [1] 
> a week or so ago. Does this mean there are no remaining obstacles?
> 
> [1]: http://article.gmane.org/gmane.comp.version-control.git/99415

The discussion was along the lines of "here are some more cool things we
could do, if we had notes." I don't remember the specifics of the cool
things, but they were related to annotating patches with review
information. Shawn can probably elaborate more.

That led to the objection that "notes as a tree are nice, but too slow
because looking up a tree entry is linear" (and obviously you do a ton
of lookups in the notes tree during "git log"). Dscho had posted an
implementation with a persistent notes cache long ago. Since I failed
to actually look at that, I started on a slightly different approach,
which is simply using an in-memory hash table to speed up lookups in
the notes tree. And those are the numbers and patch I posted.
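
A rough sketch of the hash-table idea follows, in Python rather than
git's C. The function names and the list-of-pairs stand-in for a tree
object are assumptions made for illustration; this is not the code from
the posted patch.

```python
# Hypothetical stand-in for a notes tree: a list of
# (commit_sha1, note_blob_sha1) entries, in the order they would be
# read from a tree object.

def lookup_linear(tree_entries, commit_sha1):
    """Scan the entries one by one: O(n) work for every lookup."""
    for name, blob in tree_entries:
        if name == commit_sha1:
            return blob
    return None


def build_notes_index(tree_entries):
    """Read the tree once and index it by commit SHA-1."""
    return dict(tree_entries)


def lookup_hashed(index, commit_sha1):
    """Expected O(1) per lookup once the index has been built."""
    return index.get(commit_sha1)
```

The index is built once per "git log" run, so its cost is paid a single
time while every commit shown afterwards gets the cheap lookup.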

My eventual plan was to re-work Dscho's patches with this performance
approach. But it is not at the top of my queue, so if somebody else
wanted to pick it up, I would be very happy. Everything I have done so
far is in the post you referenced.

The only other thing I remember discussing was notes namespaces. The two
obvious approaches are:

 1. a separate ref for each notes namespace, with each note ending up a
    blob in a tree. So you might have refs/notes/acked-by:$SHA1 as a
    blob.

 2. one notes ref, with the notes tree pointing to a sub-tree that has
    named entries, one for each note type. So you might have
    refs/notes:$SHA1/acked-by as a blob.

The advantage of '1' is that it keeps your different note types
separate, which means it is easy to distribute one type but not the
other. The advantage of '2' is that I do one lookup per commit, and then
I can see all of the notes, which keeps performance nice when you want
to annotate with several note types.
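
The difference in lookup counts can be sketched roughly as below. This
is Python with plain dicts standing in for refs, trees and blobs; the
function names and data shapes are hypothetical and only meant to show
where the lookups happen.

```python
# Hypothetical in-memory models of the two layouts.

def notes_from_separate_refs(refs, commit_sha1, note_types):
    """Approach 1: refs maps note type -> {commit SHA-1 -> note text}."""
    found = {}
    for note_type in note_types:  # one lookup per note type
        tree = refs.get(note_type, {})
        if commit_sha1 in tree:
            found[note_type] = tree[commit_sha1]
    return found


def notes_from_single_ref(notes_tree, commit_sha1):
    """Approach 2: notes_tree maps commit SHA-1 -> {note type -> text}."""
    # One lookup per commit returns every note type at once.
    return dict(notes_tree.get(commit_sha1, {}))
```

With approach 1, distributing only one note type means sharing only one
of the top-level refs; with approach 2, the per-commit sub-tree already
holds everything needed to annotate that commit.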

After some discussion, I think Dscho and I came to the conclusion that
supporting both might be desirable. And it should be pretty
straightforward. You can just have multiple note refs (but default to a
"main" one), and within each one, a commit's entry can point to either
a tree or a blob (we check which it is and handle it accordingly).

And then depending on which notes the user wants, they can refer to them
appropriately. My suggestion for naming (and this wasn't discussed
earlier, so Dscho has not endorsed this) would be something like
"$X:$Y", which would mean "to get the notes for $SHA1, look at the tree
in refs/notes/$X for the file $SHA1/$Y". If $Y is empty, then expect
$SHA1 to be a blob (if it's a tree, maybe look at $SHA1/default). If
"$X" is empty, then use "refs/notes/default". If there is no colon,
assume we have "$Y".
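
The naming rules above could be sketched like this, in Python with
hypothetical function names. A str models a blob and a dict models a
sub-tree. Returning None when "$Y" is given but the entry is a blob is
an assumption, since that case is not covered above.

```python
def parse_notes_spec(spec):
    """Split "$X:$Y" into (ref, y); no colon means the whole spec is $Y."""
    if ":" in spec:
        x, y = spec.split(":", 1)
    else:
        x, y = "", spec
    # An empty $X falls back to refs/notes/default.
    return "refs/notes/" + (x or "default"), y


def note_path(spec, commit_sha1):
    """Return (ref, path inside that ref's tree) for one commit."""
    ref, y = parse_notes_spec(spec)
    path = commit_sha1 + "/" + y if y else commit_sha1
    return ref, path


def resolve_note(entry, y):
    """Pick the note out of a commit's entry in the notes tree."""
    if isinstance(entry, dict):
        # Sub-tree: use the named entry, or "default" when $Y is empty.
        return entry.get(y or "default")
    # Blob: only meaningful when no $Y was asked for (assumption).
    return entry if not y else None
```

For example, "svn:name" maps to the ref refs/notes/svn and the path
$SHA1/name, while a bare "acked-by" maps to refs/notes/default and
$SHA1/acked-by.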

So you could have a bunch of notes in some "main" namespace just by
calling them some name; without a name, you get some "default" note. But
if you wanted a separate database (say, for SVN information), you could
use "svn:" or "svn:name".

-Peff