Re: Quickly searching for a note

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 21 Sep 2012 21:51:12 -0700

Joshua Jensen <jjensen@xxxxxxxxxxxxxxxxx> writes:

> Background: To tie Perforce changelists to Git commits, I add a note
> to a commit with the form "P4@123456".  Later, I use the note to sync
> down the closest Perforce changelist matching the Git commit.

I noticed that nobody brought this up, but probably it should not be
left unsaid, so...

For annotating commits with additional pieces of data, notes is a
reasonable mechanism, but the user should be aware that it is
heavily geared towards one-way mapping. When you have a commit and
want to know something about it, it will give you the associated
information reasonably efficiently.

But it is not a good mechanism for retrieval if the primary way you
use the stored information is to go from the associated information
to find the commit that has that note attached to it.  Your usage
pattern that triggered this thread may fall into that category.

It may still be a reasonable mechanism to use notes to exchange the
information across repositories, but if your application relies
heavily on mapping the information in the opposite way, you may want
to maintain a local cache of the reverse mapping in a more efficient
fashion.  For example, every time your notes tree is updated, you
can loop over "git notes list" output and register the contents of
the blob object that annotates each commit as the key and the commit
object name as the value to a repository-local sqlite database or
something (and depending on the nature of the frequent query, have
an efficient index on the key).
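
As a rough sketch of that loop, the following keeps the reverse mapping
in a sorted, tab-separated file rather than in sqlite, purely to avoid
an extra dependency. The function names and the index file are made up
for illustration, and the first line of each note is assumed to be the
key.

```shell
# Hypothetical helper: rebuild a reverse index (note text -> object name)
# for one notes ref.
# Usage: rebuild_reverse_index <notes-ref> <index-file>
rebuild_reverse_index () {
    ref=$1
    index=$2
    # "git notes list" prints one "<note blob> <annotated object>" per line.
    git notes --ref "$ref" list |
    while read -r noteblob object
    do
        # Assumption: the first line of the note is the lookup key.
        key=$(git cat-file blob "$noteblob" | head -n 1)
        printf '%s\t%s\n' "$key" "$object"
    done | sort >"$index.tmp" && mv "$index.tmp" "$index"
}

# Hypothetical helper: print every object whose note key equals <key>.
# Usage: lookup_reverse_index <index-file> <key>
lookup_reverse_index () {
    awk -F '\t' -v key="$2" '$1 == key { print $2 }' "$1"
}
```

Something like "rebuild_reverse_index p4notes .git/p4-reverse.idx" would
then be run whenever the notes tree is updated; the index path here is
only an example.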

Having mentioned an external database as the most generic approach,
I suspect that one important way to use notes is to associate
commits with some other (presumably unique) ID to interface with the
external world.  For example, I maintain "amlog" notes to record the
original message-ID for each commit that resulted from "git am".
The primary use of this is to find the message-ID for a commit that
was made some time ago and later found to be questionable, so that I
can find the relevant discussion thread, but the information could
also be used to see if a given message I see in the mail archive has
already been applied, and this needs a fast reverse mapping.

It actually is fairly trivial to maintain both forward and reverse
mapping for this kind of use case.  For example, your gateway that
syncs from Perforce may currently be doing something like this at
the end of it:

    git notes --ref p4notes add -m "P4@$p4_change_id" HEAD

to give a quick mapping from the object name of the resulting
commit (in HEAD) to "P4@123456".

This is stored as a mapping from the object name of HEAD to the
object name of a blob whose contents is "P4@123456".  You can see it
in action with

    $ git notes --ref p4notes list HEAD

that gives the blob object name that stores the note for the HEAD.
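
To make the "commit name -> blob name -> contents" chain visible, a
small helper along these lines could be used. The function name is
hypothetical; it only combines "git notes list" with "git cat-file".

```shell
# Hypothetical helper: print the note blob name for an object, followed
# by the contents of that blob.
# Usage: show_note_blob <notes-ref> <object>
show_note_blob () {
    # With an object argument, "git notes list" prints only the name of
    # the note blob, and fails if the object has no note.
    blob=$(git notes --ref "$1" list "$2") || return 1
    printf 'blob: %s\n' "$blob"
    # The note is an ordinary blob, so cat-file can print it.
    git cat-file blob "$blob"
}
```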

Now, there is _no_ reason why you cannot attach notes to these blob
objects.  For example, your "Perforce to Git" gateway can end with
something like this instead:

    HEAD=$(git rev-parse --verify HEAD)
    git notes --ref p4notes add -m "P4@$p4_change_id" $HEAD
    noteblob=$(git notes --ref p4notes list $HEAD)
    git notes --ref p4notes add -m "$HEAD" $noteblob

Then when you want to map P4@123456 to a Git commit, you could run

    $ noteblob=$(echo P4@123456 | git hash-object --stdin)
    $ git notes --ref p4notes show $noteblob

to see the commit object name that is associated with that note.
Of course, the same notes tree holds the forward mapping as before,
so

    $ git notes --ref p4notes show HEAD

will give you the "P4@123456".
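
Put together, the two directions could be wrapped roughly as follows.
The function names are invented for this sketch. The reverse lookup
relies on the note blob being exactly the note text followed by a
single newline, which is what "git notes add -m" is expected to store
and what "echo" or "printf" feeds to "git hash-object".

```shell
# Hypothetical helper: record the forward note (commit -> "P4@<id>") and
# the reverse note (note blob -> commit name) in the same notes tree.
# Usage: record_p4_change <change-id> [<commit>]
record_p4_change () {
    commit=$(git rev-parse --verify "${2:-HEAD}^{commit}") || return 1
    git notes --ref p4notes add -m "P4@$1" "$commit" || return 1
    noteblob=$(git notes --ref p4notes list "$commit") || return 1
    # Annotate the note blob itself with the commit it describes.
    git notes --ref p4notes add -m "$commit" "$noteblob"
}

# Hypothetical helper: print the commit recorded for a Perforce change.
# Usage: commit_for_p4_change <change-id>
commit_for_p4_change () {
    # Recompute the blob name from the note text; nothing is written to
    # the object database because -w is not given.
    noteblob=$(printf 'P4@%s\n' "$1" | git hash-object --stdin) || return 1
    git notes --ref p4notes show "$noteblob"
}
```

With these, "record_p4_change 123456" at the end of the gateway and
"commit_for_p4_change 123456" later would round-trip between the two
names.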

We may want to support such a reverse mapping natively so that
"notes rewrite" logic maintains the mapping in both directions.

I've CC'ed people who may want to be involved in further design
work.