[RFD] annotating a pair of commit objects?

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 02 Jan 2013 23:03:00 -0800

I'd like a datastore that maps a pair of commit object names to
another object name, such that:

 * When looking at two commits A and B, efficiently query all data
   associated with a pair of commits <X,Y> where X is contained in
   the range A..B and not in B..A, and Y is contained in the range
   B..A and not in A..B.

 * When <X,Y> is registered in the datastore, and X is rewritten to
   X' and/or Y is rewritten to Y', the mapping is updated so that it
   can be queried with <X',Y'> as a new key, similar to the way a
   notes tree that maps object X can be updated to map object X'
   when such a rewrite happens.
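As a point of reference, the two requirements can be modelled with a plain in-memory dictionary. This is only a hypothetical sketch: `PairNotes`, `add`, `rewrite` and `query` are made-up names, not git API. `query` simply scans every registered pair. `rewrite` copies entries to the rewritten commit's name, in the spirit of how git copies notes from a rewritten commit to its replacement when `notes.rewriteRef` is configured.

```python
class PairNotes:
    """Hypothetical in-memory store mapping a commit pair <X,Y> to an object name."""

    def __init__(self):
        self._data = {}  # (x, y) -> object name of the annotation

    def add(self, x, y, obj):
        self._data[(x, y)] = obj

    def rewrite(self, old, new):
        """After `old` is rewritten to `new`, copy every entry that mentions `old`
        to a key that mentions `new` (old entries are kept, as with notes)."""
        for (x, y), obj in list(self._data.items()):
            if old in (x, y):
                nx = new if x == old else x
                ny = new if y == old else y
                self._data.setdefault((nx, ny), obj)

    def query(self, a_to_b, b_to_a):
        """Entries <X,Y> with X in A..B and Y in B..A (a linear scan of the store)."""
        return {k: v for k, v in self._data.items()
                if k[0] in a_to_b and k[1] in b_to_a}

store = PairNotes()
store.add("X", "Y", "fixup-blob")
store.rewrite("X", "X'")  # e.g. topic A was rebased
store.rewrite("Y", "Y'")  # ...and so was topic B
print(store.query({"X'"}, {"Y'"}))  # {("X'", "Y'"): 'fixup-blob'}
```

The linear scan in `query` is exactly what a real design would want to avoid once many pairs are registered.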

The intended use case is to "go beyond rerere".  Given a history of
this shape:

    o---o---o---I      mainline
   / 
  O---o---X---o---A    topic A
   \
    o---Y---o---o---B  topic B

Suppose in the original O we had a function "distimmed_doshes()" to
tell if doshes are already distimmed, with some call sites.  On the
branch leading to A, at commit X, this function was renamed to
"doshes_are_distimmed()", and all existing call sites were adjusted.
On the side branch leading to B, however, at commit Y, a new call
site to the old function was added in a file that was not touched
between O..A at all.

When merging either topic A or topic B (but not both) to the
integration branch, which touched neither this function nor any of
its call sites, no special care needs to be taken.  But when merging
the second topic after the first one, we need to resolve a semantic
conflict.  Namely, the call site to "distimmed_doshes()" introduced by
commit Y needs to be adjusted to call "doshes_are_distimmed()" instead.

The first step is to recognize the potential issue.  When queuing
the topic that contains X and the other topic that contains Y,
suppose I could register <X,Y> in the datastore I am dreaming of.  When
I merge A to the integration branch, I can notice that there is no
pair <M,N> in the datastore such that:

 * M is in A..I and not in I..A
 * N is in I..A and not in A..I

and can create a merge J without semantic adjustment.

    o---o---o---I---J      mainline
   /               /  
  O---o---X---o---A        topic A
   \
    o---Y---o---o---B      topic B
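In a real repository, the ranges A..I and I..A in the check above are the two halves of the symmetric difference I...A. `git rev-list --left-right I...A` prints those commits prefixed with `<` (reachable only from I) or `>` (reachable only from A). A minimal sketch follows, assuming the datastore is just a Python set of (M, N) object-name pairs. `split_left_right`, `sides` and `needs_care` are hypothetical helpers.

```python
import subprocess

def split_left_right(lines):
    """Split `git rev-list --left-right L...R` output into (L-only, R-only) sets.

    Each line is '<' + object name for a commit reachable only from L,
    or '>' + object name for a commit reachable only from R.
    """
    left, right = set(), set()
    for line in lines:
        line = line.strip()
        if line.startswith("<"):
            left.add(line[1:])
        elif line.startswith(">"):
            right.add(line[1:])
    return left, right

def sides(repo, head, topic):
    """Return (topic..head, head..topic) for the repository at `repo`."""
    out = subprocess.run(
        ["git", "-C", repo, "rev-list", "--left-right", f"{head}...{topic}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return split_left_right(out.splitlines())

def needs_care(store, mainline_side, topic_side):
    """Registered <M,N> pairs with M on the mainline side and N on the topic side."""
    return {(m, n) for (m, n) in store if m in mainline_side and n in topic_side}

# Usage inside a repository (branch names "I" and "A" are placeholders):
# left, right = sides(".", "I", "A")
# print(needs_care(registered_pairs, left, right))  # empty set: make J as usual
```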

When I later merge topic B to the integration branch, however, there
is <X,Y> in the datastore such that:

 * X is in B..J and not in J..B
 * Y is in J..B and not in B..J

so I can notice that I need to be careful when creating the merge K:

    o---o---o---I---J---K  mainline
   /               /   /
  O---o---X---o---A   /    topic A
   \                 /
    o---Y---o---o---B      topic B
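Both checks can be traced on a toy copy of the history in the diagrams. This is a self-contained sketch, not git code. The unnamed "o" commits get made-up names (m1, a1, b1, ...), ranges are computed by walking a parent map, and `pairs_needing_care` is a hypothetical helper.

```python
def reachable(parents, tip):
    """All commits reachable from `tip`, including `tip` itself."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen

def git_range(parents, a, b):
    """The set a..b: commits reachable from b but not from a."""
    return reachable(parents, b) - reachable(parents, a)

def pairs_needing_care(parents, store, head, topic):
    """Registered <M,N> with M in topic..head and N in head..topic."""
    mainline_side = git_range(parents, topic, head)
    topic_side = git_range(parents, head, topic)
    return sorted((m, n) for (m, n) in store if m in mainline_side and n in topic_side)

# Toy history from the diagrams.
parents = {
    "O": [],
    "m1": ["O"], "m2": ["m1"], "m3": ["m2"], "I": ["m3"],
    "a1": ["O"], "X": ["a1"], "a2": ["X"], "A": ["a2"],
    "b1": ["O"], "Y": ["b1"], "b2": ["Y"], "b3": ["b2"], "B": ["b3"],
}
store = {("X", "Y")}

print(pairs_needing_care(parents, store, "I", "A"))  # [] -> merge J needs no adjustment
parents["J"] = ["I", "A"]                             # record the merge J
print(pairs_needing_care(parents, store, "J", "B"))  # [('X', 'Y')] -> be careful with K
```

Before J exists, X is only reachable from A, so it is not on the mainline side of the first check. Once J merges A, X lands on the mainline side of the second check.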

Of course, the next step is to store not just one bit "<X,Y> exists
in the datastore--be careful", but also what semantic adjustment needs
to be applied [*1*].

Obviously, with O(cnt(A..B) * cnt(B..A)) complexity, this can be
coded trivially by trying every pair <X,Y> in the symmetric
difference, with X in A..B and Y in B..A.
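For comparison, the trivial version might look like the sketch below. It assumes the two sides of the symmetric difference have already been computed. For example, A..B is the `>` side and B..A the `<` side of `git rev-list --left-right A...B`. It also assumes the datastore is a dictionary keyed by `(X, Y)`. `brute_force_query` is a made-up name.

```python
def brute_force_query(store, a_to_b, b_to_a):
    """Probe every pair <X,Y> with X in A..B and Y in B..A.

    `store` maps (X, Y) to an annotation; the number of probes is
    len(a_to_b) * len(b_to_a) no matter how few pairs are registered.
    """
    hits = {}
    for x in a_to_b:
        for y in b_to_a:
            if (x, y) in store:
                hits[(x, y)] = store[(x, y)]
    return hits

store = {("X", "Y"): "fixup-blob"}
a_to_b = {"a1", "X", "a2", "A"}        # one side of the symmetric difference
b_to_a = {"b1", "Y", "b2", "b3", "B"}  # the other side
print(brute_force_query(store, a_to_b, b_to_a))  # {('X', 'Y'): 'fixup-blob'}
```

Here that is 4 * 5 = 20 probes to find a single registered pair. The cost grows with the product of the two sides, which is the part worth improving on.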

But I am hoping we can do better than that.

Ideas?

[Footnote]

*1* We could do it in multiple ways and the details are not
interesting. A blob object that records a patch that can be applied
with "git apply -3", or a commit object that represents necessary
"evil" change that can be cherry-pick'ed, would be two possible
implementations.
