Re: Premerging topics (was: [RFD] annnotating a pair of commit objects?)

Johan Herland <johan@xxxxxxxxxxx> · Tue, 23 Apr 2013 08:34:42 +0200

On Wed, Apr 10, 2013 at 10:35 PM, Antoine Pelisse <apelisse@xxxxxxxxx> wrote:
> The goal is to propose a structure for storing and pre-merging pairs of commits.
>
> Data-structure
> ==============
>
> We could use a note ref to store the pre-merge information. Each commit
> would be annotated with a blob containing the list of pre-merges (one
> sha1 per line with sha1 pointing to a merge commit). The commit on the
> other side of a merge would also be annotated.
> The choice of the refname could be done like we do with notes:
> - Have a default value
> - Have a default value configured in config
> - Use a specific value when merging/creating the pre-merges
>
> Here are my concerns:
>
> Pros
> ----
> 1. Notes allow dynamic annotation a commit
> 2. If we manage to fix 4, we can easily download all pre-merges from a
> remote host by fetching the ref (or clean by deleting the ref).
> 3. Conflicts on pre-merge notes could probably be resolved by concatenation.
>
> Cons
> ----
> 4. Checking connectivity means opening the blob and parsing it

Can you solve this problem with a tree object, instead of inventing a
specially-formatted blob?

I.e. given pre-merge info P for a merge between commits A and B: A is
annotated by a tree object that contains all pre-merges where A is
involved. Each entry in the tree object has a filename and a blob
SHA1; we store other commit involved in this pre-merge (B) in the
filename, and the pre-merge data (P) in the blob SHA1.

Conversely, B is annotated with a tree object containing all pre-merge
entries concerning B, and within there, is an entry called A, pointing
to the same P.

You still need "special handling" in that you have to know that
"refs/notes/pre-merge" (or whatever it's called) stores tree objects
instead of blobs, and how to interpret the contents of those trees,
but you'll need such knowledge in any case.

A nice side-effect of using tree objects to store pre-merge entries,
is that you can do a tree-level merge of them, and it'll do the Right
Thing (assuming two-way merge with no common history), i.e. perform a
union merge, and leave you to handle conflicts of individual
pre-merges (i.e. you'll only get conflicts when both sides offer
different pre-merge data for A + B).

> 5. Regular notes and pre-merge notes have to be handled separately
> because of 4.

Sort of, but you get that for any automated usage of notes for a
specific purpose. Just look at the notes-cache mechanism in
notes-cache.{h,c}. That's another example of functionality layered on
top of notes that makes assumptions on how its notes trees are
structured.

> I'm hoping we can keep the pros and avoid the cons, but I'm kind of
> stuck here. Help would be really appreciated (or maybe this is a totally
> wrong direction, and I would also like to know ;)
>
> Merging (Using what we saved)
> =============================
> The goal is to merge branches J and B using existing pre-merges.
>
> E0. Create an empty stack S
> E1. Create set of commits 'J..B' and 'B..J' (that is probably already done)
> E2. For each commit C in smallest(J..B, B..J), execute E3
> E3. For each premerge P in notes-premerge(C), execute E4
> E4. If one of both parents of P belongs to biggest(J..B, B..J), stack P in S

I don't think _both_ parents of P can belong to biggest(J..B, B..J).
AFAICS J..B and B..J must always be completely disjoint sets of
commits (this is easier to see when you consider that "A..B" is
equivalent to "B ^A" for any commits A and B), and in E2/E3, you have
already made sure that P has a parent in one of them. There is then no
way that the same parent can occur in the other set, so you have at
most _one_ parent in the other set.

> E5. Merge J and B using all pre-merges from S

This is where things get complicated... :)

First there is one important thing that I have not seen a decision on
yet (maybe this was discussed in an earlier thread?):

Given pre-merge data P for commit A and B, does P encode the merge of
the entire history up to A with the entire history up to B, or does it
only encode the merging of the changes introduced in A with the
changes introduced in B? In other words, are we merging snapshots or
diffs?

In the former case, we only need to find the most recent commits A and
B on their respective branches - for which P exists - and then execute
that one P (or at most two Ps, if there is a criss-cross pre-merge
situation). In the other case, however, we need to enumerate all Ps
that apply to the two branches, and find a way to execute them
chronologically, dealing with missing Ps and conflicting Ps along the
way. IMHO, only the former approach seems practically solvable.

So you do not need to enumerate all Ps in J..B vs. B..J, you only need
to find the _most_ _recent_ P, and execute that one.

> Let's consider that |J..B| is smaller than |B..J|.
> E0 is executed only once
> E1 is O(|J..B| + |B..J|)
> E2 is O(|J..B|)
> E3 is O(|J..B| x the average number of pre-merge per commits P_avg)
> E4 is executed for each parent (let's say it's two/constant, after all
> the topic is "pair" of commits), so still O(|J..B| x P_avg)
> E5 I don't know (how it can be done, and what would be the resulting
> time complexity)
>
> So the time cost for steps E0 to E4 is O(|J..B| + |B..J| x P_avg)
>
> Tools (Save the pre-merges)
> ===========================
>
> Of course we need several tools to maintain the list of premerges, and
> to easily compute them. For example, it would be nice to be able to do
> something like:
>
>     $ git pre-merge topicA topicB topicC
>
> to find, resolve and store all interactions between the topics.

Let's leave out octopus merges for now, and only concentrate on two
branches at a time.

Hope this helps,

...Johan

PS: Could you also use this mechanism to store rerere information?

> We could
> then easily derive to something that would allow to pre-merge a new
> topic with all topics already merged in master..pu (for example).
> Anyway, this task is left for latter.

--
Johan Herland, <johan@xxxxxxxxxxx>
www.herland.net
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html