Re: RFD: best way to automatically rewrite a git DAG as a linear history?

Jeff,

The use case is extracting a patch series from a developer who has
been frequently pulling (and thus merging) with an upstream but has
not successfully delivered anything upstream.

I want to be able to unpick the upstream merge history and reconstruct
a reasonably faithful representation of the developer's edits in a
linear series of commits that can then be reviewed, edited, squashed
and re-ordered as necessary as part of integration activities. I also
want to handle the cases where a (bad) upstream rebase occurred or the
developer has (incorrectly) merged with history from a peer that has
also not been integrated upstream.

This places a useful constraint on the DAG rewrite that I need to do -
I can restrict the rewrite to the reverse of a linear traversal from the
developer's current head back to the first commit that has not been
integrated upstream.
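
To make that constraint concrete, here is a minimal sketch in Python
over a toy DAG. It is an illustration only: it assumes that "linear
traversal" means following first parents from the developer's head, and
the names ancestors, segment_to_rewrite and PARENTS are hypothetical,
not part of git or of any existing script.

```python
def ancestors(parents, tip):
    """Return every commit reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def segment_to_rewrite(parents, head, upstream):
    """Walk first parents from head until an integrated commit is reached.

    Returns the unintegrated commits oldest first, i.e. the order in
    which they would be replayed.
    """
    integrated = ancestors(parents, upstream)
    chain, commit = [], head
    while commit is not None and commit not in integrated:
        chain.append(commit)
        first_parents = parents.get(commit, [])
        commit = first_parents[0] if first_parents else None
    chain.reverse()
    return chain


# Toy history (assumed for illustration): U1-U2-U3 is upstream; the
# developer commits D1, merges U2 (M1), commits D2, merges U3 (M2),
# then commits D3. The first parent of each merge is the developer side.
PARENTS = {
    "U1": [], "U2": ["U1"], "U3": ["U2"],
    "D1": ["U1"], "M1": ["D1", "U2"], "D2": ["M1"],
    "M2": ["D2", "U3"], "D3": ["M2"],
}

print(segment_to_rewrite(PARENTS, "D3", "U3"))
```

In this toy history the walk yields D1, M1, D2, M2, D3, so the merges
M1 and M2 are the points that need to be replaced, oldest first.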

Avery's script almost does what I need, except that the rewritten diffs
corresponding to the merge commits introduce unnecessary noise (from
upstream deltas) into the series and potentially complicate eventual
merges of the linear history back into the upstream.

What I am doing at the moment is a piece-wise replacement of
each merge with an equivalent rebase from the other branch of the
merge, starting with the oldest merge. While doing this, there arises
the possibility of a merge conflict between a commit made by the
developer and a commit on the other branch of the merge. For my
purposes, at the point of the pick where the conflict is detected, the
conflicted blob is resolved in favour of the developer's blob. This
technically introduces an error into the history since resolving the
conflict in this way is almost certainly not correct. However, it does
mean that all subsequent picks on that segment for that blob will
apply correctly. At the end of the rewrite of each segment, the
conflicted blob is replaced with the result of the developer's
original merge so that the introduced error is then corrected with a
perfect correction (assuming the developer did a sane merge in the
first place). In effect, each conflict introduces two deltas into the
history - one to enable the subsequent picks to apply cleanly and one
to reapply the developer's original resolution of the conflict.
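
The scheme can be sketched in Python on toy data. This is a simplified
model under stated assumptions, not the actual implementation: trees
are plain dicts, a pick conflicts when the blob it expects is not the
blob in the tree, and "resolved in favour of the developer's blob" is
read as forcing the blob to the developer's pre-image in a separate
commit. The names rebase_segment, ONTO, PICKS and MERGE are
hypothetical.

```python
def rebase_segment(onto_tree, picks, merge_tree):
    """Replay the developer's picks on top of onto_tree.

    onto_tree, merge_tree: dict mapping path -> blob content.
    picks: list of (name, {path: (expected_old, new)}).
    Returns (history, final_tree); history is a list of
    (commit_name, tree_snapshot) pairs in rewritten order.
    """
    tree = dict(onto_tree)
    history = []
    conflicted = []
    for name, changes in picks:
        for path, (old, _new) in changes.items():
            if tree.get(path) != old:
                # Conflict: take the developer's side so that this pick
                # and later picks on the same blob apply cleanly. This
                # is the error-introducing delta.
                tree[path] = old
                conflicted.append(path)
                history.append((f"force {path} for {name}", dict(tree)))
        for path, (_old, new) in changes.items():
            tree[path] = new
        history.append((name, dict(tree)))
    # End of segment: restore each conflicted blob from the developer's
    # original merge. This is the correcting delta.
    for path in dict.fromkeys(conflicted):
        tree[path] = merge_tree[path]
        history.append((f"correct {path}", dict(tree)))
    return history, tree


# Toy data (assumed): the other branch changed a.txt, and so did the
# developer, so the first pick conflicts on a.txt.
ONTO = {"a.txt": "upstream-2", "b.txt": "base"}
PICKS = [
    ("D1", {"a.txt": ("base", "dev-1")}),
    ("D2", {"a.txt": ("dev-1", "dev-2"), "b.txt": ("base", "dev-b")}),
]
MERGE = {"a.txt": "merged", "b.txt": "dev-b"}

history, final = rebase_segment(ONTO, PICKS, MERGE)
print([name for name, _tree in history])
print(final == MERGE)
```

With this data the rewritten segment has four commits: the forced
resolution, D1, D2 and the correction, and the final tree equals the
tree of the developer's merge.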

This approach has several consequences:

* the rewrite is completely automated
* by construction, the tree will be consistent with the developer's tree
at the commit corresponding to each merge the developer did.
* there is one commit in the rewritten history for each commit in the
original history + two commits for each auto-resolved conflict (one
that introduces the error and one that later corrects it using the
developer's merge)

It is true that the rewritten history does contain periods where the
intervening commits are not strictly consistent (periods between the
error introducing delta and its subsequent correction), but if this is
really important, these can be resolved with an interactive rebase as
required. On the other hand, the rewritten history will be fully consistent
at well-specified points - particularly at commits corresponding to
the original merge commits and on any segment that was not affected by
a merge conflict.

Regards,

jon seymour.

On 18/02/2010, at 16:11, Jeff King <peff@xxxxxxxx> wrote:

> On Thu, Feb 18, 2010 at 01:35:07PM +1100, Jon Seymour wrote:
>
>> Does the git toolset currently support rewriting a restricted git DAG
>> as a linear history in a completely automated way?
>
> Not really. It's a hard problem in the general case. Consider a history
> like:
>
>    B
>   / \
>  A   D
>   \ /
>    C
>
> That is, two branches fork, each make a commit, and then merge. You want
> something like:
>
>  A--B--C'
>
> If there is a merge conflict when making D, then you know that B and C
> conflict. In this simple case, you can apply the same conflict
> resolution used in D to the creation of C' (in other words, you use the
> combined tree state given in D as the tree for C'). But what if C is a
> string of commits? Some of the conflict resolution in D will be
> applicable to some of the conflicts you will encounter when rebasing C,
> but you don't know which.
>
> One simple strategy would be to squash all side-branch development into
> a single commit. So you would turn:
>
>    B--C--D
>   /       \
>  A         H
>   \       /
>    E--F--G
>
> into
>
>  A--B--C--D--X
>
> where X has the same tree as H, but contains all of the commit messages
> of E, F, and G.
>
> You are of course losing quite a bit of information there, but you
> haven't really told us what your use case is, so I don't know whether
> that's unacceptable or not.
>
> -Peff
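
For comparison, the squash strategy quoted above can be sketched in a
few lines of Python. This is only a toy model with hypothetical names
(squash_side_branch, MAINLINE, SIDE) and commits represented as dicts;
it is not something git provides in this form.

```python
def squash_side_branch(mainline, side, merge_tree):
    """Return a linear history: the mainline plus one squashed commit X.

    X takes the tree of the original merge and the concatenated
    messages of the side-branch commits.
    """
    message = "\n\n".join(commit["message"] for commit in side)
    squashed = {"id": "X", "tree": merge_tree, "message": message}
    return list(mainline) + [squashed]


# Toy commits (assumed): A-B-C-D is the mainline, E-F-G the side branch,
# and "tree-H" stands for the tree of the merge commit H.
MAINLINE = [{"id": i, "tree": f"tree-{i}", "message": f"commit {i}"} for i in "ABCD"]
SIDE = [{"id": i, "tree": f"tree-{i}", "message": f"commit {i}"} for i in "EFG"]

linear = squash_side_branch(MAINLINE, SIDE, "tree-H")
print([commit["id"] for commit in linear])
```

The individual trees of E, F and G do not survive in the result, which
is the loss of information that the per-merge rebase approach avoids.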