Re: RFD: best way to automatically rewrite a git DAG as a linear history?

Jon Seymour <jon.seymour@xxxxxxxxx> · Fri, 19 Feb 2010 18:29:23 +1100

On Fri, Feb 19, 2010 at 2:13 PM, Avery Pennarun <apenwarr@xxxxxxxxx> wrote:
> On Thu, Feb 18, 2010 at 8:04 PM, Jon Seymour <jon.seymour@xxxxxxxxx> wrote:
>
> Using the linearization mechanism you propose, you end up producing a
> false history: one in which, other than at certain checkpoints, the
> code doesn't even work.  What's the point of such a history?  It
> neither reflects the true development history (ie. pre-linearization)
> nor a more useful, idealized version of history (ie. one that compiles
> at every point and adds features in a rational order and is useful for
> git bisect).

If there are no merge conflicts in the original history, then there
will be no merge conflicts in the rewritten history, and therefore no
error deltas.

The point of creating a linearization of this kind is that, if there
are no merge conflicts, it flattens the hierarchy into a form that is
immediately rebaseable and that faithfully represents the work the
developer would have done had they rebased at each merge instead of
merging.

If there are merge conflicts, then it produces a history that
indicates the extent of the merge conflict rectification that will be
needed, which then allows you to decide whether you want to attempt
the rebase.  If you decide to rebase, then it should just be a question
of deleting the delta commits and fixing the merge conflicts as they
crop up.

My contention is that most of the text diffs in the rewritten history
(with the exception of the error deltas) will actually represent the
intent of the developers' original changes, although until the
rectification work is done, the commit sequences bounded by error
deltas would not be usable for git bisect, compiles, or any other
purpose that requires an intact tree.

In the no-conflict case, it is not clear to me that the history
resulting from your script is immediately rebaseable, precisely
because of the presence of the merge commits [feel free to correct me
if I am wrong about that].  With my approach, the merge commits
dissolve away - there is nothing to edit.
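
As a rough illustration of merge commits "dissolving away", here is a
toy Python sketch. It is not the actual rewrite script. It assumes the
history is given as a hypothetical `parents` mapping (commit id to
list of parent ids), that the first parent of a merge is the base, and
that the second parent's commits are replayed on top of it. Reversing
the parent order would model the opposite rebase direction.

```python
def flatten(parents, head):
    """Return the non-merge commits reachable from `head` in replay order.

    `parents` maps each commit id to its list of parent ids (hypothetical
    input format). Merge commits are dropped from the result.
    """
    order = []
    seen = set()

    def visit(commit):
        if commit in seen:
            return
        seen.add(commit)
        # Visit parents first, first parent before second, so the base
        # history is emitted before the merged-in commits.
        for parent in parents[commit]:
            visit(parent)
        # A commit with two or more parents is a merge: emit nothing.
        if len(parents[commit]) < 2:
            order.append(commit)

    visit(head)
    return order


# A is the root, B continues the base line, X and Y are the side
# branch, M merges Y into B, and C follows the merge.
example = {
    "A": [],
    "B": ["A"],
    "X": ["A"],
    "Y": ["X"],
    "M": ["B", "Y"],
    "C": ["M"],
}
linear = flatten(example, "C")
```

In this example `linear` lists every ordinary commit exactly once and
contains no entry for the merge `M`, so a pick list built from it has
no merge entries to edit.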

>
> It doesn't even provide something useful for patch review, since half
> your patches will have randomly-selected conflict resolutions (ie.
> changes to unrelated code that never should have changed) thrown in.
> You'd be better off reviewing patches from the original history, and
> just ignoring merge commits, which is what 'git format-patch' or just
> 'git log -p' would do automatically.

The conflict resolutions are far from random. They are precisely
chosen to reconstruct the blob in such a way that all subsequent picks
in the same path segment apply cleanly.  This is a deliberate choice,
because we know that the conflict will be resolved eventually. We are
temporarily deferring correctness to allow us to automatically proceed
with a speculative rewrite of the merge history as a rebase history.
The extent of incorrectness in the history is well delimited and well
understood.
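
A toy model of this deferral, sketched in Python, may make the idea
easier to follow. It is not the real rewrite and rests on several
assumptions: a tree is a dict of path to blob text, each pick is a
hypothetical `(name, path, old, new)` tuple that applies cleanly only
when the tree currently holds `old`, and the recorded merge result is
given as `merge_tree`. All names here are invented for illustration.

```python
def replay_with_error_deltas(base_tree, picks, merge_tree):
    """Replay `picks` onto `base_tree`, deferring conflicts as error deltas.

    Returns (history, deltas): the linear list of steps, and the number
    of those steps that are error deltas.
    """
    tree = dict(base_tree)
    history = []

    def error_delta(path, target):
        # Force `path` to `target` and record the step as an error delta.
        history.append(("error-delta", path, tree.get(path), target))
        if target is None:
            tree.pop(path, None)
        else:
            tree[path] = target

    for name, path, old, new in picks:
        if tree.get(path) != old:
            # The pick would conflict, so reconstruct the blob it expects.
            error_delta(path, old)
        history.append((name, path, old, new))
        tree[path] = new

    # Sync point: restore whatever differs from the recorded merge result.
    for path in sorted(set(tree) | set(merge_tree)):
        if tree.get(path) != merge_tree.get(path):
            error_delta(path, merge_tree.get(path))

    deltas = sum(1 for step in history if step[0] == "error-delta")
    return history, deltas


# No conflict: the branch touched only "b", the base line only "a".
clean_history, clean_deltas = replay_with_error_deltas(
    {"a": "2", "b": "1"},
    [("X", "b", "1", "2")],
    {"a": "2", "b": "2"},
)

# Conflict: both sides changed "a"; the merge recorded "2+4".
conflict_history, conflict_deltas = replay_with_error_deltas(
    {"a": "2"},
    [("X", "a", "1", "3"), ("Y", "a", "3", "4")],
    {"a": "2+4"},
)
```

In the clean case no error deltas are produced. In the conflicting
case one opening delta lets both X and Y apply cleanly, and one
closing delta restores the merged blob, so the delta count gives a
crude measure of the rectification still to be done.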

>
> The result is also still not suitable for submission upstream: the
> sync points (where the files actually match what the developer had in
> his tree) are the only places where the code is even likely to
> compile, and yet they *also* include all the code brought in by prior
> merges, which you already said include code that shouldn't go
> upstream.

I agree it is not suitable for many purposes. I contend that what it
allows one to do is rewrite the merge history as a rebase history in a
form that allows the merge conflict resolutions to be deferred. In the
no-conflict case, the linearisation is immediately usable (with no
further edits) as a rebase source.

>
> The linearization script I gave you at least has these interesting
> characteristics:
>
> - If the original history compiled at every point, then the linearized
> history does too.
>
> - It is an accurate representation of the successive states of the
> tree experienced by the original developer.
>
> - You can use 'git rebase' to incrementally rearrange and combine
> patches until they make enough sense to submit upstream.
>
> - It is easy to separate out merges (which usually don't need patch
> review) from individual patches (which do).
>
> - If some merges added useless code, you can remove them completely
> with rebase by just removing a single patch from the list.
>
> Of course, even with this script, it will still take work (rebasing)
> to produce code that's polished and ready to go upstream.  But I'm not
> sure there's a way to automate that without producing interim versions
> that are much, much worse.
>
> Have fun,
>
> Avery
>