Re: RFD: best way to automatically rewrite a git DAG as a linear history?

Avery Pennarun <apenwarr@xxxxxxxxx> · Thu, 18 Feb 2010 22:13:45 -0500

On Thu, Feb 18, 2010 at 8:04 PM, Jon Seymour <jon.seymour@xxxxxxxxx> wrote:
> Avery's script almost does what I need, except the rewritten diffs
> corresponding to the merge commits introduces unnecessary noise (from
> upstream deltas) in the series and potentially complicate eventual
> merges of the linear history back into the upstream.

You're never going to get the "linear" history merged back upstream
until you fix the inconsistencies.  At least, no sensible upstream
should accept the patches.

Using the linearization mechanism you propose, you end up producing a
false history: one in which, other than at certain checkpoints, the
code doesn't even work.  What's the point of such a history?  It
reflects neither the true development history (i.e. pre-linearization)
nor a more useful, idealized version of history (i.e. one that compiles
at every point, adds features in a rational order, and is useful for
git bisect).

It doesn't even provide something useful for patch review, since half
your patches will have randomly-selected conflict resolutions (i.e.
changes to unrelated code that never should have changed) thrown in.
You'd be better off reviewing patches from the original history, and
just ignoring merge commits, which is what 'git format-patch' or just
'git log -p' would do automatically.

The result is also still not suitable for submission upstream: the
sync points (where the files actually match what the developer had in
his tree) are the only places where the code is even likely to
compile, and yet they *also* include all the code brought in by prior
merges, which you already said include code that shouldn't go
upstream.

The linearization script I gave you at least has these interesting
characteristics:

- If the original history compiled at every point, then the linearized
history does too.

- It is an accurate representation of the successive states of the
tree experienced by the original developer.

- You can use 'git rebase' to incrementally rearrange and combine
patches until they make enough sense to submit upstream.

- It is easy to separate out merges (which usually don't need patch
review) from individual patches (which do).

- If some merges added useless code, you can remove them completely
with rebase by just removing a single patch from the list.
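
To make the properties above concrete, here is a toy model in Python.
It is not the script itself (which isn't reproduced in this message);
it is a sketch under the assumption that linearizing means walking the
first-parent chain and reusing each commit's tree unchanged.  The
Commit class, the file names and the five-commit example history are
all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    name: str
    parents: tuple  # parent names; parents[0] is the developer's own line
    tree: dict      # full snapshot of the tree: path -> content


def first_parent_chain(commits, head):
    """Walk back from head along first parents; return commits oldest-first."""
    chain = []
    name = head
    while name is not None:
        commit = commits[name]
        chain.append(commit)
        name = commit.parents[0] if commit.parents else None
    chain.reverse()
    return chain


def diff(old, new):
    """Map each changed path to (old content, new content); None means absent."""
    paths = sorted(set(old) | set(new))
    return {p: (old.get(p), new.get(p)) for p in paths if old.get(p) != new.get(p)}


def linearize(commits, head):
    """Return one (name, is_merge, patch) entry per commit on the chain.

    Every tree is reused as-is, so each linear step is a state the
    developer really had; a merge collapses into one ordinary patch.
    """
    steps = []
    previous_tree = {}
    for commit in first_parent_chain(commits, head):
        is_merge = len(commit.parents) > 1
        steps.append((commit.name, is_merge, diff(previous_tree, commit.tree)))
        previous_tree = commit.tree
    return steps


# Hypothetical history: A - B - M - C on the developer's line, where M
# merges an upstream commit U1 that added lib.c.
history = {
    "A": Commit("A", (), {"app.c": "v1"}),
    "U1": Commit("U1", ("A",), {"app.c": "v1", "lib.c": "u1"}),
    "B": Commit("B", ("A",), {"app.c": "v2"}),
    "M": Commit("M", ("B", "U1"), {"app.c": "v2", "lib.c": "u1"}),
    "C": Commit("C", ("M",), {"app.c": "v3", "lib.c": "u1"}),
}

for name, is_merge, patch in linearize(history, "C"):
    kind = "merge" if is_merge else "patch"
    print(name, kind, patch)
```

In this model the merge M shows up as a single entry whose patch
contains only what upstream brought in (lib.c), while B and C stay
small and reviewable.  Deleting that one entry is the analogue of
dropping a single patch from the rebase list.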

Of course, even with this script, it will still take work (rebasing)
to produce code that's polished and ready to go upstream.  But I'm not
sure there's a way to automate that without producing interim versions
that are much, much worse.

Have fun,

Avery