Re: [PATCH] subtree: add squash handling for split and push

Matthew Ogilvie <mmogilvi_git@xxxxxxxxxxxx> · Thu, 28 Nov 2013 11:23:09 -0700

On Sat, Nov 23, 2013 at 09:18:56PM +0100, Pierre Penninckx wrote:
> The documentation of subtree says that the --squash option can be used
> for add, merge, split and push subtree commands but only add and merge
> is implemented.

Clarification: The current documentation (correctly) doesn't
actually claim to support "split --squash", but it does erroneously
claim to support "push --squash".

> cmd_split() first lets split do it's job: finding which commits need to
> be extracted. Now we remember which commit is the parent of the first
> extracted commit. When this step is done, cmd_split() generates a squash
> of the new commits, starting from the aforementioned parent to the last
> extracted commit. This new commit's sha1 is then used for the rest of
> the script.

I've been planning to implement something similar to this patch,
but the semantics I am aiming at are slightly different.

It looks like your patch is basically squashing the new subtree commits
together, throwing out those commits completely, and only keeping
the squashed commit in the split --branch.  
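
As a rough illustration of those semantics (this is not the patch
itself), the sketch below builds a three-commit stand-in for the split
history with plain git plumbing and collapses everything after the
first commit into a single commit.  The throwaway repository, the file
name, and the variable names are all made up for the example.

```shell
#!/bin/sh
set -e
# Throwaway repository; every name below is hypothetical.
cd "$(mktemp -d)"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

# Three commits standing in for the history that split extracts.
for n in 1 2 3; do
    echo "change $n" >> file
    git add file
    git commit -q -m "subtree change $n"
done

parent=$(git rev-parse HEAD~2)   # parent of the first newly extracted commit
last=$(git rev-parse HEAD)       # last extracted commit

# One commit carrying the final tree, attached directly to $parent.
# The individual commits in between are not part of its history.
squashed=$(git commit-tree "$last^{tree}" -p "$parent" -m "Squashed subtree changes")

# Number of commits the squashed history adds on top of $parent.
new_count=$(git rev-list --count "$parent..$squashed")
echo "commits after squash: $new_count"
```

The squashed commit has the same tree as the last extracted commit,
but the fine-grained commits are no longer reachable from it.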

What I have in mind is that --squash would only affect --rejoin,
not the printed commit or the split-off --branch.  This is intended
to provide a better, third option for --rejoin'ing a subtree with a
lot of history, while preserving history in the split-off branch:

1. (existing/slow) Never use --rejoin at all.  You can use
   "merge --squash" to merge in unrelated changes to the
   split-off project, but every "split" gets slower and
   slower, because each one has to sift through all the
   same history that the previous "split"s already went
   through.

2. (existing/huge mass of duplicated history) Use "split --rejoin"
   occasionally.  This pulls in the entire history of the
   subtree branch (since the last --rejoin or non-squash merge,
   or everything if neither has been done), which is difficult
   to ignore when looking at global history of the full project,
   especially if it is many pages of commits.  But subsequent
   splits can stop history traversal at the known-common point,
   and will run MUCH faster.

3. (new/better) Use "split --rejoin --squash" (or some other
   invocation to be defined).  The subtree branch is generated
   exactly as usual, including fine-grained history.  But
   instead of merging the subtree branch directly, --rejoin
   will squash all the changes on that branch and merge in
   just the squash (referencing the unsquashed split
   branch tip in the commit message, but not as a
   parent).  Subsequent splits can run very fast, while the
   "--rejoin" generates only two commits instead of the
   potentially thousands of (mostly) duplicates it would pull
   in without the "--squash".
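
The third option can be approximated by hand with git plumbing.  This
is only a sketch of the intended commit graph, not the half-coded
implementation: the split branch is built manually rather than by
"git subtree split", the repository, directory, and branch names are
hypothetical, and the trailer lines in the squash message are modeled
on the ones git-subtree writes for "add --squash" and "merge --squash".

```shell
#!/bin/sh
set -e
# Throwaway repository; every name below is hypothetical.
cd "$(mktemp -d)"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

# Mainline history that touches a subtree directory "sub/".
mkdir sub
echo one > sub/file
git add sub
git commit -q -m "sub: one"
echo two >> sub/file
git commit -q -a -m "sub: two"
before=$(git rev-parse HEAD)

# Hand-built stand-in for the split branch: one commit per mainline
# commit, each holding only the contents of sub/.
s1=$(git commit-tree "$(git rev-parse HEAD~1:sub)" -m "sub: one")
split_tip=$(git commit-tree "$(git rev-parse HEAD:sub)" -p "$s1" -m "sub: two")
git branch split-off "$split_tip"

# Squash commit: same tree as the split tip, no parents.  The split
# tip is only named in the message, never linked as a parent.
squash=$(git commit-tree "$split_tip^{tree}" -m "Squashed 'sub/' changes

git-subtree-dir: sub
git-subtree-split: $split_tip")

# Rejoin: a merge commit that keeps the mainline tree unchanged and
# has the old HEAD and the squash commit as its two parents.
merge=$(git commit-tree "HEAD^{tree}" -p HEAD -p "$squash" -m "Rejoin squashed 'sub/' history")
git reset -q --hard "$merge"

# Commits the rejoin added to mainline history.
added=$(git rev-list --count "$before..HEAD")
echo "commits added to mainline: $added"
```

The fine-grained history stays on the split-off branch, while mainline
gains only the squash commit and the merge commit.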

I have this third option half-coded already, but I still need
to finish it.

I'm fairly sure I can make this work without new adverse effects,
but if someone sees something I'm missing, let me know.

Does anyone have any suggestions about the UI?  Do we need to also
support Pierre Penninckx's "split --squash" semantics somehow?  If
so, what command line options would allow for distinguishing the
two cases?

--
Matthew Ogilvie   [mmogilvi_git@xxxxxxxxxxxx]