"git subtree --squash" interacts poorly with revert, merge, and rebase

Matt McCutchen <matt@xxxxxxxxxxxxxxxxx> · Wed, 26 Oct 2016 19:07:24 -0400

I'm the lead developer of a research software application
(https://bitbucket.org/objsheets/objsheets) that uses modified versions
of two third-party libraries, which we need to version and distribute
along with our application.  For better or for worse, we haven't made it a
priority to upstream our changes, so for now we just want to optimize
for ease of (1) making and reviewing changes and (2) upgrading to newer
upstream versions.

We've been using git submodules, but that's a pain for several reasons:
- We have to run "git submodule update" manually.
- We have to make separate commits and manage corresponding topic
  branches for the superproject and subprojects.
- A diff of the superproject doesn't include the content of
  subprojects.

Recently I looked into switching to the "git subtree" contrib tool in
the --squash mode, but I identified a few drawbacks compared to
submodules:

1. The upstream commit on which the subtree is based is assumed to be
given by the latest squash commit in "git log".  This means that (i) a
change to a different upstream commit can't be reverted with "git
revert" and (ii) a "git merge" of two superproject branches based on
different upstream commits may successfully merge the content of the
upstream commits but leave the tool thinking the subtree is based on an
arbitrary one of the two commits.

2. Rebasing messes up the merge commits generated by "git subtree
--squash".  --preserve-merges worked in a simple test but supposedly
doesn't work if there are conflicts or I want to reorder commits with
--interactive.
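
To make point 1(i) concrete, here is a minimal Python sketch of the lookup as I understand it. It assumes that each squash commit message carries "git-subtree-dir:" and "git-subtree-split:" trailer lines and that the tool takes the newest such commit reachable from HEAD. The function name, the "vendor/lib" prefix, and the shortened commit IDs are made up for illustration; this is not the tool's actual code.

```python
import re


def latest_squash_split(messages, prefix):
    """Return the upstream commit ID recorded by the newest squash commit.

    `messages` holds full commit messages, newest first, as a history walk
    from HEAD would yield them.  Assumption: squash commits are recognized
    by their "git-subtree-dir:" and "git-subtree-split:" trailer lines.
    """
    wanted = prefix.rstrip("/")
    for msg in messages:
        d = re.search(r"^git-subtree-dir: (.+?)/*$", msg, re.MULTILINE)
        s = re.search(r"^git-subtree-split: ([0-9a-f]+)$", msg, re.MULTILINE)
        if d and s and d.group(1) == wanted:
            return s.group(1)
    return None


# Hypothetical history, newest first: an upgrade from aaaa111 to bbbb222,
# followed by a revert of that upgrade.
history = [
    'Revert "Merge commit \'bbbb222\'"\n\nThis reverts commit cccc333.\n',
    "Squashed 'vendor/lib/' changes from aaaa111..bbbb222\n\n"
    "git-subtree-dir: vendor/lib\ngit-subtree-split: bbbb222\n",
    "Squashed 'vendor/lib/' content from commit aaaa111\n\n"
    "git-subtree-dir: vendor/lib\ngit-subtree-split: aaaa111\n",
]

# The revert restored the aaaa111 content, but it carries no trailers,
# so the lookup still reports the reverted upgrade.
based_on = latest_squash_split(history, "vendor/lib")
```

Under these assumptions `based_on` is "bbbb222" even though the tree content is back at aaaa111, which is the mismatch I'm worried about.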

Maybe we would never hit any of these problems in practice, but they
give me a bad enough feeling that I'm planning to write my own tool
that tracks the upstream commit ID in a file (like a submodule) and
doesn't generate any extra commits.  Without generating extra commits,
the only place to store the upstream content in the superproject would
be in another subtree, which would take up disk space in every working
tree unless developers manually set skip-worktree.  I think I prefer to
not store the upstream content and just have the tool fetch it from a
local subproject repository each time it's needed.
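
For illustration only, the core of the file-based tracking could look roughly like the Python sketch below. The tracking file name ("lib.upstream"), the local repository path ("../lib-upstream"), and all function names are hypothetical; the sketch only builds the "git archive" command for exporting the upstream tree and does not run it.

```python
import tempfile
from pathlib import Path


def read_upstream_commit(tracking_file):
    """Return the upstream commit ID stored in the tracking file."""
    commit_id = Path(tracking_file).read_text().strip()
    if not commit_id:
        raise ValueError(f"{tracking_file} does not name an upstream commit")
    return commit_id


def write_upstream_commit(tracking_file, commit_id):
    """Record the upstream commit ID; the file is versioned like any other."""
    Path(tracking_file).write_text(commit_id + "\n")


def export_upstream_command(local_repo, commit_id):
    """Build (but do not run) a command that streams the upstream tree as tar."""
    return ["git", "-C", str(local_repo), "archive", "--format=tar", commit_id]


# Demo with made-up commit IDs: an upgrade followed by a revert.  Because the
# ID lives in an ordinary tracked file, reverting the upgrade commit would
# restore the old ID together with the old content.
with tempfile.TemporaryDirectory() as tmp:
    tracking = Path(tmp) / "lib.upstream"
    write_upstream_commit(tracking, "aaaa111")
    write_upstream_commit(tracking, "bbbb222")  # upgrade to a newer upstream
    write_upstream_commit(tracking, "aaaa111")  # state after reverting it
    demo_commit = read_upstream_commit(tracking)
    demo_cmd = export_upstream_command("../lib-upstream", demo_commit)
```

Since the commit ID is part of the tree rather than inferred from history, a revert, merge, or rebase would treat it like any other file content, and a merge of branches based on different upstream commits would show up as an ordinary conflict in that file.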

I'll of course post the tool on the web and would be happy to see it
integrated into "git subtree" if that makes sense, but I don't know how
much time I'd be willing to put into making that happen.

Any advice?

Thanks,
Matt