Workflow for GitHub branch model interop: advice?

William Chargin <wchargin@xxxxxxxxx> · Sat, 9 Feb 2019 22:39:21 -0800

Hi Git folks,

I’m looking into automating a Git workflow, and am interested in folks’
feedback about whether my plan is reasonable or insane.

The problem that I’m trying to solve is: I use GitHub a lot for work,
but often find myself frustrated with GitHub’s “one-branch-per-change”
model, as opposed to the “a-change-is-a-patch” model used by Phabricator
or Gerrit. When working on a stack of dependent commits (e.g., “add new
API” followed by “replace callers of old API with new API” followed by
“remove old API”), I find it easiest to iterate on these commits by
rewriting the whole stack heavily in interactive rebases. After
rewriting a commit from early in the stack, I need to force-push the
rebased descendants to their respective remote branches. Currently, I
have a script [1] to automate pushing to the right branches, which
mostly gets rid of the pain. However, for various reasons, I’d like to
remove the need to force-push at all.

[1]: https://gist.github.com/wchargin/3934c1c09812a2ad0e4f2092391e1ac8

For instance, suppose that I have commits A and B, which have also been
pushed to remote branches origin/a and origin/b. I amend A to A' in an
interactive rebase, so that my local state has B' on top of A'. Then,
I further amend B' to B''. The `git log --graph --oneline` output looks
like:

    Local (after interactive rebase):
    * (HEAD) B''
    * A'
    * (origin/master) whatever

    Remote (not yet updated):
    * (origin/b) B
    * (origin/a) A
    * (origin/master) whatever

The desired end state is that the diff of origin/master..origin/a should
be the same as the diff of A', and the diff of origin/a..origin/b should
be the same as the diff of B''. Again, force-pushing A':a and B'':b gets
the job done, but I’d like to send only fast-forward updates. Ideally,
each pushed commit should be easily readable by a code reviewer: I don’t
want a single commit on origin/b to contain both the changes due to
the rebase on A' and the changes due to the amendment of B' to B''.
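
One way to check the end state after a push is to compare trees rather
than diffs: if origin/a has the same tree as A' and origin/b has the
same tree as B'', then the two diffs above match (assuming A' sits
directly on top of origin/master). A minimal sketch; the helper name
`same_tree` is made up for illustration:

```shell
# same_tree REV1 REV2
# Succeeds (exit 0) iff both revisions resolve to the same tree object.
# Returns 2 if either revision cannot be resolved. Hypothetical helper.
same_tree() {
    local t1 t2
    t1="$(git rev-parse --verify --quiet "$1^{tree}")" || return 2
    t2="$(git rev-parse --verify --quiet "$2^{tree}")" || return 2
    [ "${t1}" = "${t2}" ]
}
```

For example, `same_tree origin/b HEAD` should succeed once B'' has been
published to origin/b by any method.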

My plan to achieve this is as follows. To update branch origin/x with
commit X:

 1. Check out origin/x.
 2. Merge the unique parent of X into HEAD. If there are conflicts,
    merge them verbatim: include conflict markers in the commit, and
    “take theirs” on binary files.
 3. Create a new commit whose tree is exactly the tree of X and whose
    unique parent is the merge commit created in step (2).
 4. Push this commit to origin/x.

Roughly, in code:

    $ git checkout --detach "origin/${branch}"
    $ git merge "${X}~" -m "update diffbase"
    $ if merge_needs_resolution; then git add .; git commit --no-edit; fi
    $ commit="$(printf ... | git commit-tree "${X}^{tree}" -p HEAD)"
    $ git push origin "${commit}:refs/heads/${branch}"
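
Spelled out a bit more, the four steps might look like the function
below. This is only a sketch under some assumptions: the function name
`update_branch` and the "update patch" commit message are placeholders,
it detects conflicts by checking for MERGE_HEAD, and it does not yet
implement the "take theirs" rule for binary files (`git add` keeps
whatever the failed merge left in the worktree). It should be run in a
worktree that is safe to clobber.

```shell
# update_branch BRANCH X
# Publish the tree of commit X to origin/BRANCH using only fast-forwards.
update_branch() {
    local branch="$1" x="$2" commit
    # Step 1: start from the current remote tip.
    git checkout --quiet --detach "origin/${branch}" || return
    # Step 2: merge X's parent; on conflict, commit the result verbatim,
    # conflict markers included.
    if ! git merge --quiet --no-edit -m "update diffbase" "${x}~"; then
        # Bail out unless the failure really was a conflicted merge.
        git rev-parse --quiet --verify MERGE_HEAD >/dev/null || return
        git add --all . || return
        git commit --quiet --no-edit || return
    fi
    # Step 3: new commit with exactly X's tree, parented on the merge.
    commit="$(git commit-tree "${x}^{tree}" -p HEAD -m "update patch")" || return
    # Step 4: push; a descendant of origin/BRANCH, so no force is needed.
    git push --quiet origin "${commit}:refs/heads/${branch}"
}
```

With the example above, this would be called once per branch, bottom of
the stack first: `update_branch a HEAD~` and then `update_branch b HEAD`.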

Conceptually, the topology looks like:

    Local:
    * (HEAD) B''
    * A'
    * (origin/master) whatever

    Remote:
    * (origin/b) commit with tree B''^{tree}
    * merge A' into B verbatim
    |\
    | * (origin/a) commit with tree A'^{tree}
    * | B
    |/
    * A
    * (origin/master) whatever

(though really origin/a and origin/b^^2 will be different commits with
the same tree according to the code above—it would be nicer if they
shared identity, too, and this is achievable with a bit of bookkeeping).

(Please note that the topology is only important for the duration of the
code review. When the patch is accepted, all of origin/a..origin/b is
squashed into one commit before merging, anyway. In my humble opinion,
this is further evidence that “a-change-is-a-patch” is a more natural
model than “one-branch-per-change”, but no matter.)

So, first question: does this sound like a reasonable objective and plan
so far?

My second question is more technical and relates to the implementation.
To create the merge in step (2), I plan to use a new git-worktree in a
temporary directory. Creating a merge requires _a_ worktree (can’t be
done bare), and I’d prefer not to use the user’s primary worktree, so
that (a) I don’t have to worry about uncommitted or unstaged changes
when executing this workflow, and (b) the user isn’t left in a borked
state if they SIGINT the process or something goes wrong and it aborts.

But, on the other hand, creating a whole worktree is a bit heavy-handed.
Creating a new worktree for linux.git takes about 6 seconds on my
laptop, which is not terrible considering that repository’s size, but
it would be nice to avoid that cost if possible. It would be great if I
could create a “lazy worktree” and only check out files actually needed
for the merge. Would the partial clone feature added in v2.19 be helpful
here?

I’d greatly appreciate any suggestions or advice! I’m willing to be told
that I’m going about this all wrong and that my local workflow is bad to
begin with, preferably with such a claim accompanied by the description
of a better workflow for handling stacks of dependent commits that need
to eventually be merged as separate GitHub pull requests. :-)

Best,
WC