Re: [PATCH 0/1] add git-splice subcommand for non-interactive branch splicing

Adam Spiers <git@xxxxxxxxxxxxxx> · Tue, 1 Aug 2017 02:14:21 +0100

On 31 July 2017 at 23:18, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Adam Spiers <git@xxxxxxxxxxxxxx> writes:
>
> > Therefore there is a risk that each new UI for higher-level workflows
> > will end up re-implementing these mid-level operations.  This
> > undesirable situation could be avoided if git itself provided those
> > mid-level operations.
>
> Let me make sure if I get your general idea right, first.
>
> Is your aim to give a single unified mid-layer that these other
> tools can build on instead of rolling their own "cherry-pick these
> ranges, then squash that in, and then merge the other one in, ..."
> sequencing machinery?

Pretty much, yes.  The original itch I wanted to scratch was
implementing git-explode, which aims to automatically explode a large
topic branch into a set of smaller, independent topic branches, by
harnessing my git-deps for automatically detecting inter-dependencies
between commits in the large source branch and using that dependency
tree to construct the smaller topic branches.  (Before anyone protests
at this point, yes, I am fully aware that it is not possible to
automate 100% accurate detection of these dependencies, and no, that
does not completely invalidate the approach.[0])

My initial thought was that in order to be able to automatically
decompose a branch into smaller branches, I would need a mid-layer
operation "git-transplant" somewhat analogous to mv(1), which would
let me easily move commits out of the source branch into a new target
branch.  And then I realised that, in the same way that
(simplistically speaking) mv(1) could be reimplemented as cp(1)
followed by rm(1), implementing "git-transplant" in turn would require
more primitive operations for copying commits between branches, and
removing commits from branches.  At this point I saw value in
generalising those operations; hence the idea for git-splice was born.

Consequently I implemented prototypes for splice and transplant, which
didn't take too long.  (The real work was writing comprehensive test
suites and polishing the tools until they were reliable enough to pass
100% of those tests.)

Ironically, soon after I started to implement git-explode, I realised
that the order in which I needed to walk the dependency tree
discovered by git-deps actually meant that I couldn't use
git-transplant for this particular use case, so in the end I
implemented it with pygit2.  (I still need to polish it up a bit more
before releasing.)

However, even though splice and transplant are not useful for this
particular use case, I still believe that they (or similar tools) have
the potential to serve as a useful foundation for other workflows.

> If so, I think that is a very good goal.

Glad to hear it :-)

> >     # Remove commits A..B (i.e. excluding A) from the current branch.
> >     git splice A..B
> >     # Remove commit A from the current branch.
> >     git splice A^!
> >     # Remove commits A..B from the current branch, and cherry-pick
> >     # commits C..D at the same point.
> >     git splice A..B C..D
>
> We need to make sure that the mid-layer tool offers a good set of
> primitive operations that serve all of these other tools' needs.  I
> do not know offhand if what you implemented that are illustrated by
> these examples is or isn't that "good set".

Agreed.  That's why I sent the RFC to this list last year: in the hope
that these details could be hashed out and guide my development in the
right direction.  Unfortunately I didn't get much response at the
time, which was probably my fault for not explaining my "mission
goals" clearly enough.  Although in fairness to myself, I think I
needed a year anyway to let the ideas in my head mature to the point
where I understood them well enough myself to communicate them clearly
to others :-)

> Assuming that there is such a "good set of primitives" surfaced at
> the UI level so that these other tools can express what they want to
> perform with, I'd personally prefer to see a solution that extends
> and uses the common "sequencer" machinery we have been using to
> drive cherry-picks, reverts and interactive rebases that work on
> multiple commits.  IOW, it would be nice to see that the only thing
> "git splice A..B" does is to prepare a series of instructions in a
> file, e.g. .git/sequencer/todo, just like "git cherry-pick A..B"
> would, and let the sequencer machinery handle the sequencing.
>
> E.g. In a history like
>
>     ---o---A---o---B---X---Y---Z   HEAD
>
> "git splice A..B" command would write something like this:
>
>     reset to A
>     pick X
>     pick Y
>     pick Z
>
> to the todo file and drive the sequencer.

That sounds great to me!  Sadly I'm currently a bit ignorant of the
intricacies of the sequencer, otherwise I might have adopted this
approach from day 0.  But I'm pleased to be able to say that under the
hood, the way I implemented splice and transplant isn't too dissimilar
to this: they both write "todo" files, under .git/splice and
.git/transplant respectively, and then execute the instructions in
those files.  So hopefully it wouldn't be much work to bring them
closer to the kind of format you describe above, and then feed that to
the sequencer instead of having them process the tasks themselves.
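
To make the simplest case concrete, here is a minimal Python sketch of
generating such a todo list for a linear history.  It is only an
illustration: the function name splice_todo, the list-of-names model
of a branch, and the placeholder names O1/O2 for the unlabelled "o"
commits are all assumptions of mine, and "reset to" is the
hypothetical spelling from your example, not something the sequencer
understands today.

```python
def splice_todo(history, base, tip):
    """Return todo lines that drop the commits in base..tip (base excluded).

    `history` lists commit names oldest first and is assumed to be
    linear (no merge commits), matching what splice supports today.
    """
    start = history.index(base)
    end = history.index(tip)
    if end <= start:
        raise ValueError(f"{tip!r} is not a descendant of {base!r}")
    # Rewind to the commit just before the removed range ...
    todo = [f"reset to {base}"]
    # ... then replay everything that came after the range.
    todo.extend(f"pick {commit}" for commit in history[end + 1:])
    return todo


# The linear history from the example; O1 and O2 stand in for "o".
history = ["O1", "A", "O2", "B", "X", "Y", "Z"]
print("\n".join(splice_todo(history, "A", "B")))
```

For "git splice A..B" on that history, this yields the same four
instructions as in your illustration.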

> As you notice, you would
> need to extend the vocabulary of the sequencer a bit to allow
> various things that the current users of the sequencer machinery do
> not need, like resetting the HEAD to a specific commit, merging a
> side branch, remembering the result of an operation, and referring
> to such a commit in later operation.  For example, if you tell "git
> splice" to expunge A from this sample history (I am not sure how you
> express that operation in your UI):
>
>          B---C---D
>         /         \
>     ---o---A---E---F---G   HEAD

Currently splice explicitly avoids editing history with merge commits,
although this example has made me realise that there's a bug with the
way it currently does that: it only checks that the removal and
insertion ranges are all non-merge commits before starting execution,
whereas it actually needs to check all the descendant commits too.
Fortunately that's easy to fix :-)
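
As a rough sketch of the corrected check (not the actual splice code),
the idea is to look for merges among everything that would be
rewritten, rather than only inside the removal range.  The helper name
merges_between, the dict-of-parents model of the commit graph, and the
name "O" for the unlabelled commit "o" are assumptions made for
illustration.

```python
def merges_between(parents, base, tip):
    """Return merge commits reachable from tip but not from base, sorted.

    `parents` maps each commit name to the list of its parent names.
    """
    def ancestors(start):
        # Iterative walk over all parents, including `start` itself.
        seen, stack = set(), [start]
        while stack:
            commit = stack.pop()
            if commit not in seen:
                seen.add(commit)
                stack.extend(parents.get(commit, []))
        return seen

    candidates = ancestors(tip) - ancestors(base)
    return sorted(c for c in candidates if len(parents.get(c, [])) > 1)


# The sample history with the side branch B---C---D merged at F.
parents = {
    "O": [],
    "A": ["O"], "E": ["A"],
    "B": ["O"], "C": ["B"], "D": ["C"],
    "F": ["E", "D"], "G": ["F"],
}

# Checking only the removal range (just A) finds no merges ...
print(merges_between(parents, "O", "A"))
# ... whereas checking everything up to HEAD finds the merge F.
print(merges_between(parents, "O", "G"))
```

The first call corresponds to the current, insufficient check; the
second covers the descendants as well and so would refuse to proceed.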

> it might create a "todo" list like this to rebuild the history:
>
>     reset to A^
>     pick B
>     pick C
>     pick D
>     mark :1
>     reset to A^
>     pick E
>     merge :1 using F's log message and conflict resolution as reference
>     pick G
>
> to result in:
>
>          B---C---D
>         /         \
>     ---o-------E---F---G   HEAD
>
> Do not pay too much attention to how the hypothetical "extended todo
> instruction set" is spelled in the above illustration (e.g. I am not
> advocating for multi-word command like "reset to"); these are only
> to illustrate what kind of features would be needed for the job.  In
> the final shape of the system, "merge" in the illustration above may
> be a more succinct "merge F :1", for example (i.e. the first
> parameter would name an existing merge to use as reference, the
> remainder is a list of commits to be merged to the current HEAD),
> just like "pick X" is a succinct way to say "cherry-pick the change
> introduced by existing commit X to HEAD, reusing X's log message
> and author information".

Yep, that all makes perfect sense.  It seems to me that there would be
three main strands of work required here:

     (0) gather use cases for automated higher-level workflows
         from users, so we're clear what kinds of problems are
         most worth solving

     (1) automate generation of instruction sequences which
         reflect those workflows (or parts thereof)

     (2) extend the sequencer as/when required by (1)

> Something like that may have a place in the git-core, I would think.

OK, good to know.

> I am not sure if a bash script that calls rebase/cherry-pick/commit
> manually can serve as a good "universal mid-layer" or just adding
> another random command to the set of existing third-party commands
> for "higher-level workflows".

I'm not sure either.  It might or might not be, but I think a debate
on that topic would be worthwhile, and I'd be very interested in
taking part.

My first hunch is that if we were to attempt to design this
"mid-layer" of operations, it would make sense to start with the more
primitive operations in that layer, and then build the more
sophisticated ones later, on top of the primitives where that makes
sense.

For example first we could focus on sequences which achieve simple
things like removing a range of commits from a branch where the
descendants of that range are all non-merge commits, and inserting a
range of commits into a branch which satisfies the same "no merge
commits" constraint.  This would achieve parity with git-splice.

Next we could add support for the same operations with the "no merge
commits" constraint dropped, so that your example scenario above could
be handled correctly.

Then we could add support for more complicated operations such as
transplants, and removing / transplanting a whole range of commits
which can form an arbitrarily complex commit graph.  This last one
sounds pretty hairy, which reinforces the value of starting simple.

Also, implementing the more primitive operations first would allow us
to extend the sequencer's capabilities in a more incremental and
risk-averse manner.

Thanks a lot for the reply!  What would you recommend as the next
steps?

[0] This has been discussed before, e.g.
     https://public-inbox.org/git/20160528112417.GD11256@pacific.linksys.moosehall/