Re: Using Origin hashes to improve rebase behavior

John Wiegley <johnw@xxxxxxxxxxxx> · Thu, 10 Feb 2011 22:14:15 -0500

Jeff King <peff@xxxxxxxx> writes:

> Now, in step 2 we could record "X" as an origin ID of X'' and during the
> rebase in step 3, calculate the intersection of the origins of X'' and X',
> and see that they are both just X. And I think maybe you already realize
> that, since you talk about Origin-IDs as sets.

Right.

> And there are lots of other cases. What about "git cherry-pick -n"? What
> about rebasing? If there are no conflicts, is it OK to copy the origin
> field? How about if there are conflicts? How about in a "git rebase -i",
> where we may stop and the user can arbitrarily split, amend, or add new
> commits. How do the old commits map to the new ones with respect to
> origin fields?

During rebasing, any commits which can be rebased without conflict have their
origin transferred (and each time it would cause the origin id list to grow by
one), but any commits which are squashed or edited would not transfer.

For cherry-pick -n, if the index is empty at the time the cherry-pick is done
(is this required?), then a file is created under .git/ with the SHA of the
changes placed in the index, so that when git-commit is later run and the
index has not been changed, then the Origin-Id for that originating commit
gets placed at the bottom of the commit message.

> So there are lots of corner cases where it won't work, because git is
> more than happy to give you lots of ways to tweak tree state and
> history, and it fundamentally doesn't care as much about process as it
> does about the end states that you reach. That's part of what makes git
> so flexible, but it also makes niceties like "did I already apply this
> commit on this branch" much harder to make sense of.

I think we'd want to restrict this system to those commits which were
automatically rewritten without conflicts.  Any user intervention in the
process would invalidate the meaning of the Origin-Id.

> It probably shouldn't be a new header field, but rather a text-style
> pseudo-header at the end of the commit.

I understand.

> But consider for a moment whether you actually want this field in the
> resulting commit at all, or whether it should be an external annotation.
> For example, let's say I cherry-pick from a private branch that is going
> to end up rebased anyway. Now the history for all time will have a
> commit that refers to some totally useless sha1 that nobody even knows
> about.

The problem with an external annotation is that if developers are sharing
feature branches, as a branch maintainer I want to know whether commits coming
from those feature branches are already in the branch I'm maintaining.

> There may be reasons why that isn't a good idea, and I haven't thought it
> through. But I think you should consider it as an alternate implementation
> and tell me why I'm dumb in that case. ;)

I'll give it a bit more thought as I consider the implementation of this.

> Whew, that turned out long. I hope it's helpful. I think the problem
> you're trying to solve is a real one, and I think your approach is the
> right direction. I just think we can leverage existing git features to
> do most of it, and because it is sort of a heuristic, we should be
> conservative in how it's introduced.

That's all extremely helpful, thank you!  You've brought up several use cases
I hadn't thought of, and perhaps this feature will indeed never cover
everything, but if it can reliably ease maintenance 80% of the time, I think
it's a relatively simple addition.

John
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html