Re: [RFC] origin link for cherry-pick and revert, and more about porcelain-level metadata

Paolo Bonzini <bonzini@xxxxxxx> · Wed, 10 Sep 2008 11:35:18 +0200

> Why do you actually *follow* the origin link at all anyway? Without its
> parents, the associated tree etc., the object is essentially useless for
> you

Stephen proposed the origin links as weak, but it is not necessarily true
that you don't have the parents and the associated tree.  For example,
if you download a repository that includes a "master" branch and a few
stable branches, you *will* have the objects cherry-picked into stable
branches, because they are commits in the master branch.

Junio explained that the way he achieves the same effect in git is by
forking the topic branch off the "oldest" branch where the patch will
possibly be of interest.  Then he can merge it into that branch and all
the newer ones.  That's great, but not all people are as
forward-looking (he did say that sometimes he needs to cherry-pick).

Another problem is that some projects actually have two "maint"
branches (e.g. currently GCC 4.2 and GCC 4.3), and most developers do
not care about what goes into the older "maint" branch; they develop for
trunk and for the newer "maint" branch, and then one person comes and
cherry-picks into the older "maint" branch.  This causes two problems:

1) Having to fork topic branches off the older branch would force extra
testing on the developers.

2) Besides this, topic branches are not cloned, so if I am the
integrator on the older "maint" branch, I need to dig manually in the
commits to find bugfixes.  True, I could use Bugzilla, but what if I
want to use git instead?  There is "git cherry -v ... | grep -w ^+.*PR",
except that it has too many false negatives (fixes that have already
been backported, but do show up in the list).
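
A rough sketch of that filter in Python, to show where it goes wrong.
The sample output is invented (SHAs shortened, subjects made up), and
unported_fixes is a hypothetical helper name; only the "+"/"-" prefix
format of "git cherry -v" is taken from git itself.

```python
import re

def unported_fixes(cherry_output):
    """Return (sha, subject) for '+' lines whose subject mentions PR."""
    fixes = []
    for line in cherry_output.splitlines():
        parts = line.split(" ", 2)
        if len(parts) < 3:
            continue  # not a "<mark> <sha> <subject>" line
        mark, sha, subject = parts
        # '+' means no commit with the same patch-id exists upstream.
        if mark == "+" and re.search(r"\bPR\b", subject):
            fixes.append((sha, subject))
    return fixes

# Invented sample; 4444444 stands for a fix that was already backported
# by hand after a conflict, so its patch-id no longer matches.
sample = """\
+ 1111111 PR middle-end/12345: fix ICE in fold
- 2222222 PR c++/23456: reject invalid template
+ 3333333 Update copyright years
+ 4444444 PR target/34567: fix wrong code at -O2
"""

for sha, subject in unported_fixes(sample):
    print(sha, subject)
```

Both 1111111 and 4444444 are reported, although the second one is
already in the older branch; nothing in the output lets a program tell
them apart.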

> And why are the notes created by git cherry-pick -x insufficient for that?

For example, these notes (or the ones created by "git revert") are
*wrong* because they talk about commits instead of changesets (deltas
between two commits).

Why is only one commit present?  Because these messages are meant for
users, not for programs.  That's easy to show: users think of commits as
deltas anyway, even though git stores them as snapshots---"git show
HEAD" shows a delta, not a snapshot.

And what does this mean for programs?  That they must resort to
commit-message scraping to distinguish the two cases. (*)

   (*) A GUI blame program, for example, would need to distinguish
   whether code added by a commit is taken from commit 4329bd8, or is
   reverting commit 4329bd8.  (In the first case, the author of that
   code is whoever was responsible for that code in 4329bd8; in the
   second case, it is whoever was responsible for that code in
   4329bd8^).  If recording changesets, you see 4329bd8^..4329bd8 in
   the first case, and 4329bd8..4329bd8^ in the second, so it is trivial
   to follow the chain.
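
To make the footnote concrete, here is a minimal sketch in Python.  It
assumes the recorded changeset is available as a plain "A..B" string,
which is only an illustration and not a proposed storage format;
classify_origin is a hypothetical name.

```python
def classify_origin(changeset):
    """Classify an 'A..B' changeset and return (kind, commit_to_blame)."""
    old, sep, new = changeset.partition("..")
    if not sep or not old or not new:
        raise ValueError("expected a range of the form A..B")
    if old == new + "^":
        kind = "cherry-pick"   # delta from the parent to the commit
    elif new == old + "^":
        kind = "revert"        # delta from the commit back to its parent
    else:
        kind = "other"
    # In every case the code being added is the code as it stood in the
    # right-hand side of the range, so that is where blame continues.
    return kind, new

print(classify_origin("4329bd8^..4329bd8"))
print(classify_origin("4329bd8..4329bd8^"))
```

The program never has to look at the wording of the commit message:
the direction of the range carries the distinction.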

And scraping is bad.  Imagine people who write commit messages in
their native language.  What if they patch git to translate the magic
notes created by "git cherry-pick -x" or "git revert" (maybe a future
version of git will do that automatically)?  Should they also translate
every program that scrapes the messages?
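
For illustration, this is roughly what such a scraper looks like in
Python.  The two English patterns are the notes git writes today; the
Italian message is an invented translation, and scrape is a
hypothetical helper, not part of any existing tool.

```python
import re

# Notes as written by "git cherry-pick -x" and "git revert".
PICK_RE = re.compile(
    r"^\(cherry picked from commit ([0-9a-f]{7,40})\)$", re.M)
REVERT_RE = re.compile(
    r"^This reverts commit ([0-9a-f]{7,40})\.?$", re.M)

def scrape(message):
    """Return (kind, sha) scraped from a commit message, or None."""
    m = PICK_RE.search(message)
    if m:
        return ("cherry-pick", m.group(1))
    m = REVERT_RE.search(message)
    if m:
        return ("revert", m.group(1))
    return None

picked = "Fix foo\n\n(cherry picked from commit 4329bd8)\n"
reverted = 'Revert "Fix foo"\n\nThis reverts commit 4329bd8.\n'
# Invented translation of the cherry-pick note.
translated = "Correggi foo\n\n(cherry-pick dal commit 4329bd8)\n"

print(scrape(picked))
print(scrape(reverted))
print(scrape(translated))   # the scraper silently finds nothing
```

The last call returns None even though the commit is a cherry-pick,
and every tool that embeds these patterns fails in the same way.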

Whenever there is a piece of data that could be useful to programs (no
matter if plumbing or porcelain), I consider free-form notes to be bad,
because data is data, and metadata is metadata.

If there were a generic way to put porcelain-level metadata in commit
messages (e.g. Signed-off-by and Acked-by can already be
considered metadata), I would not be so much in favor of "origin" links
being part of the commit object's format.  Now if you think about it,
commit references within this kind of metadata would have mostly the
properties that Stephen explained in his first message:

1) they would be rewritten by git-filter-branch

2) these references, albeit weak by default, could optionally be
followed when fetching (either with command-line or configuration options)

3) they would not be pruned by git-gc

4) possibly, git rev-list --topo-order would sort commits by taking into
account metadata references too.

So the implementation effort would be roughly the same.
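
As a sketch of what reading such metadata could look like, assuming it
lives in a final "Key: value" paragraph the way Signed-off-by lines do
today.  The "Origin" key and its value syntax are invented here for
illustration, as is the parse_trailers helper.

```python
import re

TRAILER_RE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):[ \t]+(.+)$")

def parse_trailers(message):
    """Return (key, value) pairs from the last paragraph of a message.

    The last paragraph only counts as metadata if every one of its
    lines has the form 'Key: value'; otherwise return an empty list.
    """
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if len(paragraphs) < 2:
        return []  # subject line only, no room for metadata
    pairs = []
    for line in paragraphs[-1].splitlines():
        m = TRAILER_RE.match(line)
        if m is None:
            return []
        pairs.append((m.group(1), m.group(2).strip()))
    return pairs

message = (
    "Fix overflow in parser\n"
    "\n"
    "Longer explanation of the fix.\n"
    "\n"
    "Signed-off-by: A U Thor <author@example.com>\n"
    "Origin: 4329bd8^..4329bd8\n"   # hypothetical key
)

for key, value in parse_trailers(message):
    print(key, "=", value)
```

Reading the field is the easy part; the four properties above
(rewriting, fetching, pruning, ordering) are where the real work is,
whichever place the reference is stored in.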

But can you think of any other such metadata?  Personally I can't, so
while I understand the opposition to a new commit header field that
would be there from here to eternity (or until the LHC starts), I do
think it is the simplest thing that can possibly work.

Paolo