Re: [git patches] libata updates, GPG signed (but see admin notes)

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 09 Nov 2011 09:26:42 -0800

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> No, no, don't consider my "put in the merge message" a winner at all.
>
> I personally dislike it, and don't really think it's a wonderful thing
> at all. I really does have real downsides:
>
>  - internal signatures really *are* a disaster for maintenance. You
> can never fix them if they need fixing (and "need fixing" may well be
> "you want to re-sign things after a repository format change")
>
>  - they are ugly as heck, and you really don't want to see them in
> 99.999% of all cases.
>
> So putting those things iin the merge commit message may have some
> upsides, but it has tons of downsides too.
>
> I think your refs/audit/ idea should be given real thought, because
> maybe that's the right idea.

With the latest round of touch-ups, modulo a few bugs I will be fixing
before the 1.7.8 final, I think what we have is more or less OK in the
shorter term and should be ready for general consumption. The ugliness is
gone, but the issue around internal signatures may remain to be solved in
the longer term. At least, by storing the full contents of the tag today
in an extended header, when we figure out how a detached signature should
really work, we could convert by extracting them from the history.

In a separate message earlier in the thread, you raised another issue.

> I hate how anonymous our branches are. Sure, we can use good names for
> them, but it was a mistake to think we should describe the repository
> (for gitweb), rather than the branch.
> 
> Ok, "hate" is a strong word. I don't "hate" it. I don't even think
> it's a major design issue. But I do think that it would have been
> nicer if we had had some branch description model.

At the first glance, our branch model is indeed peculiar in that a branch
does not have a global identity. The scope of its name is local to the
repository, and it is just a pointer into the history. A "note" [*1*] that
can annotate a commit long after the commit is made is not a good way to
describe what a branch is about, because the tip of the branch can advance
beyond the commit that is annotated by such a note. A commit on a branch
does not serve as a good anchoring point to describe the branch.

However, a commit that merges the history of a branch, whether the merged
branch is from a local repository or from a remote one, does serve as a
good anchoring point. The work on a branch is finished as complete as
possible at the time of the merge, and the committer who merges the branch
agrees with both the objective and the implementation of the work done on
the branch, and that is why the merge is made [*2*]. Describing what the
history of the side branch was about in the resulting merge is a perfectly
sensible way to explain the branch. So in that sense, I am very happy with
the way the merge message template uses the pull request tag to let the
lieutenant explain and defend the history behind the tag used for the pull
request. Such an explanation does not have to be keyed with anybody's
local branch name (e.g. "for-linus" would mean different things for
different pull requests even from the same person), but keying it with the
resulting merge commit is a sensible way to leave the record in the
history.

After justifying with the above two paragraphs that it is perfectly
sensible to record the annotations on commits and not on "branch names", I
do agree that we would eventually want to be able to have such annotations
on commits after the fact. Neither "tags" nor "notes" is necessarily a
very good mechanism, however, for the purpose of "signed pull requests"
and "signed commits" [*3*]. Here are some pros and cons:

 - tags must be named, but the only thing we need is to be able to look
   the contents (with signature if signed) up given a commit object.
   Unlike the usual "I want to check out v3.0 release" look-up that goes
   from tag names to the commits, annotation look-ups go the other way, do
   not have to have a tagname, and having tagname does not help our
   look-up in any way. If we want to use tag to annotate various commits
   by various people and keep them around, we would need global namespace
   that would not cause them to crash (we can work this around by using
   the object name of the tag, e.g. renaming 'for-linus' tag to $(git
   rev-parse tags/for-linus), but that is merely a workaround of having to
   name things that do not have to be named in the first place). As a
   local storage machinery for annotations, tags hanging below refs/tags/
   (or refs/audit for that matter) hierarchy with their own names is an
   inappropriate model.

 + tags can auto-follow the commits when object transfer happens (at least
   in the fetch direction), and for the purpose of "signed pull requests"
   and "signed commits", this is a desirable property. When a repository
   gains a commit, the annotations attached to the commit that are missing
   from the receiving repository are automatically transferred from the
   place the commit comes from. Annotations given to other commits that
   are not transferred into the repository do not come to the repository.

 - "git notes" is represented as a commit that records a tree that holds
   the entire mapping from commit to its annotations, and the only way to
   transferr it is to send it together with its history as a whole. It
   does not have the nice auto-following property that transfers only the
   relevant annotations.

 + "git notes" maps the commits to its annotations in the right direction;
   the object name of an annotated object to its annotation.

In the longer term, I think we would need to extend the system in the
following way:

 - Introduce a mapping machanism that can be locally used to map names of
   the objects being annotated to names of other objects (most likely
   blobs but there is nothing that fundamentally prevents you from
   annotating a commit with a tree). The current "git notes" might be a
   perfectly suitable representation of this, or it may turn out to be
   lacking (I haven't thought things through), but the important point is
   that this "mapping store" is _local_. fsck, repack and prune need to be
   told that objects that store the annotation are reachable from the
   annotated objects.

 - Introduce a protocol extension to transfer this mapping information for
   objects being transferred in an efficient way. When "rev-list --objects
   have..want" tells us that the receiving end (in either fetch/push
   direction) would have an object at the end of the primary transfer
   (note that I did not say "an object will be sent in this transfer
   transaction"; "have" does not come into the picture), we make sure that
   missing annotations attached to the object is also transferred, and new
   mapping is registered at the receiving end.

The detailed design for the latter needs more thought. The auto-following
of tags works even if nothing is being fetched in the primary transfer
(i.e. "git fetch" && "git fetch" back to back to update our origin/master
with the master at the origin) when a new tag is added to ancient part of
the history that leads to the master at the origin, but this is exactly
because the sending end advertises all the available tags and the objects
they point at so that we can tell what new tags added to an old object is
missing from the receiving end. This obviously would not scale well when
we have tens of thousands of objects to annotate. Perhaps an entry in the
"mapping store" would record:

 - The object name of the object being annotated;

 - The object name of the annotation;

 - The "timestamp", i.e. when the association between the above two was
   made--this can be local to the repository and a simple counter would
   do.

and also maintain the last "timestamp" this repository sent annotations to
the remote (one timestamp per remote repository). When we push, we would
send annotations pertaining to the object reachable from what we are
pushing (not limited by what they already have, as the whole point of this
exercise is to allow us to transfer annotations added to an object long
after the object was created and sent to the remote) that is newer than
that "timestamp". Similarly, when fetching, we would send the "timestamp"
this repository last fetched annotations from the other end (which means
we would need one such "timestamp" per remote repository) and let the
remote side decide the set of new annotations they added since we last
synched that are on objects reachable from what we "want".

Or something like that.

[Footnote]

*1* By this word, I do not necessarily mean what the "git notes" command
manipulates. A tag that points at a commit is also equally a good vehicle
to annotate a commit after the fact.

*2* For this reason, it may make sense to "commit -S" such a merge
commit. The "mergetag" asserts the authenticity of the pull request from
the lieutenant whose history is being integrated, and the "gpgsig" asserts
the authenticity of the merge itself--the fact that it was made by the
integrator.

*3* I do not mean what "git commit -S" parked in 'pu' produces, which is
to store the signature in the commit. Adding "Signed-off-by:" after the
fact to an existing commit by many people is a more appropriate example.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html