Re: git mailinfo strips important context from patch subjects

On Sun, Jun 28, 2009 at 04:04:37PM -0700, Junio C Hamano wrote:
> Jeff King <peff@xxxxxxxx> writes:
> 
> > On Sun, Jun 28, 2009 at 08:38:58PM +0100, Roger Leigh wrote:
> >
> >> In most of the projects I work on, the git commit message has
> >> the affected subsystem or component in square brackets, such as
> >> 
> >>   [foo] change bar to baz
> >>
> >> [...]
> >>
> >> The [sbuild] prefix has been dropped from the Subject, so an
> >> important bit of context about the patch has been lost.
> >> 
> >> It's a bit of a bug that you can't round trip from a git-format-patch
> >> to import with git-am and then not be able to produce the exact same
> >> patch set with git-format-patch again (assuming preparing and applying
> >> to the same point, of course).
> >
> > As an immediate solution, you probably want to use "-k" when generating
> > the patch (not to add the [PATCH] munging) and "-k" when reading the
> > patch via "git am" (which will avoid trying to strip any munging).
> >
> > However:
> >
> >> Would it be possible to change the git-mailinfo logic to use a less
> >> greedy pattern match so it leaves everything after
> >> ([PATCH( [0-9/])+])+ in the subject?  AFAICT this is cleanup_subject in
> >> builtin-mailinfo.c?  Could this rather complex function not just do a
> >> simple regex match which can also take care of stripping ([Rr]e:) ?
> >
> > Yes, I think in the long run it makes sense to strip just the _first_
> > set of brackets. I don't think we want to be more specific than that in
> > the match, because we allow arbitrary cruft inside the brackets (like
> > "[RFC/PATCH]", etc). But if format-patch always puts exactly one set of
> > brackets, and am strips exactly one set, then that should retain your
> > subject in practice, even if it starts with [foo].
> 
> I think it may still make sense to insist that PATCH appears somewhere in
> the first set of brackets, but I have to stop and wonder if it is even
> necessary.

I imagine not.  I've submitted a patch separately which implements
this behaviour (more on that below).

> Because git removes [sbuild] at the beginning, Roger is unhappy.
> 
>  * Is he happy that git removes [PATCH]?  In E-mail based workflow it is
>    a good practice to mark messages that are patches clearly so that they
>    can be quickly found among the discussions that lead to them, and it is
>    plausible that his project accepted that as an established practice
>    supported well by git.

I'm perfectly happy that [PATCH] is removed.  My requirement is that
the commit created by "git am" is identical to the commit represented
in the patch created with "git format-patch".  Removing the [PATCH]
prefix is IMO correct, and I agree that its presence is useful in an
email-based workflow.

>  * Is he happy that git treats the first paragraph of the commit message
>    specially from the rest of the message?  In a project with many
>    commits, it is essential that people write good commit summaries that
>    fits on a single line so that tools like shortlog and gitweb can be
>    used to get a bird-eye view of what happened recently.  Perhaps his
>    project picked it up as the best current practice supported well by
>    git.

I'm also happy with this, and make use of it.  As with the previous
point, I would like the commit message to be preserved correctly
so that the message committed by "git am" matches the original
commit message exactly.

>  * Is he happy that git takes "---" as the end of message marker, so that
>    any other commentary can be added to the message to facilitate the
>    communication without adding noise to the commits?  Perhaps he is and
>    his project picked it up as a good practice supported well by git.

This sounds just fine, though I have not yet had the need to use it.

> _An_ established (note that I did not say _the_ nor _best current_)
> practice supported well by git to note the area being affected in a
> project of nontrivial size is to prefix the single line summary with the
> name of the area followed by a colon.  There is no difference between
> "[sbuild] foo" and "sbuild: foo" at the information content point-of-view,
> but the latter has an advantage of being one letter shorter and less
> distracting in MUA.  He does not have a very strong reason to choose
> something different only to make his life harder, does he?

Well, I sometimes use the format

  [foo] bar: baz

but my more general point was not my specific usage but that the
existing behaviour was causing loss of information.  I think it
would be preferable to guarantee that data from the original
commit is not lost and is preserved exactly if at all possible.

> Supporting a slightly different convention may seem to be accommodating and
> nice, but if there is no real technical difference between the two (and
> again, "area:" is one letter shorter ;-), letting people run with
> different convention longer, when they can switch easily to another
> convention that is already well supported, may actually hurt them in the
> long run.  "[sbuild]" will not match "--area=sbuild" that will internally
> become "--grep-only-first-line=sbuild:" so either he will miss out
> benefiting from the new feature, or the implementation of the new feature
> unnecessarily needs more code.

This is a nice feature I wasn't aware of, so thanks for pointing it
out.  It might be useful to alter my workflow to allow it to be used.
Alternatively, would a customisation option, e.g. a custom regex
stored in .git/config, allow me to match both forms?

The patch I sent to the list separately replaces the existing
cleanup_subject string munging (which is rather complex and
hairy) with a single regular expression that matches the bits of
the string we don't want, such as '^Re:' and the first set of
square brackets.  We then just keep the remainder.  I initially
went with the following extended regex:

  ^([Rr]e: )?(.*PATCH[^]]*\\] )(.*)$

but, as per your comments above about removing the first set of
brackets whatever the contents, I chose the following more
general expression:

  ^([Rr]e:)?([^]]*\\[[^]]+\\])(.*)$

This should be rather more maintainable and flexible than the
existing code, because one can just tweak the regex rather than
fiddling with hairy string offsets.  This preserves the
existing behaviour with the exception of matching the first []
pair only rather than being "greedy" and removing everything up
to the last "]".
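
To make the difference between the two expressions concrete, here is a
minimal sketch in Python rather than the C of the actual patch.  It
assumes the doubled backslashes above are C string escaping, so each
becomes a single backslash here.  The helper name strip_prefix and the
sample subjects are made up for illustration and are not part of the
patch.

```python
import re

# Both expressions are copied from the message above, with the doubled
# backslashes (assumed to be C string escaping) reduced to single ones.
PATCH_ONLY = re.compile(r"^([Rr]e: )?(.*PATCH[^]]*\] )(.*)$")
FIRST_BRACKET = re.compile(r"^([Rr]e:)?([^]]*\[[^]]+\])(.*)$")


def strip_prefix(pattern, subject):
    """Hypothetical helper: return the part of the subject kept by pattern.

    If the pattern does not match, the subject is returned unchanged.
    """
    match = pattern.match(subject)
    if match is None:
        return subject
    # Group 3 is "the remainder"; drop the whitespace left after the bracket.
    return match.group(3).strip()


if __name__ == "__main__":
    for subject in (
        "[PATCH 1/2] [sbuild] change bar to baz",
        "Re: [RFC/PATCH] [foo] bar: baz",
        "[RFC] [sbuild] change bar to baz",
    ):
        print(repr(subject))
        print("  PATCH only:   ", repr(strip_prefix(PATCH_ONLY, subject)))
        print("  first bracket:", repr(strip_prefix(FIRST_BRACKET, subject)))
```

In this sketch both expressions keep "[sbuild]" and "[foo]" when a
bracket containing PATCH comes first.  They differ on the third
subject: the PATCH-only form leaves "[RFC]" in place, while the more
general form strips it.  The general form also strips the first
bracket of a subject that never had a prefix added, so it relies on
format-patch always adding exactly one set of brackets.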


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.
