On 04/25/2014 11:07 PM, Christian Couder wrote: > From: Michael Haggerty <mhagger@xxxxxxxxxxxx> >>>>> +OPTIONS >>>>> +------- >>>>> +--trim-empty:: >>>>> + If the 'value' part of any trailer contains only whitespace, >>>>> + the whole trailer will be removed from the resulting message. >>>> >>>> Does this apply to existing trailers, new trailers, or both? >>> >>> Both. >>> >>>> If it applies to existing trailers, then it seems a bit dangerous, in the >>>> sense that the command might end up changing trailers that are unrelated >>>> to the one that the command is trying to add. >>> >>> The command is not just for adding trailers. >>> But there could be an option to just trim trailers that are added. >> >> Maybe that should be the *only* behavior of this option. >> >> Maybe there should be a trailer.<token>.trimEmpty config option. > > One possible usage of the "git interpret-trailers" command that was > discussed in the early threads was the following: > > 1) You have a commit message template like the following: > > ----------------- > ***SUBJECT*** > > ***MESSAGE*** > > Fixes: > Cc: > Cc: > Reviewed-by: > Reviewed-by: > Signed-off-by: > ----------------- > [...etc...] Thanks for the explanation. Now the --trim-empty option makes a lot more sense. >>>> If a commit message containing trailer lines with separators other than >>>> ':' is input to the program, will it recognize them as trailer lines? >>> >>> No, '=' and '#' are not supported in the input message, only in the output. >>> >>>> Do such trailer lines have to have the same separator as the one listed >>>> in this configuration setting to be recognized? >>> >>> No they need to have ':' as a separator. >>> >>> The reason why only ':' is supported is because it is the cannonical >>> trailer separator and it could create problems with many input >>> messages if other separators where supported. >>> >>> Maybe we could detect a special line like the following: >>> >>> # TRAILERS START >>> >>> in the input message and consider everyhting after that line as trailers. >>> In this case it would be ok to accept other separators. >> >> It would be ugly to have to use such a line. I think it would be >> preferable to be more restrictive about trailer separators than to >> require something like this. > > The code is already very restrictive about trailer separators. > >> From what you've said above, it sounds like your code might get confused >> with the following input commit message: >> >> This is the human-readable comment >> >> Foo: bar >> Fixes #123 >> Plugh: xyzzy >> >> It seems to me that none of these lines would be accepted as trailers, >> because they include a non-trailer "Fixes" line (non-trailer in the >> sense that it doesn't use a colon separator). > > Yeah, they would not be accepted because the code is very restrictive. > > The following would be accepted: > > Foo: bar > Fixes: 123 > Plugh: xyzzy > >>>> I suppose that there is some compelling reason to allow non-colon >>>> separators here. If not, it seems like it adds a lot of complexity and >>>> should maybe be omitted, or limited to only a few specific separators. >>> >>> Yeah, but in the early threads concerning this subject, someone said >>> that GitHub for example uses "bug #XXX". >>> I will have a look again. >> >> Yes, that's true: GitHub recognizes strings like "fixes #33" but not if >> there is an intervening colon like in "fixes: #33". OTOH GitHub >> recognizes such strings wherever they appear in the commit message (they >> don't have to be in "trailer" lines). So I'm not sure that the added >> complication is worth it if GitHub is the only use case. (And maybe we >> could convince GitHub to recognize "Fixes: #33" if such syntax becomes >> the de-facto Git standard for trailers.) > > I don't think there is a lot of complexity. > But maybe I need to explain how it works better. > Feel free to suggest me sentences I could add. I am really excited about having better support for trailers in Git, and I want to thank you for your work. For me the promise of trailers is * A way for users to add information to commits for whatever purpose they want, without having to convince upstream to built support in. * A standard format for that information, so that all tools can agree how to read/write trailers without being confused by or breaking trailers that they didn't know about in advance. * A format that is straightforward enough that it can be machine- readable with minimum ambiguity. * Some command-line tools to make it easy for scripts to work with trailers, and that serve as a reference implementation that other Git implementations can imitate. For example, I totally expect that we will soon want a command-line tool for inquiring about the presence and contents of trailers, for use in scripting. Eventually we will want to be able to do stuff like git trailers --get-all s-o-b origin/master..origin/next git rev-list --trailer=s-o-b:gitster@xxxxxxxxx master git trailers --pipe --draft \ --add-first fixes \ --append '# You can delete the following line:' \ --append s-o-b:"$GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL>" \ --unset private git trailers --pipe --verify --tidy-up I think it is really important to nail down the format of trailers tightly enough that everybody who reads/writes a commit message agrees about exactly what trailers are there. For example the specification might look something like: A commit message can optionally end with a block of trailers. The trailers, if present, must be separated from the rest of the commit message by one or more blank lines (lines that contain only whitespace). There must be no blank lines within the list of trailers. It is allowed to have blank lines after the trailers. Each trailer line must match the following Perl regular expression: ^([A-Za-z0-9_-]+)\s*:\s*(.*[^\s])\s*$ The string matching the first group is called the key and the string matching the second is called the value. Keys are considered to be case-insensitive [or should they be case-sensitive?]. The interpretation of values is left entirely up to the application. Values must not be empty. However, in --draft and --cleanup modes, empty values *are* allowed, as are comments (lines starting with `core.commentchar`) within the trailer block. In --draft mode such lines are passed through unchanged, and in --cleanup mode such lines are removed. I'm not saying this is the exact definition that we want; I'm just providing an example of the level of precision that I think is needed. With regard to the separator character, my concern is not about how to document the rules for this one tool. It's more about having really well-defined rules that are consistent between reading and writing. For me it seems silly to let "git interpret-trailers" output trailers that it doesn't know how to read back in, and pretty much be a show-stopper if the presence of such trailers makes the tool unable to read other trailers in the same commit message. So my preference would be to make the format of trailers really strict; for example, only allowing colon separators as in the regexp above. People who want to work with trailers using Git tools will just have to conform to this format. But if we must support flexibility in the separator characters, then I think it is important that we be able to read whatever we can write. For me this means: * Enumerating a list of allowed separators (e.g., [:=#]) * Specifying how it is decided what separator to use when generating new trailers * Specifying what appends when a trailer is read and then written again: is its separator preserved, or is the trailer converted to use the separator configured for that particular key in the config. And if the latter, what happens if a key's separator is not configured? * Specifying whether whitespace around a separator is adjusted when reading then writing a trailer. For example, is Foo SP SP : HT bar SP canonicalized to Foo: SP bar (SP=space, HT=tab)? What about Fixes SP #33 ? What if the separator for the "fixes" key is not configured? The reason that I prefer supporting only colons is that more flexibility inevitably raises lots of questions like this, makes the documentation and implementation more complicated, and makes it harder for other implementations to be sure they agree 100% with the reference implementation. >>>>> +With `addIfDifferent`, a new trailer will be added only if no trailer >>>>> +with the same (token, value) pair is already in the message. >>>>> ++ >>>>> +With `addIfDifferentNeighbor`, a new trailer will be added only if no >>>>> +trailer with the same (token, value) pair is above or below the line >>>>> +where the new trailer will be added. >>>>> ++ >>>>> +With `add`, a new trailer will be added, even if some trailers with >>>>> +the same (token, value) pair are already in the message. >>>>> ++ >>>>> +With `overwrite`, the new trailer will overwrite an existing trailer >>>>> +with the same token. >>>> >>>> What if there are multiple existing trailers with the same token? Are >>>> they all overwritten? >>> >>> No, if where == after, only the last one is overwritten, and if where >>> == before, only the first one is overwritten. >>> >>> I could add an "overwriteAll" option. It could be interesting to use >>> when a command using "$ARG" is configured, as this way the command >>> would apply to all the trailers with the given token instead of just >>> the last or first one. >> >> It seems to me that the current behavior (rewriting exactly one existing >> line) is not that useful. Why not make "overwrite" overwrite *all* >> existing matching lines? > > I was thinking that people could use the following template message: > > --------------- > Signed-off-by: > Signed-off-by: YOU-WILL-BE-AUTOMATICALLY-ADDED-HERE > --------------- > > and the following config: > > --------------- > [trailer "s-o-b"] > key = "Signed-off-by: " > ifexist = overwrite > command = echo \"$GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL>\" > --------------- > > This way the user can add other people's s-o-b before the last one > which will always contain his own s-o-b. That seems fragile. For example, if the user changes the lines to Signed-off-by: Somebody Else <...> (deleting the "YOU-WILL-BE" line, maybe because they don't want to sign off the commit) then not only will their wish be contradicted, but also Somebody Else would be deleted. What about allowing a --draft option, which allows blank trailer values plus comments interspersed in the trailer lines? (I.e., the equivalent of --trim-empty could be the default and --draft would turn it off plus allow interspersed comments.) Then the template could be # You can add one or more Signed-off-by lines for other people here. # A Signed-off-by line for you will be appended automatically when # you commit. Signed-off-by: Or, even better: # You can add one or more Signed-off-by lines here: Signed-off-by: Signed-off-by: # You can delete the following line if you don't want it: Signed-off-by: Me <me@xxxxxxxxxxx> ; i.e., the Signed-off-by line for the author could be filled in *before* the user is asked to edit the commit message. There could also be a --cleanup mode that allows blank values and comments on input but removes them from the output. > [...] Given Git's requirements for backwards compatibility, a specification that we release now will have to be supported forever (because it will be baked into commits and can *never* be changed), and any trailer-handling tools that we release now will have to be supported for years (until at least Git 3.0). All in all, I think that there has been a lot of discussion about the interface of this one command, "git interpret-trailers", including its quite complicated configuration and a command-line behavior. And yet it seems to me that not many Git developers have been very engaged in the conversation, and Junio (who has) still doesn't seem satisfied with it. I (though among the too-little engaged) have the feeling that it is still a ways from maturity. On the contrary, the data format and semantics of the finished trailers seem to have gotten too little attention, even though they are simpler to define and even more important than the interface of the command used to manipulate trailers. I think it would be really helpful to have a careful specification of the data format, and make sure that everybody agrees on what we want. For example, I think it is crucial that the trailers can be read and written unambiguously. Once that's clear, it will be a lot easier to be sure that the tool(s) for working with trailers conform to the specification. Even then, I think it might be prudent to mark "git interpret-trailers" as "experimental" and/or put it under "contrib" rather than among the main Git commands for a couple of releases. Luckily, it is very loosely coupled to the rest of Git, so I don't see any urgency to having it in core [1]. After people have had time to experiment with it, then it could be moved to core. Michael [1] Having the script in contrib would also make it possible to implement it use a scripting language to make it easier to iterate on the design. When the details are agreed it could have been reimplemented in C. But I guess that ship has already sailed. -- Michael Haggerty mhagger@xxxxxxxxxxxx http://softwareswirl.blogspot.com/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html