tl;dr: This patch series wants to introduce a permanent new Git data format. The current version can write trailers in formats that it is incapable of reading, which I consider broken. I advocate a stricter specification of the format of trailers, at least until we get feedback from users that they need more flexibility. On 05/25/2014 10:37 AM, Christian Couder wrote: > From: Michael Haggerty <mhagger@xxxxxxxxxxxx> > [...] >> * A way for users to add information to commits for whatever purpose >> they want, without having to convince upstream to built support in. > > Yeah, I agree this is the main purpose of trailers. > >> * A standard format for that information, so that all tools can agree >> how to read/write trailers without being confused by or breaking >> trailers that they didn't know about in advance. > > Yeah, but don't you think this goal can sometimes go against the > previous goal? > > I mean, if some users for their project think that it's better, for > example, if they use trailers like "Fix #42" instead of "Fix: 42", > because their bug tracking system supports "Fix #42" better, we should > let them do what suits them better, even if Git supports them not as > well as if they used "Fix: 42". The flexibility that comes from offering our users a more-or-less general key/value store already accomplishes the first goal. With that the users *could* store their data as "Fix: 42" or "Fixes: #42" and satisfy their functional requirements. Giving them the option to use "Fix #42" doesn't make *any* new functionality possible. It is pure eye-candy. And it would come at the IMO high cost of making it harder for *everybody* to work with the metadata. It makes the specification more complicated. It makes the code more complicated. It makes the configuration more complicated. It makes it more likely that there will be "false positives" (text in a commit message that our code recognizes as key/value data even though it was not meant to be). And in my opinion it makes the k/v data itself harder for a human to read because it is not in a uniform format. The only justification I can think of for allowing more flexible formats would be to "retroactively" support metadata that people already have in their history. Are there "famous" or "important" existing metadata formats that are incompatible with "Key: Value"? More to the point: do you have a concrete reason for wanting to support alternative formats like "Fix #42", or is it based more on the feeling that users will want it? Remember, it would be really easy to release v1 of this feature with a strict format, then wait and see if users clamor for more flexibility. We can always add more flexibility later. Whereas if v1 already supports more flexible formats, we pretty much have to support them forever. >> * A format that is straightforward enough that it can be machine- >> readable with minimum ambiguity. > > Yeah, but again this could go against the main purpose of trailers > above. No, the users have all the flexibility they need if they can choose their own key/value schema. Allowing alternative formats adds very little. I feel strongly that it would be a bad mistake to leave the specification of trailers so loose that they cannot be machine-readable with a good degree of confidence. Tools that add trailers should be composable. The current scheme is not. For example, suppose one tool wants to add a "Fix #42" line and another one wants to add "Signed-off-by": git config trailer.fix.key "Fix #" git config trailer.sign.key "Signed-off-by: " git config trailer.sign.ifexists doNothing echo "subject Signed-off-by: Alice <alice@xxxxxxxxxxx>" | git interpret-trailers --trailer fix=42 | git interpret-trailers --trailer sign="Bob <bob@xxxxxxxxxxx>" --------- output ------------------------------------------ subject Signed-off-by: Alice <alice@xxxxxxxxxxx> Fix #42 Signed-off-by: Bob <bob@xxxxxxxxxxx> ----------------------------------------------------------- The result is that the trailers end up not in one block but in two (meaning that the first block is no longer recognized as a trailer block), and the second "Signed-off-by" line, which should have been omitted because of ifexists=doNothing, was incorrectly added. Or let's do something like the "commit template" example from the documentation, but using "Fix #" instead of "Fixes: ": echo "***subject*** ***message*** Fix # Cc: Reviewed-by: Signed-off-by: " | sed -Ee 's/(Reviewed-by.*)/\1me/' | git interpret-trailers --trim-empty --trailer "git-version: foo" ---------- output ------------------------------------------ ***subject*** ***message*** Fix # Cc: Reviewed-by: me Signed-off-by: git-version: foo ------------------------------------------------------------ Not only haven't the empty lines been stripped off, but a new trailer block has been created for "git-version". I consider this broken. > [...] >> For example the specification >> might look something like: >> >> A commit message can optionally end with a block of trailers. >> The trailers, if present, must be separated from the rest of the >> commit message by one or more blank lines (lines that contain only >> whitespace). There must be no blank lines within the list of >> trailers. It is allowed to have blank lines after the trailers. >> >> Each trailer line must match the following Perl regular >> expression: >> >> ^([A-Za-z0-9_-]+)\s*:\s*(.*[^\s])\s*$ >> >> The string matching the first group is called the key and the string >> matching the second is called the value. Keys are considered to be >> case-insensitive [or should they be case-sensitive?]. The >> interpretation of values is left entirely up to the application. >> Values must not be empty. > > I tried to be clearer in the v12 I just posted, and I think it should > be enough to be very clear. We might want to tweak a little the > specifications later, so being too strict might be counter productive. > > And as other tools might already use trailers in a case-sensitive way > and yet other tools in a case-insensitive way, I am not sure we would > gain anything by specifying if keys or values should be interpreted in > a case-sensitive or case-insensitive way. On the contrary we might > upset people already using some of these tools for no good reason. No sane tool would interpret a trailer differently depending on how the key is capitalized. So I think that if we ourselves treat the keys case-insensitively but we preserve case when processing metadata, there won't be any problems. >> However, in --draft and --cleanup modes, empty values *are* >> allowed, as are comments (lines starting with `core.commentchar`) >> within the trailer block. In --draft mode such lines are passed >> through unchanged, and in --cleanup mode such lines are removed. > > I am not sure we should use modes. I think options like > "--trim-empty", "--allow-comments", "--allow-empty" might be clearer. I think we want to train users to think of trailers-with-no-values as temporary helpers that shouldn't end up in commits, just like the "#"-commented lines in commit message templates. Why? Because you can't rely on their being preserved. As soon as your trailer goes through a "--cleanup" (which might be there because of an unrelated tool) it will disappear. For the user it is a simpler mental model to think of modes: as long as I am in draft mode, there might be comment lines and trailer templates in my commit message, but when I commit they will go away. I would go so far as to say that deleting trailer lines with no values should be a standard part of cleaning commit messages and should maybe be an option offered by "git stripspace". > [...] >> So my preference would be to make the format of trailers really strict; >> for example, only allowing colon separators as in the regexp above. >> People who want to work with trailers using Git tools will just have to >> conform to this format. > > I don't think we should cast in stone the format for trailers, because > of the main purpose of trailers. > > The format of the commit header for example is cast in stone, but > that's ok because it is mostly for Git internal use. Trailers are > mostly for external use by users who already have tools expecting > different formats. > > There are already users who are not happy that they cannot easily have > other commit headers, and we point them to trailers. If we specify > trailers too strictly, where will we point them to? Nobody is disagreeing that users should be allowed to choose their own keys and values and assign their own interpretations to them. If we let users add their own commit headers, we certainly wouldn't let them define a header formatted like "Fix #42", would we? Again, the *only* justification for more flexible formats would be if there are a lot of tools that *already* exist out there that don't conform to "Key: value". Are there? New tools will certainly be written to use whatever format we define, and in exchange they will be able to use your awesome new tool and hopefully other upcoming tools for dealing with trailers. >> But if we must support flexibility in the separator characters, then I >> think it is important that we be able to read whatever we can write. > > An option like --input-separator might be enough to support this. This would not be adequate because a single commit message might have multiple trailers with different formats: Signed-off-by: me Fix #42 Either the input separator would have to be specified for every single trailer (which is impractical because you can't dictate centrally how git clients are configured) or the parsing code would have to be taught to read any allowed format. >> For me this means: >> >> * Enumerating a list of allowed separators (e.g., [:=#]) > > Junio suggested in a message that users might use different separators > like '%'. The fewer and stricter the better, to avoid false positive matches to non-trailer text. >> * Specifying how it is decided what separator to use when generating >> new trailers > > This is already possible with the 'trailer.<token>.key' config > variable. This means that if one developer has forgotten to configure the "Fix" trailer in one clone, then he will generate trailers that are considered malformed by a colleague who has configured "Fix #". >> * Specifying what appends when a trailer is read and then written again: >> is its separator preserved, or is the trailer converted to use the >> separator configured for that particular key in the config. And if >> the latter, what happens if a key's separator is not configured? > > Right now we only accept ':' as input separator for the messages and > ':' and '=' for the --trailer option, and the default output separator > is ':'. If the user specify a different separator in a key, this > separator will be used only in the output for this key. I consider trailers that can only be written but not read to be broken. My questions are to probe one possible alternate reality, namely: "Suppose we want to allow alternative trailer formats. What would it take to make them readable *and* writable?" For example: * What if I try to add a Signed-off-by trailer to a message that already contains "Fix #42"? In what form should the "Fix #42" line be written to the output * if I don't have the separator for "Fix" configured? * if I have the separator for "Fix" configured to be "="? * What if I *do* have the separator for "Fix" set to " #", but the input contains "Fix: 42"? How should that line be formatted on output? Does the answer change if I have "token.fix.ifexist" set to "overwrite" and run "git interpret-trailers fix=43"? By asking these questions, I hope to hint that supporting alternative formats increases the complexity of the specification and the implementation to an unjustifiable extent and/or it unrealistically relies all developers on a project having their trailer configuration set up correctly. > If this is not clear in the documentation, please susggest specific > improvements. > [...] > I tried to be very clear in the doc in v12. The doc only covers writing new trailers in "alternative" formats, not reading them. > [...] >> Given Git's requirements for backwards compatibility, a specification >> that we release now will have to be supported forever (because it will >> be baked into commits and can *never* be changed), and any >> trailer-handling tools that we release now will have to be supported for >> years (until at least Git 3.0). > > Yeah, I know that. So if we are too strict in the specification will > be stuck for a long time. No, that's exactly backwards! We can easily loosen the format in the future. Projects don't need to output trailers in the looser format until everybody involved is using the new Git version. But the format can never be made *more* strict, because then the metadata that users have added to their history will stop being readable. >> All in all, I think that there has been a lot of discussion about the >> interface of this one command, "git interpret-trailers", including its >> quite complicated configuration and a command-line behavior. And yet it >> seems to me that not many Git developers have been very engaged in the >> conversation, and Junio (who has) still doesn't seem satisfied with it. >> I (though among the too-little engaged) have the feeling that it is >> still a ways from maturity. > > My opinion is that many Git developers have been engaged and you can > see that in the Cc. > > I cannot tell if they are all very happy or not but I suppose that if > they were very unhappy they would tell it. > [...] It was unfair of me to try to characterize the opinions of other developers. On the other hand, even though many people have commented on this proposal over its long lifetime, I didn't get the feeling that it has won a consensus of +1s in its current form. I'd love to hear the opinion of others because maybe I'm totally out in left field. And I want to reiterate that the reason I'm so emphatic about this proposal is because I think it will be such a great new feature. I just think that some tweaks would make it a more solid foundation for building even more functionality onto. Michael -- Michael Haggerty mhagger@xxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html