Catalin Marinas <catalin.marinas@xxxxxxxxx> writes: > On 8 February 2012 07:33, Junio C Hamano <gitster@xxxxxxxxx> wrote: >> To allow parsing the header produced by versions of Git newer than the >> code written to parse it, all commit parsers are expected to skip unknown >> header lines, so that newer types of header lines can be added safely. >> The only three things that are promised are: >> >> (1) the header ends with an empty line (just an LF, not "a blank line"), >> (2) unknown lines can be skipped, and >> (3) a header "field" begins with the field name, followed by a single SP >> followed by the value. > > Thanks for looking into this. Is this the same as an email header? If > yes, we could just use the python's email.Header.decode_header() > function (I haven't tried yet). If you are thinking about feeding everything up to the first empty line to whatever is designed to parse email header, please don't. I do not think they obey "skip unknown lines without barfing" rule [*1*], so we would be back to square one if you did so. The fix posted in this thread is necessary because the change to StGit between v0.15 and v0.16 made to ignore lines starting with "encoding " was a wrong way to work around the broken parser in v0.15 in the first place. The parser assumed that (1) all whitespaces around the header lines can be stripped, (2) the result after such stripping will always have at least one whitespace so that line.split(None, 1) will never barf, and (3) between the field name and its value there may be arbitrary number of whitespace characters that can be ignored so that line.split(None, 1) is a safe way to split it into a (key,value) pair. None of which is a safe thing to assume. The rule for safe parsing is to ignore all lines it does not understand without assuming anything, and I wrote the patch in this thread to make sure it makes no such unwarranted assumption. > BTW, does Git allow custom headers to be inserted by tools like StGit? The header format is designed in such a way that it is safe for a parser to silently ignore unknown cruft, but that also means tools that work on an existing commit and produce a similar one, like "commit --amend", are free to either ignore and drop them when creating a new commit out of the original one, or replay it verbatim without adjusting them to the new context they appear in. In that sense, they are technically "allowed", but depending on the nature of the information you are putting there, it semantically may or may not produce the desired result [*2*]. I would say it is strongly discouraged to invent new types of header lines without first consulting the people who maintain tools you must interoperate with, so that they will also be aware of them, and hopefully their tools can be adjusted to help you use them. Sorry for the breakage and making you to deal with this post release. We observed that recent StGit did not barf after we added "encoding " field to Git, and assumed that StGit correctly ignored lines that it did not understand, without inspecting its code. At least we should have Cc'ed you guys directly when the change was being discussed on the list. [Footnote] *1* I also suspect that it will handle a line that begins with a single SP differently if you use email parsing rules. In a commit object header, the content of such a line is appended to the value of the previous field after turning that leading single SP into a LF, and the resulting value will be a string that consists of multiple lines. The header folding rule used for e-mail in RFC2822 is a way to represent a (logically) single line as physically multiple lines, so the result of unfolding will become a single line. This difference may not matter for the purpose of the current StGit that understands nothing but tree/parent/author/committer, but because we are discussing a forward-looking fix for its parser, I wouldn't recommend "it does not matter because we do not currently care" approach. *2* For example, a line that begins with "gpgsig " is a field that records the GPG signature of a commit itself (using "git commit -S"), and it should not survive across "commit --amend". A line that begins with "mergetag " is a field that records the tag information that was merged from a side branch, and amending such a merge does not change what was merged, so it should survive across "commit --amend". -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html