Re: [StGit PATCH] Parse commit object header correctly

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Catalin Marinas <catalin.marinas@xxxxxxxxx> writes:

> On 8 February 2012 07:33, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>> To allow parsing the header produced by versions of Git newer than the
>> code written to parse it, all commit parsers are expected to skip unknown
>> header lines, so that newer types of header lines can be added safely.
>> The only three things that are promised are:
>>
>>  (1) the header ends with an empty line (just an LF, not "a blank line"),
>>  (2) unknown lines can be skipped, and
>>  (3) a header "field" begins with the field name, followed by a single SP
>>     followed by the value.
>
> Thanks for looking into this. Is this the same as an email header? If
> yes, we could just use the python's email.Header.decode_header()
> function (I haven't tried yet).

If you are thinking about feeding everything up to the first empty line to
whatever is designed to parse email header, please don't.  I do not think
they obey "skip unknown lines without barfing" rule [*1*], so we would be
back to square one if you did so.

The fix posted in this thread is necessary because the change to StGit
between v0.15 and v0.16 made to ignore lines starting with "encoding " was
a wrong way to work around the broken parser in v0.15 in the first place.
The parser assumed that (1) all whitespaces around the header lines can be
stripped, (2) the result after such stripping will always have at least
one whitespace so that line.split(None, 1) will never barf, and (3)
between the field name and its value there may be arbitrary number of
whitespace characters that can be ignored so that line.split(None, 1) is a
safe way to split it into a (key,value) pair.  None of which is a safe
thing to assume.  The rule for safe parsing is to ignore all lines it does
not understand without assuming anything, and I wrote the patch in this
thread to make sure it makes no such unwarranted assumption.

> BTW, does Git allow custom headers to be inserted by tools like StGit?

The header format is designed in such a way that it is safe for a parser
to silently ignore unknown cruft, but that also means tools that work on
an existing commit and produce a similar one, like "commit --amend", are
free to either ignore and drop them when creating a new commit out of the
original one, or replay it verbatim without adjusting them to the new
context they appear in.  In that sense, they are technically "allowed",
but depending on the nature of the information you are putting there, it
semantically may or may not produce the desired result [*2*].  I would say
it is strongly discouraged to invent new types of header lines without
first consulting the people who maintain tools you must interoperate with,
so that they will also be aware of them, and hopefully their tools can be
adjusted to help you use them.

Sorry for the breakage and making you to deal with this post release. We
observed that recent StGit did not barf after we added "encoding " field
to Git, and assumed that StGit correctly ignored lines that it did not
understand, without inspecting its code.

At least we should have Cc'ed you guys directly when the change was being
discussed on the list.


[Footnote]

*1* I also suspect that it will handle a line that begins with a single SP
differently if you use email parsing rules. In a commit object header, the
content of such a line is appended to the value of the previous field
after turning that leading single SP into a LF, and the resulting value
will be a string that consists of multiple lines. The header folding rule
used for e-mail in RFC2822 is a way to represent a (logically) single line
as physically multiple lines, so the result of unfolding will become a
single line.  This difference may not matter for the purpose of the
current StGit that understands nothing but tree/parent/author/committer,
but because we are discussing a forward-looking fix for its parser, I
wouldn't recommend "it does not matter because we do not currently care"
approach.

*2* For example, a line that begins with "gpgsig " is a field that records
the GPG signature of a commit itself (using "git commit -S"), and it
should not survive across "commit --amend".  A line that begins with
"mergetag " is a field that records the tag information that was merged
from a side branch, and amending such a merge does not change what was
merged, so it should survive across "commit --amend".
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]