Re: [PATCH 00/22] Refactor to accept NUL in commit messages

Junio C Hamano <gitster@xxxxxxxxx> · Sat, 22 Oct 2011 22:51:27 -0700

Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> writes:

> We could allocate just one block with length as the first field:
>
> struct commit_buffer {
>         unsigned long len;
>         char buf[FLEX_ARRAY];
> };
>
> The downside is commit_buffer field type in struct commit changes,
> which impacts many codepaths.

I think that is a good thing overall, as it would _force_ us to audit all
the code, *if* our goal were to avoid losing bytes. The solution above is
also better than adding a separate length field to "struct commit". It is
certainly better than quoting a NUL byte as ^@, keeping the "char *" field,
and risking that some codepaths forget to convert it back to NUL. For
payloads where losing everything after the first NUL matters, converting
NUL to ^@ and then forgetting to convert it back is just as bad a breakage
of the payload, so such a conversion would not be a particularly good way
to avoid losing bytes.
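As a rough illustration of the single-allocation layout quoted above, here
is a minimal C99 sketch. It assumes plain standard C (a `buf[]` flexible
array member instead of Git's `FLEX_ARRAY` macro), and `commit_buffer_new`
is a hypothetical constructor, not an existing Git function. The point is
only that the length travels with the bytes, so an embedded NUL no longer
ends the payload.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Single-allocation commit buffer: the length is stored in front of the
 * bytes. Git would spell the last member buf[FLEX_ARRAY]; this sketch
 * uses the C99 flexible array member buf[] directly.
 */
struct commit_buffer {
	unsigned long len;	/* payload size in bytes, embedded NULs included */
	char buf[];		/* payload, followed by one extra NUL for safety */
};

/*
 * Hypothetical constructor: copy len bytes verbatim into a freshly
 * allocated commit_buffer. Returns NULL if allocation fails.
 */
static struct commit_buffer *commit_buffer_new(const char *data, unsigned long len)
{
	struct commit_buffer *cb;

	assert(data != NULL || len == 0);
	cb = malloc(sizeof(*cb) + len + 1);
	if (!cb)
		return NULL;
	cb->len = len;
	if (len)
		memcpy(cb->buf, data, len);
	cb->buf[len] = '\0';
	return cb;
}
```

Any codepath that keeps treating `buf` as a C string still sees only the
bytes before the first NUL, which is why changing the field type forces an
audit of every user.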

But as Jeff suggested, we should step back a bit and think about what our
goal is.

The low-level object format of our commit is a set of textual header
fields, each terminated with an LF, followed by an LF that marks the end of
the header fields, and then an opaque payload that can contain any bytes.
Nothing forbids a non-Git application from reusing the object store
infrastructure to store ASN.1 binary goo there, and the low-level
interfaces we provide, such as cat-file, are a perfectly valid way to
inspect such a "commit" object.
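A minimal sketch of the layout just described, assuming only that the
header ends at the first empty line (two consecutive LFs). `body_offset` is
a hypothetical helper, not Git's actual commit parser. It works on an
explicit length, so whatever follows in the payload, NUL bytes included, is
left alone.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the offset at which the message body of a
 * raw commit object starts, i.e. just past the empty line (LF LF) that
 * ends the header fields. If there is no such line, return len.
 * Only an explicit byte count is used, never strlen(), so a NUL in the
 * body cannot cut the object short.
 */
static size_t body_offset(const char *obj, size_t len)
{
	size_t i;

	assert(obj != NULL || len == 0);
	for (i = 0; i + 1 < len; i++) {
		/* the LF ending the last header line, then the empty line's LF */
		if (obj[i] == '\n' && obj[i + 1] == '\n')
			return i + 2;
	}
	return len;
}
```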

But when it comes to "Git" Porcelains (e.g. the log family of commands),
we do assume that people do not store random binary byte sequences in
commits, and we take advantage of that assumption by splitting the message
into "lines" at each LF, indenting them with 4 spaces, etc. In other words,
a commit log in the Git context _is_ pretty much text, not an arbitrary
byte sequence. Even the "--pretty=raw" option of the "log" family is not
about a "raw" body; the "raw"-ness applies only to the header fields. So
even if we _were_ to update the codepaths involved to avoid losing bytes,
the end result would not be useful to users for whom the ability to
include NUL matters.
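To make the line-oriented treatment concrete, here is a hypothetical
`indent_message` that prefixes every line with four spaces, roughly the way
the log family presents a commit body. It is a simplified sketch, not Git's
pretty-printing code; details such as how blank lines are handled are
assumptions. It takes a plain C string, as the current `char *` field would.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a newly allocated copy of msg in which
 * every line is prefixed with four spaces. msg is NUL-terminated.
 * The caller frees the result; returns NULL if allocation fails.
 */
static char *indent_message(const char *msg)
{
	size_t len, lines = 0, i;
	int at_line_start = 1;
	char *out, *dst;

	assert(msg != NULL);
	len = strlen(msg);
	for (i = 0; i < len; i++)
		if (msg[i] == '\n')
			lines++;
	if (len && msg[len - 1] != '\n')
		lines++;	/* last line has no terminating LF */

	out = malloc(len + 4 * lines + 1);
	if (!out)
		return NULL;

	dst = out;
	for (i = 0; i < len; i++) {
		if (at_line_start) {
			memcpy(dst, "    ", 4);
			dst += 4;
		}
		*dst++ = msg[i];
		at_line_start = (msg[i] == '\n');
	}
	*dst = '\0';
	return out;
}
```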

So in that sense, I do not think it is unreasonable to chop the message
off at the first NUL, which is the current behaviour. IOW, it is entirely
sane to argue that there is nothing to fix.
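The current behaviour can be modelled in a few lines. This sketch assumes
the message is handed on as a C string; `visible_length` is a hypothetical
name. Of the `len` bytes stored in the object, only those before the first
NUL are seen, and nothing is escaped or recovered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper modelling the current behaviour: of the len bytes
 * stored in a commit message, only those before the first NUL survive
 * once the message is handled as a C string.
 */
static size_t visible_length(const char *msg, size_t len)
{
	const char *nul;

	assert(msg != NULL || len == 0);
	nul = len ? memchr(msg, '\0', len) : NULL;
	return nul ? (size_t)(nul - msg) : len;
}
```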