Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> writes:

> We could allocate just one block with length as the first field:
>
>     struct commit_buffer {
>             unsigned long len;
>             char buf[FLEX_ARRAY];
>     };
>
> The downside is commit_buffer field type in struct commit changes,
> which impacts many codepaths.

I think that is a good thing overall to _force_ us to audit all the
code, *if* our goal were to avoid losing bytes.  And the solution
above is better than adding a length field to "struct commit".

It certainly is better than quoting the NUL byte as ^@, continuing to
use the "char *" field, and risking that some codepaths forget to
convert it back to NUL.  For the types of payloads for which losing
everything after the first NUL matters, converting NUL to ^@ and then
forgetting to convert it back to NUL is an equally bad breakage of the
payload anyway, so such a conversion would not be a particularly good
approach to avoid losing bytes.

But as Jeff suggested, we should step back a bit and think about what
our goal is.

The low level object format of our commit is textual header fields,
each of which is terminated with a LF, followed by a LF to mark the
end of header fields, and then an opaque payload that can contain any
bytes.  It does not forbid a non-Git application from reusing the
object store infrastructure to store ASN.1 binary goo there, and the
low level interface we give, such as cat-file, is a perfectly valid
way to inspect such a "commit" object.

But when it comes to "Git" Porcelains (e.g. the log family of
commands), we do assume people do not store random binary byte
sequences in commits, and we do take advantage of that assumption by
splitting each "line" at LF, indenting them with 4 spaces, etc.  In
other words, a commit log in the Git context _is_ pretty much text and
not an arbitrary byte sequence.  Even the "--pretty=raw" option for
the "log" family is not about the "raw" body; the "raw"-ness applies
only to the header fields.

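As a rough illustration of that text-oriented assumption, here is a minimal sketch of a log-style formatter that splits the body at LF and indents each line with 4 spaces.  The helper name indent_body() and its signature are made up for this example and are not Git's actual code; the point is only that anything treating the body as a C string never sees the bytes after the first NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not Git's implementation): copy the commit body
 * into "out", prefixing every line with four spaces.  "body" is treated
 * as a NUL-terminated C string, so bytes after the first NUL are never
 * looked at.  Every emitted line is terminated with a LF, even if the
 * last input line was not.
 *
 * Returns the number of bytes written, excluding the terminating NUL.
 */
size_t indent_body(const char *body, char *out, size_t outsz)
{
	size_t n = 0;

	assert(body && out && outsz > 0);

	while (*body) {
		const char *eol = strchr(body, '\n');
		size_t linelen = eol ? (size_t)(eol - body) : strlen(body);

		/* need room for indent, line, LF and the final NUL */
		if (n + 4 + linelen + 1 >= outsz)
			break;

		memcpy(out + n, "    ", 4);
		n += 4;
		memcpy(out + n, body, linelen);
		n += linelen;
		out[n++] = '\n';

		/* skip the line and, if present, its LF */
		body += linelen + (eol ? 1 : 0);
	}
	out[n] = '\0';
	return n;
}
```

Feeding it a body of "one\ntwo\0lost\n" yields two indented lines, "one" and "two"; the "lost" part is silently dropped because the loop stops at the NUL.
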
So even if we _were_ to update the codepaths involved to avoid losing
bytes, the end result would not be useful for users to whom the
ability to include NUL matters.

So in that sense, I do not think it is unreasonable to chop it off at
the first NUL, which is the current behaviour.  IOW, it is entirely
sane to argue that there is nothing to fix.
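
For reference, the single-block layout quoted at the top could be sketched roughly as below.  This is only an illustration under assumptions: commit_buffer_new() and commit_buffer_visible_len() are hypothetical names, and FLEX_ARRAY is defined locally as an empty C99 flexible array member (Git itself takes it from git-compat-util.h).  It shows the difference between the stored length and what a strlen()-based codepath would see.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumption for this sketch: C99 flexible array member. */
#ifndef FLEX_ARRAY
#define FLEX_ARRAY
#endif

struct commit_buffer {
	unsigned long len;	/* number of payload bytes, NULs included */
	char buf[FLEX_ARRAY];	/* payload, followed by one extra NUL */
};

/*
 * Hypothetical constructor: allocate the header and the payload as one
 * block and copy "len" bytes from "data".  An extra NUL is appended so
 * that existing "char *" users still get a terminated string.
 * Returns NULL on allocation failure; release with free().
 */
struct commit_buffer *commit_buffer_new(const void *data, unsigned long len)
{
	struct commit_buffer *cb = malloc(sizeof(*cb) + len + 1);

	if (!cb)
		return NULL;
	cb->len = len;
	memcpy(cb->buf, data, len);
	cb->buf[len] = '\0';
	return cb;
}

/* What a codepath that treats buf as a plain C string would see. */
unsigned long commit_buffer_visible_len(const struct commit_buffer *cb)
{
	assert(cb);
	return (unsigned long)strlen(cb->buf);
}
```

With a payload of "ab\0cd" (5 bytes), len stays 5 while the strlen()-based view is 2, which is exactly the truncation at the first NUL discussed above.
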