On Sun, Oct 23, 2011 at 8:46 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> writes:
>
>> On Sun, Oct 23, 2011 at 4:51 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>> ...
>>> The low level object format of our commit is textual header fields, each
>>> of which is terminated with a LF, followed by a LF to mark the end of
>>> header fields, and then opaque payload that can contain any bytes. It does
>>> not forbid a non-Git application to reuse the object store infrastructure
>>> to store ASN.1 binary goo there, and the low level interface we give such
>>> as cat-file is a perfectly valid way to inspect such a "commit" object.
>>
>> cat-file is fine, commit-tree (or any commands that call
>> commit_tree()) cuts at NUL though.
>> I wonder how git processes commit messages in utf-16.
>
> That is exactly what I am saying.
>
> Perhaps you didn't either read or understand what you omitted from your
> quoting; otherwise you even wouldn't have brought up utf-16.
>
> Let me requote that part for you.
>
>> But when it comes to "Git" Porcelains (e.g. the log family of commands),
>> we do assume people do not store random binary byte sequences in commits,
>> and we do take advantage of that assumption by splitting each "line" at
>> LF, indenting them with 4 spaces, etc. In other words, a commit log in the
>> Git context _is_ pretty much text and not arbitrary byte sequence.
>
> Think what would cutting at a byte whose value is 012 and adding four
> bytes whose values are 040 to each of "lines" that formed with such
> cutting do to UTF-16 goo, even if it does not contain any NUL byte. As far
> as Git Porcelains are concerned, it is no different from random binary
> byte sequences.
>

I'm sorry. The utf-16 was an afterthought when I was nearly finished with
the reply and had already cut that quote.
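To make the byte-level argument concrete, here is a small Python sketch. It is not Git's code. `split_commit`, `cut_at_nul` and `indent_lines` are hypothetical helpers that only model the behaviour described in this thread: an LF-terminated header ended by an empty line, the NUL truncation attributed to commit_tree(), and the log family's "split at 012, prefix four 040 bytes" step. The tree line in the sample object is a placeholder, not a real object name.

```python
# Hypothetical helpers modelling the byte-level behaviour discussed in this
# thread; this is NOT Git's implementation, only a rough illustration.


def split_commit(raw: bytes) -> tuple[list[bytes], bytes]:
    """Split a raw commit body into its LF-terminated header lines and the
    opaque payload that follows the first empty line."""
    header, sep, payload = raw.partition(b"\n\n")
    if not sep:
        raise ValueError("missing empty line after the header")
    return header.split(b"\n"), payload


def cut_at_nul(payload: bytes) -> bytes:
    """Keep only the bytes before the first NUL, modelling the truncation
    said to happen in commit_tree()."""
    end = payload.find(b"\x00")
    return payload if end < 0 else payload[:end]


def indent_lines(payload: bytes) -> bytes:
    """Cut at every byte 012 (LF) and prefix each piece with four bytes
    040 (space), roughly what the log family does to a message."""
    return b"\n".join(b"    " + line for line in payload.split(b"\n"))


if __name__ == "__main__":
    # U+4E0A encodes in UTF-16LE as the bytes 0a 4e: no NUL byte at all,
    # but one byte that happens to have the value 012.
    message = "\u4e0a".encode("utf-16-le")
    # Placeholder tree line (hypothetical), then an encoding header.
    raw = b"tree 0000000000000000000000000000000000000000\n" \
          b"encoding UTF-16LE\n\n" + message
    headers, payload = split_commit(raw)
    print(headers)
    print(cut_at_nul(payload) == payload)             # True: nothing to cut
    mangled = indent_lines(payload)
    print(mangled.decode("utf-16-le") == "\u4e0a")    # False: character lost
```

Because U+4E0A happens to encode as 0a 4e, the message survives the NUL cut untouched, yet the indent step splits that single UTF-16 code unit in two and the result decodes as unrelated characters. Plain ASCII text stored as UTF-16 fares worse still: every character carries a 00 byte, so `cut_at_nul` keeps only the first byte.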
The assumption that people do not store random binary byte sequences in
commits sort of conflicts with the "encoding" field in the commit header,
though. The assumption is documented in i18n.txt; I guess it's just me who
did not read the documentation carefully. But maybe it would be good to stop
people from shooting themselves in the foot in this case (i.e. setting the
encoding to utf-16 or similar).
-- 
Duy