On Tue, 2011-12-13 at 12:59 -0500, Jeff King wrote:
> It looks like we already have a check for is_utf8, and this is not
> failing that check. I guess because is_utf8 takes a NUL-terminated
> buffer, so it simply sees the truncated result (i.e., depending on
> endianness, "foo" in utf16 is something like "f\0o\0o\0", so we check
> only "f"). We could make is_utf8 take a length parameter to be more
> accurate, and then it would catch this.
>
> However, I think that's not quite what we want. We only check is_utf8 if
> the encoding field is not set. And really, we want to reject NULs no
> matter _which_ encoding they've set, because git simply doesn't handle
> them properly.

A while back I had already started experimenting with automatic
detection of likely UTF-16, so that compatible platforms could handle it
appropriately when creating diffs and when dealing with newline munging
between platforms. There is no 100% sure-fire check for UTF-16 if you
don't already suspect the data might be UTF-16. If we really want to
check for possible UTF-16 specifically, I can dig out the check I wrote
and send it along.

The is_utf8 check was not written to detect 100% valid UTF-8 per se. It
seems to me that it was written as part of the "is this binary or not"
check in the add/commit path. I have thought for some time that passing
an explicit buffer length through that whole code path would be a good
idea (though I assumed somebody else had taken up that battle while I
was busy dealing with other problems elsewhere); if nothing else, it
would force the code to deal with NULs more intelligently.

--
-Drew Northup
________________________________________________
"As opposed to vegetable or mineral error?"
-John Pescatore, SANS NewsBites Vol. 12 Num. 59