Re: [PATCH 1/2] commit: reject invalid UTF-8 codepoints

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Jun 29, 2013 at 07:13:40PM -0700, Junio C Hamano wrote:
> "brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes:
> Does this correspond to the following comment in the same file, and
> if so, shouldn't this part of your patch?

Yes, yes, it should.

> We used to allow te original up-to-6 form and this update is about
> rejecting anything above U+10FFFF (in line with e.g. RFC 3629)?

Correct.  Since no codepoints above U+10FFFF will ever be assigned,
there's no point in trying to allow those UTF-8 sequences, since they
will be caught below anyway.  Some four-byte sequences will produce
too-large codepoints, but all five- and six-byte sequences are
guaranteed to.

> As you use separate if statements for this check, perhaps you can
> give each of them a better comment to say what each is rejecting?
> E.g.
> 
> 	/* do not allow range for surrogate pair */
> 		if (codepoint >= 0xd800 && codepoint <= 0xdfff)
> 			return bad_offset;

Sure.

> As that comment I quoted at the beginning says, we did not check for
> invalid encoded values and the primary reason for it is beccause
> this function did not want to look into the actual values here.  But
> now you are looking into "codepoint", you can now also check for
> "overlong" form (e.g. sequence "C0 80" turning into U+0000)?

Correct.  That's what my second patch does.  I thought I would make the
changes separately, since they're slightly different and from what I've
seen git development prefers small, independent patches, but if you'd
prefer, I can squash them into one patch.

> Perhaps
> 
>         test_expect_success 'test name' '
>                 test_when_finished "rm -f $HOME/stderr" &&

I wasn't aware that existed, but it makes sense to use it.

> > +Invalid surrogate:???
> 
> I suspect that I did not receive what you intended to send.  Do you
> want to send this part as a binary patch perhaps?

If you ended up with an encoding of U+D800, then you got it.  Otherwise,
I can resend it as a binary patch during the reroll.

-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187

Attachment: signature.asc
Description: Digital signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]