Re: [PATCH 1/2] commit: reject invalid UTF-8 codepoints

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



"brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes:

> On Sat, Jun 29, 2013 at 07:13:40PM -0700, Junio C Hamano wrote:
>> "brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes:
>> Does this correspond to the following comment in the same file, and
>> if so, shouldn't this part of your patch?
>
> Yes, yes, it should.
> ...
>> As that comment I quoted at the beginning says, we did not check for
>> invalid encoded values and the primary reason for it is beccause
>> this function did not want to look into the actual values here.  But
>> now you are looking into "codepoint", you can now also check for
>> "overlong" form (e.g. sequence "C0 80" turning into U+0000)?
>
> Correct.  That's what my second patch does.

Ah, I started with that quoted comment when I saw 1/2, thinking that
you may not even have noticed that these two things are our "todo"
list, hence I thought it would be natural to do them both in the
same commit as the logic to do so is in the single place (the
function you patched).  It is not _wrong_ per-se to do this in two
steps, but I do not think it is necessary.  You *could* break this
to even smaller steps to go to the other extreme, first limiting
only upto 4-byte forms, then rejecting a half of the 4-byte form,
then rejecting the range for surrogates, and then rejecting overlong
forms, and the split may be technically logical and "one step at a
time" but that kind of granularity is probably more noisy than it is
valuable.

I think "1/2 is about rejecting a codepoint outside valid ranges and
2/2 is about rejecting a valid codepoint inside the valid range but
expressed in an invalid way" is on the borderline between the two
extremes, but it probably is on the saner side, so let's keep these
two patches separate.

> If you ended up with an encoding of U+D800, then you got it.

I think we are seeing ? U+003F in our mailboxes.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]