Re: RFC 2152 - UTF-7 clarification

Harald Alvestrand <harald@xxxxxxxxxxxxx> · Wed, 14 Oct 2015 08:44:44 +0200

On 10/13/2015 11:11 PM, A. Rothman wrote:
> Ok, thanks for your analysis and for looking into this (Mark as well).
>
> I shall change my decoder implementation to the lenient interpretation,
> adjust my unit tests, and hope it is considered RFC-compliant by
> everyone :-)

Note that this is a reprise of the UTF-8 "overlong encoding" debate,
where we ended up banning overlong encodings because of the security
issues they posed (see the UTF-8 RFC, RFC 3629, for more details on the
security issues found).
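
To make the parallel concrete, here is a minimal sketch in Python of a
decoder for the base64 run between '+' and '-', with a `strict` switch.
Both modes reject non-zero padding bits. The strict mode additionally
rejects a leftover of 6 or more bits, i.e. a whole base64 character that
is pure padding, much as UTF-8 decoders now reject overlong forms. The
name `decode_run` and its interface are hypothetical, chosen for
illustration only, and the '+-' escape for a literal '+' is left to the
caller.

```python
import string
from typing import List

# Modified base64 alphabet: the standard base64 alphabet, used without '=' padding.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
VALUE = {ch: i for i, ch in enumerate(ALPHABET)}


def decode_run(run: str, strict: bool) -> List[int]:
    """Decode the characters between '+' and '-' into UTF-16 code units.

    Hypothetical helper for illustration. Both modes reject non-zero padding
    bits; strict=True also rejects 6 or more padding bits (non-minimal padding).
    """
    acc = 0      # pending bits not yet emitted as a code unit
    nbits = 0    # how many bits acc currently holds
    units: List[int] = []
    for ch in run:
        value = VALUE.get(ch)
        if value is None:
            raise ValueError(f"not a modified base64 character: {ch!r}")
        acc = (acc << 6) | value
        nbits += 6
        if nbits >= 16:
            nbits -= 16
            units.append(acc >> nbits)    # top 16 bits form one code unit
            acc &= (1 << nbits) - 1       # keep only the leftover bits
    # Whatever remains in acc is padding: fewer than 16 bits by construction.
    if acc != 0:
        raise ValueError("padding bits must be zero")
    if strict and nbits >= 6:
        raise ValueError(f"{nbits} padding bits: a whole character is padding")
    return units


print([hex(u) for u in decode_run("Jjo", strict=True)])  # ['0x263a']
print(decode_run("AAAA", strict=False))                  # [0]
```

With strict=False, "A" decodes to an empty list (the lenient reading);
with strict=True the same input raises ValueError (the strict reading).
Minimally padded runs such as "AAA" or "Jjo" decode identically in both
modes.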

>
> Amichai
>
> On 10/09/2015 08:08 AM, Viktor Dukhovni wrote:
>> On Thu, Oct 08, 2015 at 09:40:25PM +0300, A. Rothman wrote:
>>
>>> Just in case someone missed it (I almost did): Mark added his own
>>> detailed comments on the test cases, but they got buried within a long
>>> quote from my original email, so they may have gone unnoticed. To recap,
>>> here are the two interpretations:
>>>
>>> +A-             empty + 6 (unnecessary) padding bits
>>> +AA-            empty + 12 (unnecessary) padding bits
>>> +AAA-           \U+0000, and 2 (required) padding bits
>>> +AAAA-          \U+0000, and 8 (6 extra) padding bits
>>> +AAAAA-         \U+0000, and 14 (12 extra) padding bits
>>> +AAAAAA-        \U+0000\U+0000, and 4 (required) padding bits
>>> +AAAAAAA-       \U+0000\U+0000, and 10 (6 extra) padding bits
>>>
>>>
>>> +A-             illegal !modified base64
>>> +AA-            illegal !a multiple of 16 bits in modified base64
>>> +AAA-           legal   0x0000 (last 2 bits zero)
>>> +AAAA-          illegal !a multiple of 16 bits in modified base64
>>> +AAAAA-         illegal !modified base64
>>> +AAAAAA-        legal   0x0000, 0x0000 (last 4 bits zero)
>>> +AAAAAAA-       illegal !a multiple of 16 bits in modified base64
>>>
>>>
>>> Does anyone else want to vote or comment on the two interpretations above?
>> Thanks for pointing this out more clearly.  Yes, they disagree.
>> However, the manner in which they disagree is rather simple.
>>
>> They agree in all the cases where the padding is *minimal*.
>>
>> The first variant always tolerates non-minimal padding, allowing
>> anything less than 16 bits per the specification. The second
>> variant never tolerates non-minimal padding, because there's no
>> need to produce it.
>>
>> It is clear that clients should produce minimal padding, and we
>> seem to disagree on whether to apply Postel's principle to the
>> decoder or not.
>>
>> This is not a major disagreement; such differences of interpretation
>> are endemic whether the standard is clear or not. Many implementors
>> are lazy and stop writing code when the expected cases work.
>>
>> While this is no excuse for ambiguous specifications, in this case
>> I don't think a revision is warranted.  Encoders that generate
>> sensibly minimal padding will not run into any friction with
>> non-broken decoders.  Encoders that get creative might find that
>> some decoders object whether the standard allows their creativity
>> or not.
>>
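
For reference, the bit accounting behind the first table is simple
arithmetic: a run of n characters carries 6n bits, of which
floor(6n/16) form UTF-16 code units and 6n mod 16 are left over as
padding. Of that padding, (6n mod 16) mod 6 bits are needed to reach a
character boundary; the remainder is whole characters of extra padding.
A minimal sketch in Python, where the helper name `account` is
hypothetical:

```python
from typing import Tuple


def account(n_chars: int) -> Tuple[int, int, int]:
    """Split the 6 * n_chars bits of a modified-base64 run into
    (UTF-16 code units, required padding bits, extra padding bits)."""
    total_bits = 6 * n_chars
    units = total_bits // 16          # complete 16-bit code units
    padding = total_bits % 16         # bits left over after the last unit
    required = padding % 6            # bits needed to reach a 6-bit boundary
    extra = padding - required        # whole characters of pure padding
    return units, required, extra


for n in range(1, 8):
    units, required, extra = account(n)
    run = "+" + "A" * n + "-"
    print(f"{run:<12} {units} code unit(s), {required} required + {extra} extra padding bits")
```

The rows with zero extra bits (+AAA- and +AAAAAA-) are exactly the ones
both interpretations accept; every other row is accepted only by the
lenient reading.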