In message <168699907.511444808763309.JavaMail.root@shefa>, "A. Rothman" writes:
>
> I'm not sure what the debate was - are you referring to having multiple
> encoded representations of the same Unicode character sequence? If so,
> then the question of padding is moot since it's explicitly allowed to
> have a whole lot of different representations in the UTF-7 encoding
> (with or without an explicit trailing '-' to end a shift sequence, the
> explicit optional set that may or may not be represented in a shift
> sequence, any character of the direct set that may also be in a shift
> sequence, whether to combine consecutive shift sequences into one or
> not, etc.). There's an exponential number of valid encodings for a
> character sequence.
>
> Requiring the encoder to always use the minimal possible encoding is
> quite a big change to the spec. Requiring a decoder to reject
> non-minimal encodings would increase decoder complexity significantly.

Absolute garbage unless you think getting the length of a sequence is
hard. You can decode the shifted sequence, and if the byte count is zero
or not a multiple of 2, then it is non-minimal.

If you don't want to decode the entire shift sequence, you can do the
analysis on the shifted characters. If the length of the shifted
sequence * 3 / 4 (integer division) is odd or zero, it is non-minimal.

    A          1 * 3 / 4 -> 0
    AA         2 * 3 / 4 -> 1
    AAA        3 * 3 / 4 -> 2  (minimal) (2 zero bits)
    AAAA       4 * 3 / 4 -> 3
    AAAAA      5 * 3 / 4 -> 3
    AAAAAA     6 * 3 / 4 -> 4  (minimal) (4 zero bits)
    AAAAAAA    7 * 3 / 4 -> 5
    AAAAAAAA   8 * 3 / 4 -> 6  (minimal)

The value of (length * 3 / 4) % 6 / 2 defines which values are legal in
the last shifted character:

    0  all base64 characters are legal
    1  last 2 bits are zero  (base64[c - base64[0]] & 0x3) == 0
    2  last 4 bits are zero  (base64[c - base64[0]] & 0xf) == 0

> Changing the behavior regarding padding bits is meaningless without
> these two additional requirements.
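A minimal Python sketch of the strict check, assuming the input is only the run of modified base64 characters between the '+' and the terminating '-' (the function name `decode_shifted` is hypothetical and not taken from any existing UTF-7 decoder). It applies the length rule and the last-character rule above. One caveat: the table stops at 8 characters, and for 9 characters 9 * 3 / 4 = 6 is even and nonzero even though the ninth character carries nothing but 6 padding bits, so the sketch additionally rejects any sequence with 6 or more padding bits.

```python
# Modified base64 alphabet used inside UTF-7 shift sequences (RFC 2152):
# the standard base64 alphabet, never followed by '=' padding.
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
B64_VALUE = {c: i for i, c in enumerate(B64)}


def decode_shifted(chars: str) -> list[int]:
    """Hypothetical helper: decode the characters between '+' and '-' into
    UTF-16 code units, rejecting any non-minimal encoding (strict reading)."""
    try:
        values = [B64_VALUE[c] for c in chars]
    except KeyError as exc:
        raise ValueError(f"not a modified base64 character: {exc.args[0]!r}") from None

    n = len(values)
    nbytes = n * 3 // 4            # whole octets carried by 6n bits
    if nbytes == 0 or nbytes % 2:
        raise ValueError(f"{n} characters: not a whole number of UTF-16 units")

    padding = 6 * n - 8 * nbytes   # always 0, 2, 4 or 6
    if padding >= 6:
        # Assumed extra rule, e.g. 9 characters: 6 octets plus a final
        # character that holds only padding bits.
        raise ValueError(f"{n} characters: trailing character carries no data")

    # For every minimal length this equals 2 * ((nbytes % 6) // 2),
    # i.e. the last-character rule in the table above.
    if values[-1] & ((1 << padding) - 1):
        raise ValueError("padding bits in the last character are not zero")

    acc = 0
    for v in values:
        acc = (acc << 6) | v
    acc >>= padding                # drop the (zero) padding bits

    units = nbytes // 2
    return [(acc >> (16 * (units - 1 - i))) & 0xFFFF for i in range(units)]


if __name__ == "__main__":
    # Runs of 'A' as in the test cases discussed in this thread, plus 9.
    for n in range(1, 10):
        s = "A" * n
        try:
            print(f"+{s}-", "legal", decode_shifted(s))
        except ValueError as err:
            print(f"+{s}-", "illegal:", err)
```

Among the runs of 'A' from 1 to 9 characters, only +AAA-, +AAAAAA- and +AAAAAAAA- are accepted, which matches the strict interpretation quoted below for +A- through +AAAAAAA-. The sketch computes the padding bit count directly rather than using the % 6 / 2 lookup; the two agree for every minimal length, and the direct form also catches the 9-character case.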
>
> That said, it would certainly be useful to add a warning in the security
> considerations section about such issues, and recommend that all data be
> decoded first before performing any validity checks, comparisons, etc.
> (as detailed in RFC 3629 for UTF-8).
>
> Is this what you meant?
>
> On 10/14/2015 09:44 AM, Harald Alvestrand wrote:
> > On 10/13/2015 11:11 PM, A. Rothman wrote:
> >> Ok, thanks for your analysis and for looking into this (Mark as well).
> >>
> >> I shall change my decoder implementation to the lenient interpretation,
> >> adjust my unit tests, and hope it is considered RFC-compliant by
> >> everyone :-)
> > Note that this is a reprise of the UTF-8 "overlong encoding" debate,
> > where we ended up banning overlong encodings because of the security
> > issues it posed (see the UTF-8 RFC for more details on the security
> > issues found).
> >
> >> Amichai
> >>
> >> On 10/09/2015 08:08 AM, Viktor Dukhovni wrote:
> >>> On Thu, Oct 08, 2015 at 09:40:25PM +0300, A. Rothman wrote:
> >>>
> >>>> Just in case someone missed it (I almost did): Mark added his own
> >>>> detailed comments on the test cases, but they got buried within a long
> >>>> quote from my original email so may have gone unnoticed.
> >>>> To recap, here are the two interpretations:
> >>>>
> >>>> +A-        empty + 6 (unnecessary) padding bits
> >>>> +AA-       empty + 12 (unnecessary) padding bits
> >>>> +AAA-      \U+0000, and 2 (required) padding bits
> >>>> +AAAA-     \U+0000, and 8 (6 extra) padding bits
> >>>> +AAAAA-    \U+0000, and 14 (12 extra) padding bits
> >>>> +AAAAAA-   \U+0000\U+0000, and 4 (required) padding bits
> >>>> +AAAAAAA-  \U+0000\U+0000, and 10 (6 extra) padding bits
> >>>>
> >>>>
> >>>> +A-        illegal  !modified base64
> >>>> +AA-       illegal  !a multiple of 16 bits in modified base64
> >>>> +AAA-      legal    0x0000 (last 2 bits zero)
> >>>> +AAAA-     illegal  !a multiple of 16 bits in modified base64
> >>>> +AAAAA-    illegal  !modified base64
> >>>> +AAAAAA-   legal    0x0000, 0x0000 (last 4 bits zero)
> >>>> +AAAAAAA-  illegal  !a multiple of 16 bits in modified base64
> >>>>
> >>>>
> >>>> Does anyone else want to vote or comment on the two interpretations above?
> >>> Thanks for pointing this out more clearly. Yes, they disagree.
> >>> However, the manner in which they disagree is rather simple.
> >>>
> >>> They agree in all the cases where the padding is *minimal*.
> >>>
> >>> The first variant always tolerates non-minimal padding, allowing
> >>> anything less than 16 bits per the specification. The second
> >>> variant never tolerates non-minimal padding, because there's no
> >>> need to produce it.
> >>>
> >>> It is clear that clients should produce minimal padding, and we
> >>> seem to disagree on whether to apply Postel's principle to the
> >>> decoder or not.
> >>>
> >>> This is not a major disagreement; such differences of interpretation
> >>> are endemic whether the standard is clear or not. Many implementors
> >>> are lazy, and stop writing code when the expected cases work.
> >>>
> >>> While this is no excuse for ambiguous specifications, in this case
> >>> I don't think a revision is warranted. Encoders that generate
> >>> sensibly minimal padding will not run into any friction with
> >>> non-broken decoders.
> >>> Encoders that get creative might find that some decoders object
> >>> whether the standard allows their creativity or not.

--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742                 INTERNET: marka@xxxxxxx