Just in case someone missed it (I almost did): Mark added his own
detailed comments on the test cases, but they got buried within a long
quote from my original email, so they may have gone unnoticed. To recap,
here are the two interpretations:

Interpretation 1 (lenient: the decoder discards any amount of zero
padding):

  +A-        empty, and 6 (unnecessary) padding bits
  +AA-       empty, and 12 (unnecessary) padding bits
  +AAA-      \U+0000, and 2 (required) padding bits
  +AAAA-     \U+0000, and 8 (6 extra) padding bits
  +AAAAA-    \U+0000, and 14 (12 extra) padding bits
  +AAAAAA-   \U+0000\U+0000, and 4 (required) padding bits
  +AAAAAAA-  \U+0000\U+0000, and 10 (6 extra) padding bits

Interpretation 2 (strict: only minimal padding is legal):

  +A-        illegal (not modified base64)
  +AA-       illegal (not a multiple of 16 bits in modified base64)
  +AAA-      legal, 0x0000 (last 2 bits zero)
  +AAAA-     illegal (not a multiple of 16 bits in modified base64)
  +AAAAA-    illegal (not modified base64)
  +AAAAAA-   legal, 0x0000 0x0000 (last 4 bits zero)
  +AAAAAAA-  illegal (not a multiple of 16 bits in modified base64)

Does anyone else want to vote or comment on the two interpretations
above? (There is a small decoder sketch at the end of this message
that shows the difference mechanically.)

On 10/08/2015 07:06 PM, Viktor Dukhovni wrote:
> On Thu, Oct 08, 2015 at 06:22:51PM +1100, Mark Andrews wrote:
>
>> Though I can see how you could think this was a valid strategy if
>> you only look at a single base64 word after encoding a single utf-16
>> character.
>>
>> AAA=  0x0000 (discard 2 bits)
>> AAAA  0x0000 (discard 8 bits)
>>
>> Now you could safely replace all the '=' pad characters with a
>> single 'A', but that would just be a perverse encoder, and if you
>> were to use such an encoder I wouldn't blame the decoder for
>> rejecting the input.
>
> I don't read Mark's response as saying that non-minimal padding is
> *invalid*. He says the encoder is "perverse", and I agree that the
> encoder would be better off not generating excess padding.
>
> He further says that he would not be surprised if some decoders
> rejected non-minimally padded input, and frankly I would also not
> be surprised, but that does not make the input invalid. The
> specification says that up to 14 (< 16) bits of zero padding is to
> be discarded by decoders; it does not limit the discard bit count
> to 4 (< 6).
>
> There are lots of lazy and fragile implementations of standards out
> there; encoders need to try to avoid generating non-mainstream
> outputs if they want most decoders to handle the result.
>
> On Thu, Oct 08, 2015 at 02:21:36PM +0300, A. Rothman wrote:
>
>> Everything else still stands. Specifically, the two replies
>> beautifully illustrate my point about ambiguity - in their
>> interpretation of the actual test cases I submitted, one says that
>> all inputs are valid, and the other says some of them are invalid.
>> That's exactly the problem I saw when comparing libraries.
>
> Perhaps Mark really does consider 8 to 14 bits of padding as
> "invalid" (not just "perverse"). If so, then indeed the specification
> is open to multiple interpretations. As I see it, so far Mark and I
> are on the same page.
>
>> As a starting point, my suggestion would be that an encoder SHOULD
>> add the minimal amount of padding necessary, which is likely what
>> encoders already do, while a decoder MUST accept and discard any
>> amount of zero padding (less than 16 bits, of course), in line with
>> being more lenient on inputs, and simplifying/micro-optimizing the
>> decoder by removing an extra check+documentation and applying KISS.
>> It would be nice to add one of the test cases to the errata as well,
>> to clarify the expected result.
>
> The only thing "missing" from the specification is advice (or a
> requirement) to make the padding "minimal". That is, to pad only to
> the *closest* base64 (i.e. multiple of 6 bits) boundary.
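
For concreteness, here is a rough Python sketch (my own illustration,
not code from any of the libraries I compared) of a decoder for the
modified-base64 run between '+' and '-', with a flag selecting between
the two interpretations. The function name and structure are made up
for this example:

  # Hypothetical sketch: decode the modified-base64 run of a UTF-7
  # shifted sequence under either interpretation.
  B64 = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
         "abcdefghijklmnopqrstuvwxyz0123456789+/")

  def decode_run(chars, strict):
      """Return the UTF-16 code units in `chars`, or raise ValueError."""
      if strict and len(chars) % 4 == 1:
          # e.g. +A-, +AAAAA-: no valid base64 quantum has length 1 mod 4
          raise ValueError("not modified base64")
      bits = nbits = 0
      units = []
      for c in chars:
          bits = (bits << 6) | B64.index(c)   # accumulate 6 bits per char
          nbits += 6
          if nbits >= 16:                     # a full code unit is available
              nbits -= 16
              units.append((bits >> nbits) & 0xFFFF)
      if bits & ((1 << nbits) - 1):           # leftover (padding) bits, < 16
          raise ValueError("non-zero padding bits")
      if strict and nbits >= 6:               # more than minimal padding
          raise ValueError("not a multiple of 16 bits in modified base64")
      return units

  for s in ["A", "AA", "AAA", "AAAA", "AAAAA", "AAAAAA", "AAAAAAA"]:
      for strict in (False, True):
          try:
              out = decode_run(s, strict)
              verdict = "legal " + " ".join("0x%04X" % u for u in out)
          except ValueError as e:
              verdict = "illegal (%s)" % e
          print("+%s-  %s  %s" % (s, "strict " if strict else "lenient",
                                  verdict))

The lenient variant's only check is that the leftover bits are zero;
the strict variant additionally rejects runs whose length is 1 mod 4
and any leftover of 6 or more bits, which reproduces the two verdict
lists above.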