On Thu, Oct 08, 2015 at 06:22:51PM +1100, Mark Andrews wrote:

> Though I can see how you could think this was a valid strategy if
> you only look at a single base64 word after encoding a single utf-16
> character.
>
> AAA= 0x0000 (discard 2 bits)
> AAAA 0x0000 (discard 8 bits)
>
> Now you could safely replace all the '=' pad characters with a
> single 'A' but that would just be a perverse encoder and if you
> were to use such a encoder I wouldn't blame the decoder for rejecting
> the input.

I don't read Mark's response as saying that non-minimal padding is
*invalid*.  He says the encoder is "perverse", and I agree that the
encoder would be better off not generating excess padding.  He further
says that he would not be surprised if some decoders rejected
non-minimally padded input.  Frankly, neither would I, but that does
not make the input invalid.

The specification says that up to 14 (< 16) bits of zero padding are
to be discarded by decoders; it does not limit the discard bit count
to 4 (< 6).

There are lots of lazy and fragile implementations of standards out
there, so encoders need to avoid generating non-mainstream output if
they want most decoders to handle the result.

On Thu, Oct 08, 2015 at 02:21:36PM +0300, A. Rothman wrote:

> Everything else still stands. Specifically, the two replies beautifully
> illustrate my point about ambiguousness - in their interpretation of the
> actual test cases I submitted, one says that all inputs are valid, and
> the other says some of them are invalid. That's exactly the problem I
> saw when comparing libraries.

Perhaps Mark really does consider 8 to 14 bits of padding "invalid"
(not just "perverse").  If so, then the specification is indeed open
to multiple interpretations.  As I see it, so far Mark and I are on
the same page.
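
To make the bit counting above concrete, here is a minimal sketch in
Python of a lenient decoder.  It is only an illustration under stated
assumptions, not code from any particular library: the name
decode_units is hypothetical, trailing '=' characters are simply
stripped, and the only check made is that the leftover bits are zero.
It reports how many bits were discarded and whether that count is
"minimal" (less than 6).

```python
# Standard base64 alphabet; each character carries 6 bits.
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def decode_units(text):
    """Decode base64 text into 16-bit units (hypothetical helper).

    Returns (units, discarded_bits, minimal).  Leftover bits must all
    be zero; any count below 16 is accepted, as a lenient decoder would.
    """
    acc = 0      # bits not yet emitted as a 16-bit unit
    nbits = 0    # number of valid bits currently held in acc
    units = []
    # Assumption: '=' pad characters carry no data and are dropped.
    for ch in text.rstrip("="):
        acc = (acc << 6) | B64.index(ch)  # ValueError on a bad character
        nbits += 6
        if nbits >= 16:
            nbits -= 16
            units.append((acc >> nbits) & 0xFFFF)
            acc &= (1 << nbits) - 1       # keep only the leftover bits
    if acc != 0:
        raise ValueError("non-zero padding bits")
    # nbits is always even and below 16 here, so at most 14.
    return units, nbits, nbits < 6


print(decode_units("AAA="))   # ([0], 2, True)
print(decode_units("AAAA"))   # ([0], 8, False)
print(decode_units("AAAAA"))  # ([0], 14, False)
```

A strict decoder would differ only in rejecting results where the
third value is False.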
> As a starting point, my suggestion would be that an encoder SHOULD add
> the minimal amount of padding necessary, which is likely what encoders
> already do, while a decoder MUST accept and discard any amount of zero
> padding (less than 16 bits of course), in line with being more lenient
> on inputs, and simplifying/micro-optimizing the decoder by removing an
> extra check+documentation and applying KISS. It would be nice to add one
> of the test cases in the errata as well, to clarify the expected result.

The only thing "missing" from the specification is advice (or a
requirement) to make the padding "minimal".  That is, to pad only to
the *closest* base64 (i.e. multiple of 6 bits) boundary.

-- 
	Viktor.