Re: base64

David Wilson <David.Wilson@isode.com> · 24 Sep 2003 19:24:29 +0100

On Tue, 2003-09-23 at 19:10, Lothar Kimmeringer wrote:
> On Tue, 23 Sep 2003 12:18:31 -0400 (EDT), Birl wrote:
> 
> >Excuse my ignorance.  I tried to pook around some B64 attachements in my
> >email files for an answer.
> >
> >
> >Are you stating that an =
> >
> >1) should not appear in B64 at all
> >2) should not appear in the middle of a line, but only at the EOLN
> >3) should only appear at the end of a B64 file
> 
> See the corresponding RFC. The number of characters in a base64-coded
> text must be a multiply of 4. So ='s are used if there aren't enough
> characters and are added at the end of the text.
> 
> = is not a valid character inside Base64 and an encoder should stop
> with an error or stops decoding.
> 
> >Answering that question could help me better determine how to write a
> >procmail filter for this.
> 
> /.*=[^=]/
> (untested)

RFC 2045 states (section 6.8):

   "The encoded output stream must be represented in lines of no more
   than 76 characters each.  All line breaks or other characters not
   found in Table 1 must be ignored by decoding software.  In base64
   data, characters other than those in Table 1, line breaks, and other
   white space probably indicate a transmission error, about which a
   warning message or even a message rejection might be appropriate
   under some circumstances."

So, characters outside the 65 character set are "ignored", but "warning"
or "rejection" might be appropriate. Nicely ambiguous. This does not
apply strictly to '=', but might taken to apply to '=' within a base64
encoding.

   "Because it is used only for padding at the end of the data, the
   occurrence of any "=" characters may be taken as evidence that the
   end of the data has been reached (without truncation in transit).  No
   such assurance is possible, however, when the number of octets
   transmitted was a multiple of three and no "=" characters are
   present."

This is also not clear. Does it mean that '=' can be taken to delimit
the data? Or does it mean no more than finding a '=' that means no data
has been lost?

So, there is ambiguity in RFC 2045, and this is the point of the
original post. Different people, and therefore different implementations
will have different interpretations. There is therefore potential for a
vulnerability when checks are performed using one interpretation but the
actual receiver uses another interpretation.

It would be nice to be able to enforce rules in email servers - there
are many ways in which messages do not conform to the standards - even
when they are unambiguous. But there are too many common email user
agents which generate non-conforming messages.

Or should we reject all these broken messages? ;-)

cheers

David Wilson                             David.Wilson@isode.com
Isode Limited                            Tel: +44 (0) 20 8783 2961
http://www.isode.com