
Re: MS Outlook mail creating characters not appearing (possible solution)



>> I have a couple of emails that were generated using MS
>> Outlook which contain some html entities like smart quotes
>> and the funny "-" character which just appear as "?"
>> characters in the archive.
> MS Outlook has a nasty habit of mislabeling the charset of its
> messages with iso-8859-1 instead of MS's extension to it that
> contains the characters being used.

=v= Some older versions send out mail that doesn't specify a
charset, so many apps assume the text is ASCII (which is how the
standard works) though of course it's Windows-1252.
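=v= A minimal sketch, in Python, of how an archiver might work
around the mislabeling.  The helper name guess_charset and the
fallback rule are assumptions for illustration, not something any
particular archiver is known to do.  It relies only on the fact that
bytes 0x80-0x9F are control codes in iso-8859-1 but printable
characters in Windows-1252:

```python
import re
from typing import Optional

# Bytes 0x80-0x9F are C1 control codes in iso-8859-1 but printable
# characters (curly quotes, dashes, ...) in Windows-1252.
C1_RANGE = re.compile(rb"[\x80-\x9f]")


def guess_charset(body: bytes, declared: Optional[str]) -> str:
    """Return the charset to decode with (hypothetical helper).

    A missing label defaults to us-ascii.  If the message is labeled
    us-ascii or iso-8859-1 but contains bytes in the C1 range, assume
    it is really Windows-1252.
    """
    label = (declared or "us-ascii").lower()
    if label in ("us-ascii", "iso-8859-1") and C1_RANGE.search(body):
        return "windows-1252"
    return label


raw = b"\x93smart quotes\x94"
text = raw.decode(guess_charset(raw, None))
```

Messages that carry no bytes in that range keep whatever label they
arrived with, so correctly labeled mail is left alone.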

=v= Those particular characters in Windows-1252 occupy 0x80-0x9F,
which iso-8859-1 reserves for control codes, so they violate the
charset standards anyway.  Even worse, MS products such as Outlook and
Word insert these standard-violating "smart quotes" in the wrong
places.  Sometimes they're backwards (i.e. a quote will start
with a "curly close quote" and end with a "curly open quote"),
and usually an apostrophe is turned into a "curly single close
quote," which is just wrong.

=v= Someone wrote a routine that looks for these encodings and
turns them into ASCII equivalents.  You lose some fanciness, but
what good is fanciness when it's just wrong?  This has a much
higher probability of turning out correctly than translating
them into iso-8859-1 or UTF-8 (or even HTML entities).  The
code is called "demoroniser" and is available in Perl:

http://www.fourmilab.ch/webtools/demoroniser/

It has been widely ported.  For example, it's in CPAN's
TextToHTML Perl module and is part of Macromedia's ColdFusion
web product.
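
=v= For a rough idea of what such a routine does, here is a sketch
in Python.  This is not demoroniser's actual code or table; the
name to_ascii and the (incomplete) mapping are assumptions picked
for illustration.  It works on raw bytes, so it does not matter what
charset the message claimed to be:

```python
# Windows-1252 bytes in the 0x80-0x9F range mapped to plain ASCII
# stand-ins.  Illustrative and incomplete; not demoroniser's own table.
ASCII_EQUIVALENTS = {
    0x85: b"...",  # horizontal ellipsis
    0x91: b"'",    # curly single open quote
    0x92: b"'",    # curly single close quote (used as apostrophe)
    0x93: b'"',    # curly double open quote
    0x94: b'"',    # curly double close quote
    0x95: b"*",    # bullet
    0x96: b"-",    # en dash
    0x97: b"--",   # em dash
}


def to_ascii(raw: bytes) -> bytes:
    """Replace known Windows-1252 bytes with ASCII equivalents.

    Bytes that are not in the table pass through unchanged.
    """
    out = bytearray()
    for byte in raw:
        out += ASCII_EQUIVALENTS.get(byte, bytes([byte]))
    return bytes(out)


cleaned = to_ascii(b"\x93It\x92s fine\x94 \x96 ok\x85")
```

Because open and close quotes both collapse to the same ASCII
character, it no longer matters whether the original had them
backwards.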
    <_Jym_>

