Re: MS Outlook mail creating characters not appearing (possible solution)
>> I have a couple of emails that were generated using MS
>> Outlook which contain some html entities like smart quotes
>> and the funny "-" character which just appear as "?"
>> characters in the archive.
> MS Outlook has a nasty habit of mislabeling the charset of its
> messages with iso-8859-1 instead of MS's extension to it that
> contains the characters being used.
=v= Some older versions send out mail that doesn't specify a
charset at all, so many apps assume the text is US-ASCII (which is
the default the standard prescribes), though of course it's really
Windows-1252.
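=v= To make the mislabeling concrete, here is a minimal Python sketch
(not from the original thread) that decodes the same bytes under three
charset labels. The sample bytes and the helper name decode_as are
made up for illustration; "cp1252" is Python's codec name for
Windows-1252.

```python
# Sample bytes as Outlook might send them: Windows-1252 curly double
# quotes (0x93, 0x94) and a long dash (0x97). Hypothetical example data.
raw = b"\x93smart quotes\x94 and a dash \x97 here"


def decode_as(data: bytes, charset: str) -> str:
    """Decode data under the given charset label, substituting U+FFFD
    for any byte the charset cannot represent."""
    return data.decode(charset, errors="replace")


# Correct label: the bytes become real curly quotes and a dash.
as_cp1252 = decode_as(raw, "cp1252")

# Mislabeled as iso-8859-1: the same bytes become invisible C1 control
# characters (U+0093, U+0094, U+0097), which most viewers cannot render.
as_latin1 = decode_as(raw, "iso-8859-1")

# No label, so US-ASCII is assumed: every byte above 0x7F is undecodable
# and gets replaced, which is roughly where the "?" characters come from.
as_ascii = decode_as(raw, "ascii")

for label, text in (("cp1252", as_cp1252),
                    ("iso-8859-1", as_latin1),
                    ("ascii", as_ascii)):
    print(label, ascii(text))
```

Only the first decoding recovers the punctuation the sender typed; the
other two are what an archiver sees when it trusts the label.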
=v= Those particular characters sit in the 0x80-0x9F range, which
iso-8859-1 reserves for control codes, so Windows-1252 violates the
charset standard there anyway. Even worse, MS products such as Outlook and
Word insert these standard-violating "smart quotes" in the wrong
places. Sometimes they're backwards (i.e. a quote will start
with a "curly close quote" and end with a "curly open quote"),
and usually an apostrophe is turned into a "curly single close
quote," which is just wrong.
=v= Someone wrote a routine that looks for these encodings and
turns them into ASCII equivalents. You lose some fanciness, but
what good is fanciness when it's just wrong? This has a much
higher probability of turning out correctly than translating
them into iso-8859-1 or UTF-8 (or even HTML entities). The
code is called "demoroniser" and is available in Perl:
http://www.fourmilab.ch/webtools/demoroniser/
It has been widely ported. For example, it's in CPAN's
HTML::TextToHTML Perl module and is part of Macromedia's ColdFusion
web product.
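=v= The general idea can be sketched in a few lines of Python. This is
not the demoroniser code itself, just an illustration of the approach
described above. The table CP1252_TO_ASCII and the function name
asciify_cp1252 are hypothetical, the table covers only a handful of
the offending bytes, and the choice of ASCII stand-ins is an
assumption.

```python
# Partial, illustrative table: Windows-1252 bytes in the 0x80-0x9F range
# mapped to plain ASCII stand-ins. Not the demoroniser's actual table.
CP1252_TO_ASCII = {
    0x82: ",",    # low single quote
    0x84: ",,",   # low double quote
    0x85: "...",  # ellipsis
    0x91: "'",    # curly single open quote
    0x92: "'",    # curly single close quote (used as apostrophe)
    0x93: '"',    # curly double open quote
    0x94: '"',    # curly double close quote
    0x96: "-",    # short dash
    0x97: "--",   # long dash
}


def asciify_cp1252(data: bytes) -> str:
    """Replace the mapped Windows-1252 bytes with ASCII equivalents.

    Every other byte is passed through as its iso-8859-1 character, so
    plain ASCII and ordinary Latin-1 letters are left untouched.
    """
    pieces = []
    for byte in data:
        if byte in CP1252_TO_ASCII:
            pieces.append(CP1252_TO_ASCII[byte])
        else:
            pieces.append(bytes([byte]).decode("iso-8859-1"))
    return "".join(pieces)


sample = b"\x93It\x92s here\x94 \x97 ok\x85"
print(asciify_cp1252(sample))
```

Because open and close quotes both collapse to the same straight
quote, it no longer matters whether the original had them backwards.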
<_Jym_>