Re: Strange characters

On Wednesday 11 May 2005 07:43, Carl Furst wrote:
> I have a question about an odd phenomenon. It doesn't have much to do with
> PHP except that I used strtr to solve it, and it may be that the problem is
> being caused by a setting in PHP, but I would like to get some more
> background info as to why this is happening.
>
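> On a typical Western-locale Windows system, most applications use the
> windows-1252 character set, while Linux typically uses UTF-8. The former
> is a single-byte (8-bit) character set; the latter is a variable-length
> encoding of Unicode that uses one to four bytes per character.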
>
> Well, I have a form on a website that has to be able to take in text from
> MS Word, Notepad and the like. If someone has been using AutoFormat in
> MS Word, the "special characters" get translated into a UTF-8
> equivalent. What's odd is that these 8-bit Windows characters become
> 24-bit combinations, I think. When I look at the characters in hex, each
> one is represented by three bytes, the first always being 0xE2.
>
> Why is there an 0xE2 beginning the character combination and why does PHP
> translate these characters this way? Is there something you can do to
> minimize them besides writing some kind of character scrubber?

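If you check the Unicode character charts (http://www.unicode.org/charts/),
you will see that Word's "smart" quotes, dashes and ellipsis are not in the
Basic Latin block (U+0000-U+007F, i.e. plain ASCII) but in General
Punctuation (U+2000-U+206F). UTF-8 stores every code point from U+0800 to
U+FFFF as three bytes of the form 1110xxxx 10xxxxxx 10xxxxxx, and for
U+2000-U+2FFF the first byte works out to 1110 0010 = 0xE2. PHP is most
likely not translating anything: browsers normally submit form data in the
page's character encoding, so a UTF-8 page receives UTF-8 bytes.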
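
To make the byte patterns concrete, here is a small Python sketch (Python
rather than PHP only so it runs standalone; the table and function names are
made up for illustration). It prints each typical AutoFormat character's code
point, its single windows-1252 byte and its three UTF-8 bytes, computes the
0xE2 lead byte from the bit layout, and shows a strtr-style ASCII scrubber
built on str.translate. The character list is an assumption covering the
usual suspects, not everything Word can produce.

```python
# Hypothetical table of common MS Word AutoFormat characters (assumption:
# illustrative, not exhaustive) mapped to plain-ASCII stand-ins.
SMART_CHARS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
}


def describe(ch: str) -> str:
    """Show the code point, its one windows-1252 byte and its UTF-8 bytes."""
    cp1252 = ch.encode("cp1252").hex(" ").upper()
    utf8 = ch.encode("utf-8").hex(" ").upper()
    return f"U+{ord(ch):04X}  cp1252: {cp1252:<4}  utf-8: {utf8}"


def utf8_lead_byte(code_point: int) -> int:
    """Lead byte of the 3-byte UTF-8 form (valid for U+0800..U+FFFF):
    1110xxxx, where xxxx are the top 4 bits of the 16-bit code point."""
    return 0xE0 | (code_point >> 12)


# Hypothetical scrubber in the spirit of PHP's strtr(): replace each smart
# character with its ASCII stand-in; everything else passes through.
ASCII_TABLE = str.maketrans(SMART_CHARS)


def scrub(text: str) -> str:
    return text.translate(ASCII_TABLE)


if __name__ == "__main__":
    for ch in SMART_CHARS:
        print(describe(ch))
    print(hex(utf8_lead_byte(0x201C)))
    print(scrub("\u201cSmart\u201d quotes \u2014 and dashes\u2026"))
```

Scrubbing throws information away, so it is usually a last resort: if the
rest of your pipeline (database, output pages) is UTF-8 as well, the
three-byte sequences can simply be stored and echoed back unchanged.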

> Thanks,
>
> Carl

-- 

Cyberly yours,
Petar Nedyalkov
Devoted Orbitel Fan :-)

PGP ID: 7AE45436
PGP Public Key: http://bu.orbitel.bg/pgp/bu.asc
PGP Fingerprint: 7923 8D52 B145 02E8 6F63 8BDA 2D3F 7C0B 7AE4 5436
