jeanpca@xxxxxxx wrote:
>
> I work on a fc4 [2.6.11-1.1369_FC4] and my system is speaking in english
> My i18n file looks like to
> cat /etc/sysconfig/i18n
> LANG="en_US.UTF-8"
              ^^^^^
(which will only work if you're reading this in a fixed font...)

OK. This e-mail is written in what's known as "US-ASCII". "US-ASCII"
only supports the characters on an American keyboard. It uses character
values from 0 to 127. Each character is usually stored in an eight-bit
byte (or octet), which can store values from 0 to 255.

Then people started wanting to use accents ... and Greek letters ... and
Russian letters ... and all sorts of other symbols. So they created ways
to use the spare values from 128 to 255.

Unfortunately, there were way more than 128 different characters that
different nationalities wanted to use. So we ended up with dozens of
ways of extending ASCII. The ISO 8859-1 variant was the most popular for
Western European languages -- until the Euro symbol was created. And it
still wasn't possible to include Greek and Russian in the same document.
And Chinese and Japanese users had to have their own standards anyway --
they have thousands of different characters.

So another standard was created -- Unicode. Unicode was originally
encoded as *two* octets per character -- with up to 65536 different
characters. That turned out to be (a) not enough for all the world's
different languages, and (b) rather complex to handle.

UTF-8 is a different way of encoding Unicode. US-ASCII letters are
encoded as one octet, just as they always have been. Accented letters,
and letters from other character sets, take up between two and four
octets each.

And there is the promise of one standard for the whole world, and
everything being sweetness and light, and that anything that can be
written can be shown on any computer screen around the world. In
practice, UTF-8 is about as good as you can get.
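
To make the "one octet for ASCII, two to four for everything else" point
concrete, here is a minimal Python sketch. The sample characters and the
helper name utf8_octets are my own choices for illustration, and the
characters are written as escapes so that this message itself stays
plain US-ASCII.

```python
# Sample characters, written as escapes so this source stays US-ASCII.
# The selection is illustrative only.
SAMPLES = [
    ("A", "US-ASCII letter"),
    ("\u00e9", "e with acute accent"),
    ("\u03bb", "Greek small lambda"),
    ("\u0436", "Cyrillic small zhe"),
    ("\u20ac", "Euro sign"),
    ("\u6f22", "a Chinese/Japanese character"),
    ("\U0001f600", "a character beyond the first 65536 code points"),
]


def utf8_octets(ch):
    """Return the UTF-8 encoding of one character as a list of octets."""
    return list(ch.encode("utf-8"))


if __name__ == "__main__":
    for ch, description in SAMPLES:
        octets = utf8_octets(ch)
        as_hex = " ".join(f"{b:02x}" for b in octets)
        print(f"U+{ord(ch):04X}  {len(octets)} octet(s)  {as_hex}  ({description})")
```

Note that the plain ASCII letter comes out as exactly the same single
octet it always was, which is why old ASCII files are already valid
UTF-8.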

> On this system, when I create an accentued char from my keyboard, it is
> written in two words:

Technical niggle -- "word" has a separate, different, technical meaning
in this context. I think you mean "byte" or "octet". That's a two-octet
UTF-8 character.

> 0000000 303 251 e e e \n
> If i display this file my web server or send it by mail (php), i get some
> strange chars

OK -- in this case you *need* to read up about MIME encoding and the
content-type and charset headers. These are needed in any case for your
viewers / recipients to be able to understand accents, whether you send
them in a traditional ISO-8859 encoding or as UTF-8.

Without those headers, some, but not all, of your recipients will
understand the accents the way you meant them. Others will use different
character sets as standard and see something completely different. They
might have a Greek letter at the same "code point".

You need some way of convincing your recipients' computers that you are
sending data in *this* particular character encoding. MIME headers are
the way to do this. And once you've got that working, you might just as
well go for the UTF-8 standard and be able to send and receive in all
sorts of different languages.

Hope this helps,

James.
-- 
E-mail:     james@ | [Bradford Cathedral] took 194 years to complete. A
aprilcottage.co.uk | construction period of nearly two centuries may seem
                   | ridiculous to us, but of course builders were a lot
                   | quicker in those days.     -- "ISIHAC", BBC Radio 4
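
For anyone who wants to see the "strange chars" effect for themselves,
here is a rough Python sketch (the original question was about PHP, so
treat this as an illustration of the idea rather than a drop-in fix).
It takes the two octets from the od output above (octal 303 251),
decodes them under three different charset assumptions, and then builds
a mail message that declares its charset. The helper names decode_as and
build_message and the subject line are made up for the example. For a
web page, the same job is done by a Content-Type header with a charset
parameter.

```python
import unicodedata
from email.message import EmailMessage

# The two octets from the od output: octal 303 251 (hex c3 a9).
RAW = bytes([0o303, 0o251])


def decode_as(data, charset):
    """Interpret the same octets under a given character encoding."""
    return data.decode(charset)


def build_message(body):
    """Build a mail message whose headers declare the body as UTF-8."""
    msg = EmailMessage()
    msg["Subject"] = "Accent test"  # hypothetical subject line
    msg.set_content(body, charset="utf-8")
    return msg


if __name__ == "__main__":
    # Same octets, three different readings. Character names are printed
    # instead of the characters so the output is plain ASCII.
    for charset in ("utf-8", "iso-8859-1", "iso-8859-7"):
        names = [unicodedata.name(c) for c in decode_as(RAW, charset)]
        print(f"{charset}: {names}")

    # With the charset declared, the recipient's software knows which
    # of those readings is the intended one.
    msg = build_message(decode_as(RAW, "utf-8") + "eee\n")
    print(msg["Content-Type"])
```

Read as UTF-8 the octets are one accented "e"; read as ISO 8859-1 they
become a capital A with tilde followed by a copyright sign; read as
ISO 8859-7 the first octet turns into a Greek capital gamma. Only the
declared charset tells the receiving end which reading you meant.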