On Fri, 4 Jan 2008 09:16:54 -0500, tedd wrote:

> At 10:33 AM +0100 1/4/08, Nisse Engström wrote:
>>On Thu, 3 Jan 2008 12:39:36 -0500, tedd wrote:
>
> Nisse:
>
> I thank you for your most enlightened and informative reply.
>
> I cut/pasted your post into my list of things to remember.

A few more random thoughts on form submission:

How does the browser know which character encoding to use in
the form submission? Well, lacking any other guidance, I believe
most browsers use the encoding of the document that contains
the form.

So what guidance can you give to the browser? The <form> element
has an attribute `accept-charset´ that can be used to specify a
list of acceptable character encodings. However, something in the
back of my mind tells me I've read that this is not widely
supported by browsers, but I could easily be wrong about this.
In any case, a browser can choose a different encoding if it
supports none of those specified.

  [The name `accept-charset´ is somewhat unfortunate because it
   confuses two different concepts: A character set is a
   repertoire of characters, while a character encoding is a way
   to translate (serialize) the characters into a byte sequence.
   UTF-8 and UTF-16 both cover the same character set (Unicode),
   but they encode it in very different ways.]

The general rule seems to be: A browser tends to submit the form
in the same character encoding as the document it received from
the server.

This brings up another problem: How do you *know* which character
encoding was actually used? Apparently, this problem was
overlooked when HTTP was devised. The only way (according to the
HTML spec) is to use a POST request with
enctype=multipart/form-data, but I don't think PHP makes the
Content-Type information available to the user, so this is no
help. Someone (Ian Hickson?)
came up with a fix for this: If you add the following form control:

  <input type=hidden name=_charset_>

most modern browsers will fill in the encoding they used for the
form submission.

  - - -

More reading:

  W3C on Internationalization:
    <http://www.w3.org/International/>
    (Even experts get it wrong. Spot the bug!)

  W3C on character sets and encodings:
    <http://www.w3.org/International/getting-started/characters>

  Wikipedia on Character encoding:
    <http://en.wikipedia.org/wiki/Character_encoding>
    "This entire encoding process is more involved than it looks"


/Nisse

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
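
To make the `_charset_` trick above a bit more concrete, here is a
minimal sketch of what the receiving side could do with it. It is
written in Python rather than PHP, purely for illustration. The
function name `decode_form` and the UTF-8 fallback are my own
assumptions, not part of any spec or library, and it only handles
application/x-www-form-urlencoded bodies.

```python
import codecs
from urllib.parse import unquote_to_bytes


def decode_form(body: bytes, fallback: str = "utf-8") -> dict:
    """Decode a urlencoded form body, trusting the _charset_ field if present.

    Hypothetical helper: the name and the fallback are illustrative
    assumptions, not an established API.
    """
    pairs = []
    for field in body.split(b"&"):
        if not field:
            continue
        name, _, value = field.partition(b"=")
        # '+' means space in urlencoded forms; %XX escapes become raw bytes.
        pairs.append((
            unquote_to_bytes(name.replace(b"+", b" ")),
            unquote_to_bytes(value.replace(b"+", b" ")),
        ))

    # Look for the encoding name the browser filled in.
    charset = fallback
    for name, value in pairs:
        if name == b"_charset_" and value:
            charset = value.decode("ascii", errors="replace")
            break

    # Fall back if the reported name is not an encoding we know.
    try:
        codecs.lookup(charset)
    except LookupError:
        charset = fallback

    # Only now turn the raw bytes into characters.
    return {
        name.decode(charset, errors="replace"): value.decode(charset, errors="replace")
        for name, value in pairs
    }
```

For example, `decode_form(b"name=Engstr%F6m&_charset_=ISO-8859-1")`
and `decode_form(b"name=Engstr%C3%B6m&_charset_=UTF-8")` both give
"Engström" for `name`, even though the bytes on the wire differ.
The point of the design is to keep the submitted values as bytes
until the encoding is known, and to treat the browser-supplied name
as a hint that may be missing or wrong.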