Re: First stupid post of the year. [SOLVED]

On Fri, 4 Jan 2008 09:16:54 -0500, tedd wrote:

> At 10:33 AM +0100 1/4/08, Nisse Engström wrote:
>>On Thu, 3 Jan 2008 12:39:36 -0500, tedd wrote:
> 
> Nisse:
> 
> I thank you for your most enlightened and informative reply.
> 
> I cut/pasted your post into my list of things to remember.

A few more random thoughts on form submission:

   How does the browser know which character encoding
to use in the form submission? Well, lacking any other
guidance, I believe most browsers tend to use the
encoding that was used in the document where the form
is located.

   So what guidance can you give to the browser? The
<form> element has an attribute `accept-charset´ that
can be used to specify a list of acceptable character
encodings. However, something in the back of my mind
tells me I've read that this is not widely supported
by browsers, but I could easily be wrong about this.
In any case, a browser can choose a different encoding
if none of those specified are supported.

  [The name `accept-charset´ is somewhat unfortunate
because it confuses two different concepts: A character
set is a repertoire of characters, while a character
encoding is a way to translate (serialize) those
characters into a byte sequence. UTF-8 and UTF-16 both
cover the same character set (Unicode), but they
encode it in very different ways.]
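
To make the distinction concrete, here is a small sketch in
Python (not PHP, purely as an illustration). The sample
string is my own choice; the point is only that one and the
same sequence of characters comes out as different bytes
depending on the encoding.

```python
# Same characters (same character set), two different encodings.
text = "Engström"

utf8_bytes = text.encode("utf-8")       # ö becomes the two bytes C3 B6
utf16_bytes = text.encode("utf-16-be")  # every character here takes two bytes

print(utf8_bytes.hex(" "))
print(utf16_bytes.hex(" "))

# Decoding with the matching encoding gives back the same characters.
assert utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16-be") == text
```

Decoding the bytes with the wrong encoding is what produces
the familiar garbage characters.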


   The general rule seems to be: Browsers tend to use
the same character encoding as the document they
received from the server.
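
   As a rough illustration of why this matters, the sketch
below (Python, standard library only) shows how one form
value would be percent-encoded if the page, and therefore
the submission, used ISO-8859-1 versus UTF-8. The field
value is made up, and real browsers may differ in details
such as how they treat characters the encoding cannot
represent.

```python
from urllib.parse import quote_plus

value = "Engström"

# What the value looks like on the wire for two page encodings.
as_latin1 = quote_plus(value, encoding="iso-8859-1")
as_utf8 = quote_plus(value, encoding="utf-8")

print(as_latin1)  # Engstr%F6m
print(as_utf8)    # Engstr%C3%B6m
```

The server sees only the percent-encoded bytes, so without
knowing the encoding it cannot tell which characters were
meant.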

   This brings up another problem: How do you *know*
which character encoding was actually used? Apparently,
this problem was overlooked when HTTP was devised. The
only way (according to the HTML spec.) is to use a POST
request with enctype=multipart/form-data, where each part
can carry its own Content-Type, but I don't think PHP
makes the per-part Content-Type information available to
the user, so this is no help.

   Someone (Ian Hickson?) came up with a fix for this:
If you add the following form control:

      <input type=hidden name=_charset_>

most modern browsers will fill in the name of the encoding
they used in the form submission.
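
On the receiving end, the idea can be sketched roughly as
follows. This is Python rather than PHP and only a sketch
under assumptions: `decode_form` is a hypothetical helper of
my own, it handles only application/x-www-form-urlencoded
bodies, it keeps just the last value for a repeated field
name, and it falls back to UTF-8 when `_charset_` is missing
or names an encoding Python does not know.

```python
import codecs
from urllib.parse import unquote_to_bytes


def decode_form(body: bytes, fallback: str = "utf-8") -> dict:
    """Decode a urlencoded form body using its _charset_ field (hypothetical helper)."""
    raw = {}
    for pair in body.split(b"&"):
        if not pair:
            continue
        name, _, value = pair.partition(b"=")
        # '+' means space in form encoding; then undo the %XX escapes.
        raw[unquote_to_bytes(name.replace(b"+", b" "))] = unquote_to_bytes(
            value.replace(b"+", b" ")
        )

    # The browser fills in _charset_ with the encoding it used.
    charset = raw.get(b"_charset_", b"").decode("ascii", "replace") or fallback
    try:
        codecs.lookup(charset)
    except LookupError:
        charset = fallback  # unknown or bogus name: do not trust it

    return {
        key.decode(charset, "replace"): val.decode(charset, "replace")
        for key, val in raw.items()
    }


form = decode_form(b"_charset_=ISO-8859-1&name=Engstr%F6m")
print(form["name"])
```

Keep in mind that `_charset_` is client-supplied data like
any other field, so it should be treated as a hint, not as
something to trust blindly.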

  -   -   -  

More reading:

W3C on Internationalization:
    <http://www.w3.org/International/>
    (Even experts get it wrong. Spot the bug!)
W3C on character sets and encodings:
    <http://www.w3.org/International/getting-started/characters>
Wikipedia on Character encoding:
    <http://en.wikipedia.org/wiki/Character_encoding>



      "This entire encoding process is more involved
       than it looks"


/Nisse

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

