Re: what is the charset of a URL ?

On 7 Feb 2009, at 21:30, André Warnier wrote:

Hi.

I have been wondering for a while about how a server application should really consider the "query string" part of a URL, in terms of character encoding. I am talking here of a URL of the form
http://hostname/somepath?name1=value1&name2=value2..&nameN=valueN
(the part after the question mark)

This question crops up on the Apache lists from time to time; check the archives.
Basically:
- it's underspecified in the specs - hence the need for your question.
  - in practice, in the case of HTML forms and form submissions,
    browsers will use the charset of the form.  But that's empirical,
    and could break down if a browser doesn't support a charset.
  - there are various standards (e.g. HTML which you cite, and XML)
    that say something on the subject.  But if you generalise any
    one of them, it'll conflict with another.
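To make the ambiguity concrete, here is a minimal Python sketch (not anything httpd itself does) of how the same form value arrives as different bytes depending on the charset of the page it was submitted from, and how a server that guesses wrong gets mojibake. The helper names decode_query and decode_query_guessing are made up for illustration, and the "try UTF-8 first, fall back to ISO-8859-1" rule is only a common heuristic, not something any spec mandates.

```python
from urllib.parse import parse_qs, quote

# The same value as a browser might send it from a UTF-8 page
# versus an ISO-8859-1 page. Nothing in the URL says which one was used.
utf8_query = "name=" + quote("André", encoding="utf-8")         # name=Andr%C3%A9
latin1_query = "name=" + quote("André", encoding="iso-8859-1")  # name=Andr%E9


def decode_query(query, charset):
    """Percent-decode a query string, interpreting the bytes as `charset`.

    Hypothetical helper: raises UnicodeDecodeError if the bytes are not
    valid in that charset.
    """
    return parse_qs(query, encoding=charset, errors="strict")


def decode_query_guessing(query):
    """Heuristic only: try UTF-8 first, fall back to ISO-8859-1.

    ISO-8859-1 accepts any byte sequence, so the fallback never fails,
    but it may still be the wrong guess.
    """
    try:
        return decode_query(query, "utf-8")
    except UnicodeDecodeError:
        return decode_query(query, "iso-8859-1")


if __name__ == "__main__":
    print(decode_query(utf8_query, "utf-8"))       # correct guess
    print(decode_query(utf8_query, "iso-8859-1"))  # wrong guess: mojibake
    print(decode_query_guessing(latin1_query))     # fallback path
```

The fallback works here only because the ISO-8859-1 bytes happen to be invalid UTF-8; a server that controls its own forms is better off fixing the charset of its pages and decoding with that.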

That's all nice and well, but

a) if this incoming URL is something typed by a user in the URL bar of the browser, there is no such previous response sent by the server.

A user typing a URL by hand is interacting with your application on their own terms.
It's up to them to be compatible, whatever that means.

b) HTTP being a connection-less protocol, the server should anyway not have any recollection that it has previously sent such a form to the same browser (yesterday ?), so when a request comes in, the server doesn't know any of these things above for sure

But it need only be designed to work with its own pages.

c) the browser may decide to do whatever it pleases and disregard what the server told it (IE comes to mind, practical examples on request).

I haven't heard of IE screwing up charsets in HTML forms.  But ICBW.

It should then be in violation of the specifications, but considering the above I'm not so sure it is clear-cut.

For a while now, I have resorted to doing all the things above, and in addition to always sending forms specifying enctype="multipart/form-data", for which the problem should not exist.

Um, it just moves to the charset of the form parts!
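A rough Python sketch of what that means in practice: a multipart/form-data body whose part carries no charset parameter, which is an assumption about what a browser sends but a common case. The boundary string and field name are invented for the example, and the stdlib email parser stands in for whatever your server framework uses. The part's payload is just bytes, and the server still has to decide how to decode them.

```python
from email.parser import BytesParser

# A hand-built request body. The value was encoded as ISO-8859-1 by the
# (imaginary) browser, but the part does not say so anywhere.
value = "André".encode("iso-8859-1")
body = (
    b"--XyZ\r\n"
    b'Content-Disposition: form-data; name="name"\r\n'
    b"\r\n" + value + b"\r\n"
    b"--XyZ--\r\n"
)
raw = b"Content-Type: multipart/form-data; boundary=XyZ\r\n\r\n" + body

msg = BytesParser().parsebytes(raw)
part = msg.get_payload()[0]

# No Content-Type header on the part, hence no declared charset.
declared_charset = part.get_content_charset()

# The payload comes back as raw bytes; strip any trailing line ending.
payload = part.get_payload(decode=True).rstrip(b"\r\n")

if __name__ == "__main__":
    print(declared_charset)                               # None
    print(payload)                                        # undecoded bytes
    print(payload.decode("iso-8859-1"))                   # right guess
    print(payload.decode("utf-8", errors="replace"))      # wrong guess
```

So multipart avoids percent-encoding, but the "which charset are these bytes in?" question is exactly the same one as for the query string.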

--
Nick Kew