Hi.

I am sending this to both the Apache httpd and Tomcat users lists, in the hope that, because together these HTTP servers cover a good fraction of the market, there is a chance of reaching the right people.
My hope is that someone who is aware of, and connected to, the process of RFC generation would pick this up, or else inform us if some process in the direction that I am indicating below is already under way.
I apologise in advance if I am pushing at an open door. If so, I would be glad to be told what the current state of affairs is. (A Google search on the terms "HTTP", "RFC" and "UTF-8" does not seem to yield any relevant results.)
Proposal:

It is becoming urgent to create a new HTTP standard/version/revision that would be organised around Unicode as the default character set and UTF-8 as the default encoding.
I believe that the spread and acceptance of Unicode and UTF-8 are now sufficient to warrant such an evolution.
The current situation, where ISO-8859-1 is the default in some areas and other areas are either unspecified or vague, causes a lot of confusion and inefficiency, and raises barriers to the creation of truly international HTTP-based WWW applications.
Here are some areas where these problems appear:
 - the encoding of URLs.
 - the encoding of HTTP headers.
 - the encoding of user credentials in browser-side Basic and Digest authentication dialogs, and their transmission to the server.
 - the encoding of input elements from HTML forms, as transmitted by a client to a server, and the interpretation of that data by the server.
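To make the URL and credentials cases concrete, here is a small Python sketch (not taken from any server or browser; the name "café", the user "andré" and the password "secret" are made up for illustration). It only shows that the same text produces different bytes on the wire depending on which charset the client happens to assume:

```python
import base64
from urllib.parse import quote

# Hypothetical path segment containing one non-ASCII character.
name = "café"

# Percent-encode the same text under the two charsets in question.
url_latin1 = quote(name, encoding="iso-8859-1")
url_utf8 = quote(name, encoding="utf-8")
print(url_latin1)  # caf%E9
print(url_utf8)    # caf%C3%A9


def basic_credentials(user, password, charset):
    """Build the base64 token of a Basic Authorization header
    for a given (assumed) client-side charset."""
    raw = f"{user}:{password}".encode(charset)
    return base64.b64encode(raw).decode("ascii")


# Hypothetical credentials; the two tokens differ although the user
# typed exactly the same characters.
basic_latin1 = basic_credentials("andré", "secret", "iso-8859-1")
basic_utf8 = basic_credentials("andré", "secret", "utf-8")
print(basic_latin1 == basic_utf8)  # False
```

A server that receives either form has to guess which charset was used, which is exactly where the ambiguity starts.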
I am quite sure that I am forgetting some aspects of the same issue.

For each of the above, there are areas with no specification, areas with vague specifications, and areas with multiple, apparently contradictory specifications. Consequently, there is a profusion of ad-hoc tricks and recipes, and various "parameters", "flags" and "settings" are starting to appear at the client and server level. These may help resolve the issues in some cases, but in the long term they create even more confusion and interoperability problems.
(Example of such a setting: "use body encoding for URL".)

There might be some efforts under way to tackle one or another aspect of the above (I have heard of a proposal regarding HTTP headers), but I honestly believe that this issue can only be resolved well "at the top", which seems to me to be the HTTP protocol itself.
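As a rough illustration of why such server-side settings get introduced, the following Python sketch (the percent-encoded string is a made-up example, and this is not how any particular server is implemented) decodes the same request data under two assumed charsets:

```python
from urllib.parse import unquote

# Hypothetical query value as sent by a client that used UTF-8.
wire = "caf%C3%A9"

# The server has no reliable way to know the charset, so it must assume one.
as_utf8 = unquote(wire, encoding="utf-8")
as_latin1 = unquote(wire, encoding="iso-8859-1")

print(as_utf8)    # café
print(as_latin1)  # cafÃ©
```

With a single default charset defined by the protocol, the second, garbled interpretation would not be a legitimate choice, and no per-server switch would be needed to avoid it.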
Thanks