Comments inline. You painted the situation with an overly broad brush today; there are some remaining issues, but they are much narrower than the ones you identify below...

André Warnier wrote:
>
> It is becoming urgent to create a new HTTP standard/version/revision,
> that would be organised around Unicode as a default character set, and
> UTF-8 as a default encoding.

Your search was futile because you should be searching the Internet-Drafts; that is where the proposals in this area are. Also watch the drafts that the HTTP draft depends on. Many of those have evolved too, and they are beginning to address the UTF-8 situation.

> Here are some areas where these problems appear :
> - the encoding of URLs.

That is not a problem. URLs are essentially ASCII, and bytes with the high-order bit set have no defined meaning. So it can be a problem from a presentation perspective, but technically and operationally it is not.

The only way to represent URLs in the spirit of their design is to %-encode the high-bit characters for presentation. The underlying bytes can be UTF-8 or ISO-8859-1 (not a fixed either-or, but the administrator's choice), and the %-encoded form is easily typed in from hard copy (e.g. the tag on a TV commercial) by anyone, using any character set, who has access to the ASCII subset.

Using "UTF-8" alone is not enough. Accepting arbitrary characters ignores the fact that what the user enters from a visual reference can have multiple representations, which are often not entirely synonymous. It also ignores the issue of canonical forms, even when we are lucky enough to have an astute reader. So %-encoding is the only safe data-entry format from the sensory world to the browser URL bar.

> - the encoding of HTTP headers.

Headers? I hope you mean header values. *TEXT values clearly declare how to shift to UTF-8 (via RFC 2047 encoding), but there is an ongoing discussion on the http-wg list about how to fix, broaden, or clarify this.

> - the encoding of user credentials in browser-side Basic and Digest
> authentication dialogs, and their transmission to the server.
That is a side effect of the HTTP headers question, and beyond that it is a UI design issue.

> - the encoding of input elements from html forms, as transmitted by a
> client to a server, and the interpretation of ditto data by the server

The RFC 2616 HTTP spec is clear on this and needs no further clarification:

   7.2 Entity Body

   The entity-body (if any) sent with an HTTP request or response is in
   a format and encoding defined by the entity-header fields.

       entity-body    = *OCTET

   An entity-body is only present in a message when a message-body is
   present, as described in section 4.3. The entity-body is obtained
   from the message-body by decoding any Transfer-Encoding that might
   have been applied to ensure safe and proper transfer of the message.

   7.2.1 Type

   When an entity-body is included with a message, the data type of that
   body is determined via the header fields Content-Type and
   Content-Encoding. These define a two-layer, ordered encoding model:

       entity-body := Content-Encoding( Content-Type( data ) )

And the RFC 2388 multipart/form-data spec is completely clear on this:

   4.5 Charset of text in form data

   Each part of a multipart/form-data is supposed to have a
   content-type. In the case where a field element is text, the charset
   parameter for the text indicates the character encoding used.

   For example, a form with a text field in which a user typed 'Joe owes
   <eu>100' where <eu> is the Euro symbol might have form data returned
   as:

    --AaB03x
    content-disposition: form-data; name="field1"
    content-type: text/plain;charset=windows-1250
    content-transfer-encoding: quoted-printable

    Joe owes =80100.
    --AaB03x

So what does the HTML spec have to say? The <FORM> element does include the accept-charset attribute; perhaps that is what you are looking for? Otherwise, if user agents don't observe RFC 2388, you should really take that up with the user agent vendors.
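To illustrate the earlier point about %-encoding URLs, here is a minimal sketch in Python (the thread contains no code, so the language is an assumption). It uses only the standard library's `urllib.parse.quote`/`unquote` and `unicodedata.normalize`, and the path segment "café" is a hypothetical example. It shows that the same visible text becomes different %-encoded URLs depending on which byte encoding the administrator chose, and that two visually identical strings in different canonical forms (NFC and NFD) also produce different URLs.

```python
import unicodedata
from urllib.parse import quote, unquote

# Hypothetical path segment, as a user might read it off a poster or TV screen.
visible = "caf\u00e9"

# Same characters, different ASCII-only URLs, depending on the byte
# encoding the server administrator chose for the path.
as_utf8 = quote(visible, encoding="utf-8")         # "caf%C3%A9"
as_latin1 = quote(visible, encoding="iso-8859-1")  # "caf%E9"

# Two strings that render identically but are in different canonical forms:
# NFC uses the precomposed U+00E9, NFD uses "e" followed by combining U+0301.
nfc = unicodedata.normalize("NFC", visible)
nfd = unicodedata.normalize("NFD", visible)

nfc_url = quote(nfc, encoding="utf-8")  # "caf%C3%A9"
nfd_url = quote(nfd, encoding="utf-8")  # "cafe%CC%81"

if __name__ == "__main__":
    print(as_utf8, as_latin1)
    print(nfc == nfd, nfc_url, nfd_url)
    # Decoding with the wrong charset silently yields mojibake, not an error.
    print(unquote(as_utf8, encoding="iso-8859-1"))
```

Running it prints both encodings, `False` for the NFC/NFD comparison, and `cafÃ©` for the UTF-8 form decoded as ISO-8859-1. The %-encoded form itself is unambiguous; the ambiguity lies entirely in how the characters were turned into bytes.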
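The two-layer model in the RFC 2616 excerpt (section 7.2.1) can be sketched in Python under a few assumptions: gzip stands in for the Content-Encoding, the charset is whatever the Content-Type parameter names, and the helpers `encode_entity` and `decode_entity` are hypothetical. Transfer-Encoding is left out because, per section 7.2, it is removed before the entity-body is obtained.

```python
import gzip


def encode_entity(text: str, charset: str) -> bytes:
    """Hypothetical sender: apply the Content-Type charset, then Content-Encoding."""
    typed = text.encode(charset)  # Content-Type( data ), e.g. text/plain; charset=utf-8
    return gzip.compress(typed)   # Content-Encoding( ... ), e.g. Content-Encoding: gzip


def decode_entity(body: bytes, charset: str) -> str:
    """Hypothetical receiver: peel the layers off in the opposite order."""
    typed = gzip.decompress(body)  # undo Content-Encoding first
    return typed.decode(charset)   # then interpret octets per the Content-Type charset


if __name__ == "__main__":
    body = encode_entity("Joe owes \u20ac100.", "utf-8")
    print(decode_entity(body, "utf-8"))
```

The point of the sketch is the ordering: the receiver removes the Content-Encoding first and only then interprets the octets according to the Content-Type charset.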
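A server honouring section 4.5 of RFC 2388 has to read the charset parameter from each part's content-type and undo the content-transfer-encoding before decoding the text. Here is a minimal Python sketch for the single part in the RFC's example. It assumes the part has already been split out of the multipart body (boundary lines removed); the helper `decode_form_part` and the US-ASCII fallback are hypothetical choices, while `email.message_from_bytes`, `get_param`, `get_content_charset` and `get_payload(decode=True)` are standard-library calls.

```python
import email

# The part from the RFC 2388 section 4.5 example, boundary lines removed:
# headers, a blank line, then the quoted-printable content.
PART = (
    b'content-disposition: form-data; name="field1"\r\n'
    b"content-type: text/plain;charset=windows-1250\r\n"
    b"content-transfer-encoding: quoted-printable\r\n"
    b"\r\n"
    b"Joe owes =80100."
)


def decode_form_part(raw_part: bytes) -> tuple[str, str]:
    """Hypothetical helper: return (field name, decoded text) for one part."""
    msg = email.message_from_bytes(raw_part)
    name = msg.get_param("name", header="content-disposition")
    # Fallback when no charset parameter is present is an assumption.
    charset = msg.get_content_charset() or "us-ascii"
    octets = msg.get_payload(decode=True)  # undoes quoted-printable: =80 -> 0x80
    return name, octets.decode(charset)    # 0x80 is the Euro sign in windows-1250


if __name__ == "__main__":
    print(decode_form_part(PART))
```

Nothing in the sketch guesses the charset; the server simply does what the part's headers say, which is why a user agent that omits or misstates the charset is the thing to fix.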
---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.