I would agree that in this scenario the ISO standard would not help, since
it only governs how the SQL client talks to the database server, not what
happens once the file is placed on a web server. I think the situation is
actually very similar to the one described in RFC 6657, where there may be
a conflict between the charset parameter on the outside and the charset
declared inside the payload:

   In order to improve interoperability with deployed agents, "text/*"
   media type registrations SHOULD either

   a.  specify that the "charset" parameter is not used for the defined
       subtype, because the charset information is transported inside
       the payload (such as in "text/xml"), or

   b.  require explicit unconditional inclusion of the "charset"
       parameter, eliminating the need for a default value.

   In accordance with option (a) above, registrations for "text/*"
   media types that can transport charset information inside the
   corresponding payloads (such as "text/html" and "text/xml") SHOULD
   NOT specify the use of a "charset" parameter, nor any default value,
   in order to avoid conflicting interpretations should the "charset"
   parameter value and the value specified in the payload disagree.

While this type is in the "application/*" tree, going with option (a) would
essentially drop the "charset" parameter and, in your example, leave
implementors to figure out the charset from the payload itself.

The question is what happens when the SQL file itself carries no charset
information, such as when using "mysqldump" with the "--skip-set-charset"
option. According to MySQL, UTF-8 would be used in v5.1+ and ASCII in
versions prior to that. Perhaps we should leave "charset" as an optional
parameter for cases like these, and just take out the default value.

Yakov

On Tue, Feb 5, 2013 at 10:05 AM, Bjoern Hoehrmann <derhoermi@xxxxxxx> wrote:
> * Yakov Shafranovich wrote:
>>[...]
>
> I am interested in this situation:
>
>   -> Someone wants to publish database contents or schema
>   -> Use DB-specific dumping tool to create .sql file
>   -> Puts .sql file on web server
>   -> Server associates .sql with proposed media type
>
>   -> Someone else downloads this resource
>   -> Checks IANA registry for the media type
>   -> Finds proposed specification
>
> Note that there is no step "publisher of .sql file ensures that the dump
> tool generates US-ASCII encoded text, or otherwise makes sure the text's
> in a single character encoding and makes sure the web server includes
> the character encoding label in the `charset` parameter of the
> Content-Type header when serving the .sql file". Experience suggests
> that responses will include no or an incorrect label and downloaders are
> likely to ignore the charset parameter even if correctly specified.
> However, reading the draft, the person in the scenario above would
> assume that he has got US-ASCII encoded text, even though that's fairly
> unlikely, especially in the future, given that "international text" and
> using UTF-8 without escapes are becoming increasingly common.
>
> Similarly, the draft would tell him to check some ISO standard for "the
> Structured Query Language", even though most likely he should instead
> identify which database software generated the file and check the manual
> for that software to find out about all the files. As a simple example,
> the dumps from <http://dumps.wikimedia.org/> read like this:
>
>   -- MySQL dump 10.13 Distrib 5.1.66, for debian-linux-gnu (x86_64)
>   --
>   -- Host: 10.0.6.76 Database: frrwiki
>   -- ------------------------------------------------------
>   -- Server version 5.1.53-wm-log
>
>   /*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
>   /*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
>   /*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
>   /*!40101 SET NAMES utf8 */;
>   ...
>   --
>   -- Table structure for table `category`
>   --
>
>   DROP TABLE IF EXISTS `category`;
>   /*!40101 SET @saved_cs_client = @@character_set_client */;
>   /*!40101 SET character_set_client = utf8 */;
>   ...
>
> They do not currently use the proposed type, but if they did, you would
> have to know the format of "MySQL dump" files and what the codes in the
> comments here mean to conclude that these are actually UTF-8 encoded
> files. Google will find other examples with `character_set_client` for
> other character encodings like "latin1". The ISO standard, as far as I
> am aware, will not help you there, and neither does the US-ASCII default
> proposed in the draft.
> --
> Björn Höhrmann · mailto:bjoern@xxxxxxxxxxxx · http://bjoern.hoehrmann.de
> Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
> 25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
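
To make the payload-versus-parameter question concrete, here is a minimal
sketch of what a downloader might do for a MySQL-style dump like the one
quoted above: look for a "SET NAMES" statement near the top of the file and
fall back to the "charset" parameter only when the payload declares nothing.
The function name sniff_sql_charset, the 4096-byte probe window, and the
payload-over-parameter precedence are illustrative assumptions, not
something the draft or RFC 6657 specifies.

```python
import re
from typing import Optional

# Assumption: the dump declares its encoding with a "SET NAMES <name>"
# statement, as in the versioned comment /*!40101 SET NAMES utf8 */;
_SET_NAMES = re.compile(rb"SET\s+NAMES\s+'?([A-Za-z0-9_]+)'?", re.IGNORECASE)


def sniff_sql_charset(payload: bytes,
                      header_charset: Optional[str] = None,
                      probe_bytes: int = 4096) -> Optional[str]:
    """Guess the character set of a .sql payload (hypothetical helper).

    Precedence (an assumption of this sketch):
      1. a SET NAMES declaration found in the first `probe_bytes` bytes,
      2. the "charset" parameter from the Content-Type header, if any,
      3. None, meaning no default is assumed.
    """
    match = _SET_NAMES.search(payload[:probe_bytes])
    if match:
        # This is the database's own label (e.g. "utf8", "latin1"),
        # which is not necessarily a registered IANA charset name.
        return match.group(1).decode("ascii").lower()
    if header_charset:
        return header_charset.lower()
    return None


if __name__ == "__main__":
    dump = b"-- MySQL dump 10.13\n/*!40101 SET NAMES utf8 */;\n"
    print(sniff_sql_charset(dump, header_charset="US-ASCII"))
    print(sniff_sql_charset(b"SELECT 1;", header_charset="ISO-8859-1"))
    print(sniff_sql_charset(b"SELECT 1;"))
```

Returning None in the last case mirrors the suggestion of keeping "charset"
optional without a default: a dump written with "--skip-set-charset" and
served without the parameter simply has no declared encoding, and the
consumer has to decide what to do about that.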