It was thus said that the Great greg wm once stated:
>
> > right after the title the file says <meta http-equiv="Content-Type"
> > content="text/html; charset=iso-8859-1">.  why isn't that good enough?
> > why does it make no difference at all what i change it to?  i tried
> > utf-8, Utf-8, UTF-8, Windows-1252, none have any effect tho i can see
> > them if i tell my browser to view source.
>
> overridden by apache's http headers, apparently.  see below.

I can't find it at the moment, but I think that if the webserver sends out
the charset as part of the headers, the browser is supposed to take that as
gospel and ignore whatever it says in the document.  You can play around
with this at [2].

> > the command i ran was
> > wget -ENKkrl19 -nH -w2 -owget.log http://nonviolentpeaceforce.org
> > my locale is en_IE.UTF-8, so why did wget save in latin-1 format?
> the wget manual page mentions nothing at all about character sets.

I would assume (not looking at the source code in question) that wget just
saves the data as-is, from the server, as that does the least surprising
thing (imagine if wget changed all GIF files to PNG as a matter of course).

Things could get very confusing indeed.  Imagine the following HTML
document:

	<!DOCTYPE ... >
	<HTML lang="il">
	<HEAD>
	  <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-9">
	  ...
	</HEAD>
	...
	</HTML>

(I *think* I got it correct---this is an example anyway---the point being
that the file in question is *NOT* UTF-8 *OR* ISO-8859-1).  Now suppose wget
converted the output to your locale's encoding, UTF-8.  The file would
*still* say:

	<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-9">

unless you expect wget to change that as well.

> thank you brian!  perhaps iconv might have done the trick, anyway i used
> vim.  vim :se fileencoding revealed that wget saved the files in
> latin-1, and :se fileencoding=utf-8 for each file cleaned up the mess.
> wasn't even a big job after using :map such that each file was fixed
> with a single keystroke.
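A scripted version of that per-file fix might look something like the
following.  This is only a sketch: the names `reencode_file` and
`reencode_tree` and the `*.html` pattern are made up for illustration, and
it assumes every file that is not already valid UTF-8 really is ISO-8859-1.
It does not touch any <META> tag inside the files.

```python
from pathlib import Path


def reencode_file(path, src="iso-8859-1", dst="utf-8"):
    """Rewrite one file in place from `src` to `dst`; return True if changed."""
    raw = path.read_bytes()
    try:
        # If the bytes already decode cleanly as the target encoding, leave
        # the file alone so that a second run does not double-encode it.
        # (A Latin-1 file that happens to be valid UTF-8 is skipped too.)
        raw.decode(dst)
        return False
    except UnicodeDecodeError:
        pass
    path.write_bytes(raw.decode(src).encode(dst))
    return True


def reencode_tree(root, pattern="*.html"):
    """Re-encode every matching file under `root`; return the changed paths."""
    return [p for p in sorted(Path(root).rglob(pattern)) if reencode_file(p)]
```

Calling reencode_tree(".") from the top of the mirrored directory would
convert the whole tree in one go, and running it twice should be harmless.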
iconv would have, and it can be scripted.

> > The http headers are authoritative, and override any metadata.  If you
> > rather control your encoding with meta tags, turn off charsets entirely.
>
> that is probably the winning answer.  i already applied the above
> solution so i dunno for sure, but look..
>
> wget --save-headers from the original m$ .asp server:
> HTTP/1.1 200 OK
> Server: Microsoft-IIS/5.0
> Date: Sat, 20 Aug 2005 21:18:55 GMT
> Content-Type: text/html

This just tells the browser that the page in question is an HTML file.  No
mention of what character encoding the page is in---it's up to the browser
to determine that.  Without a <META> tag specifying otherwise, the browser
can either attempt to detect the charset (my Linux server supports 102
encoding schemes; my Mac mini 134) or just assume that the page is in one
particular set (my Firefox setup defaults to ISO-8859-1) and go from there
(and that's probably faster than trying up to 100+ encodings).

> wget --save-headers from my apache server:
> HTTP/1.1 200 OK
> Date: Sun, 21 Aug 2005 04:10:34 GMT
> Server: Apache/2.0.52 (CentOS)
> Last-Modified: Sun, 21 Aug 2005 01:34:43 GMT
> ETag: "260261-2b33-9134b2c0"
> Accept-Ranges: bytes
> Content-Length: 11059
> Connection: close
> Content-Type: text/html; charset=UTF-8
>
> now i wouldn't have thought that the following httpd.conf directive
> would result in overriding the meta http-equiv headers, but, there does
> seem to be a strong odor..

Internationalization on the web [1] is a mess [2].  No doubt about that.
From my reading, the best thing to do is one of two things.  Either set the
default character set in Apache (the AddDefaultCharset directive) and make
sure every document is actually in that encoding, or tell Apache *not* to
send a character set (AddDefaultCharset Off), declare it in the
<META HTTP-EQUIV> tag in each document, and make sure each document is
actually in the encoding it declares.

  -spc (Been playing around with this stuff for a bit ...)

[1]	http://www.intertwingly.net/blog/?q=Internationalization
[2]	http://www.intertwingly.net/blog/2005/02/11/Meta-Charset-Update

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@xxxxxxxxxxxxxxxx
   "   from the digest: users-digest-unsubscribe@xxxxxxxxxxxxxxxx
For additional commands, e-mail: users-help@xxxxxxxxxxxxxxxx