Re: Wrong charset convert

Jiří Eichler wrote:
...
Hi.
I do not know the precise answer either.
But I know enough to tell you that in such matters you must be /extremely/ careful in interpreting what is really going on at each level. Just as a simple example: when you look at a log file, you must know:
- has the process that writes the logfile already transformed the data into some encoding when writing it?
- is the editor I am using aware of the logfile's encoding?
etc...
Because otherwise what you see and what is really there may be different things.
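
To make the log file example concrete, here is a minimal Python sketch. It assumes the writing process emitted UTF-8 and that the viewer guesses one of three encodings; the sample name and the list of encodings are only illustrative, not taken from the original problem.

```python
# The same bytes, read under three different assumptions about their encoding.
name = "Jiří"
raw = name.encode("utf-8")  # assumption: the writer of the log used UTF-8

def views(data: bytes) -> dict:
    """Decode the same bytes under several candidate encodings."""
    candidates = ("utf-8", "cp1250", "latin-1")
    return {enc: data.decode(enc, errors="replace") for enc in candidates}

seen = views(raw)
for enc, text in seen.items():
    # repr() makes invisible or control characters visible
    print(f"{enc:8} -> {text!r}")
```

Only the viewer that guesses the writer's encoding shows the original text; the others show plausible-looking garbage, although the bytes on disk never changed.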

For example, I think I remember that, internally, the Windows NTFS filesystem stores file names as Unicode (as far as I know in UTF-16, not in UTF-8).
(See for example here: http://www.ntfsrecovery.com/a-ntfs.php)
But when you look at a directory through the Explorer, these internal filenames /may/ get transformed according to your PC's codepage, just to display them to you.
So what you think you see is not necessarily what is really there.
Do you understand what I'm saying?
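
As a rough illustration of the difference between the stored name and a codepage view of it, the following Python sketch encodes one hypothetical file name in several ways. It only shows how the encodings differ; it does not claim to reproduce what NTFS or the Explorer actually do.

```python
# One file name, several byte representations.
name = "Jiří.txt"  # hypothetical file name

as_utf16 = name.encode("utf-16-le")  # 2 bytes per character here
as_utf8 = name.encode("utf-8")       # the accented letters take 2 bytes each
as_cp1250 = name.encode("cp1250")    # Central European codepage, 1 byte each

# A codepage that lacks a character cannot represent it at all:
# cp1252 (Western European) has no "ř", so it is replaced by "?".
as_cp1252 = name.encode("cp1252", errors="replace")

for label, data in [("utf-16-le", as_utf16), ("utf-8", as_utf8),
                    ("cp1250", as_cp1250), ("cp1252", as_cp1252)]:
    print(f"{label:10} {len(data):2d} bytes  {data!r}")
```

The last line of output is the interesting one: once a name has been squeezed through a codepage that cannot hold it, the original characters are gone.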

Just some elements:
- Apache should not "translate" or "encode" the received URL, because it basically does not know whether this URL is in UTF-8, ASCII, or any other encoding (it may also be some Japanese or Chinese encoding, for example). There is no "flag" or "header" in an HTTP request that says in which encoding the "GET" line comes in. So it /must/ take it as bytes.
- Then Apache calls the OS to find the file. There may or may not be some translation there; I really don't know. It may depend on which API call the program uses to read the directory, and I don't know what Apache uses.
- It's the same for your C program. I don't know if the OpenFile() call interprets "name" as a pure byte sequence, or if it converts it internally, or whatever.
- And we don't know whether Apache and your program use the same API calls.
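
The point about the URL being "just bytes" can be sketched in Python with the standard urllib.parse module. The path below is hypothetical, and the two client encodings (UTF-8 and cp1250) are assumptions chosen for illustration; this says nothing about what Apache itself does internally.

```python
from typing import Optional
from urllib.parse import quote, unquote_to_bytes

path = "/soubor/Jiří.txt"  # hypothetical request path

# Two clients percent-encode the same path using different encodings.
url_utf8 = quote(path, encoding="utf-8")
url_cp1250 = quote(path, encoding="cp1250")

def try_decode(raw: bytes, encoding: str) -> Optional[str]:
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None

# The server only ever receives bytes; nothing in the request names the encoding.
raw_utf8 = unquote_to_bytes(url_utf8)
raw_cp1250 = unquote_to_bytes(url_cp1250)

print(url_utf8, url_cp1250)
print(try_decode(raw_cp1250, "utf-8"))   # invalid as UTF-8
print(try_decode(raw_utf8, "cp1250"))    # decodes, but to the wrong characters
```

Guessing wrong either fails outright or silently yields a different name, which is why treating the request line as opaque bytes is the only safe default.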

For example, in Java or Perl, there are different ways to open a file and to read from or write to it, some with encoding/decoding going on, some without. Unfortunately, I am not competent in C or the Windows API, so I don't know how it works in that case.
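
The same distinction exists in Python, which I use here only as an illustration (it is not the C/Windows case in question). The sketch assumes a temporary file whose content was written in cp1250, then reads it back with and without decoding.

```python
import os
import tempfile

# Write some cp1250-encoded bytes to a temporary file.
payload = "Jiří\n".encode("cp1250")
fd, tmp_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(payload)

# 1. Binary mode: no decoding at all, you get the bytes as stored.
with open(tmp_path, "rb") as f:
    as_bytes = f.read()

# 2. Text mode with the right encoding: bytes are decoded to characters.
with open(tmp_path, "r", encoding="cp1250", newline="") as f:
    as_text = f.read()

# 3. Text mode with the wrong encoding: here the decoding fails.
try:
    with open(tmp_path, "r", encoding="utf-8") as f:
        wrong = f.read()
except UnicodeDecodeError:
    wrong = None

os.remove(tmp_path)
print(as_bytes, repr(as_text), wrong)
```

Two programs that open the same file through different calls can therefore disagree about its content without either of them being "broken".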

Obviously something is happening somewhere, and obviously it happens differently under Unix and under Windows.

Under Unix/Linux, most of these things are influenced by the "locale" under which the process is running. Under Windows, it is usually the system-wide "International settings" that count.
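
To check what a given process actually inherits, a small Python sketch like the one below can be run on each system. The values it prints depend entirely on the machine and environment, so no particular output should be expected.

```python
import locale
import sys

# Query (not change) the settings the current process is running under.
report = {
    # Encoding Python would pick by default for text files on this system
    "preferred encoding": locale.getpreferredencoding(False),
    # Encoding Python uses to convert file names to and from bytes
    "filesystem encoding": sys.getfilesystemencoding(),
    # Current LC_CTYPE locale; calling setlocale() with one argument only reads it
    "LC_CTYPE": locale.setlocale(locale.LC_CTYPE),
}

for key, value in report.items():
    print(f"{key:20} {value}")
```

Running this under the web server's account and under your own account is a quick way to see whether the two processes even start from the same assumptions.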

I think we need an Apache/Windows developer here, to really tell us what is going on.


---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.