Re: mod_cgi: multibyte characters in REQUEST_URI can't be converted to correct PATH_INFO

On 12/16/2010 4:06 AM, LiuYan wrote:
> William A. Rowe Jr. <wrowe <at> rowe-clan.net> writes:
> 
>>
>> On 12/1/2010 9:31 AM, LiuYan wrote:
>>> Recently I set up Apache 2.2.17 on Windows Server 2003 and configured
>>> viewvc in CGI mode. viewvc works fine except when browsing repository
>>> entries whose names contain Chinese characters; those return HTTP 404.
>>> I asked on the viewvc-users mailing list, and they said a CGI program
>>> interacts with the system using the locale in use by the environment
>>> in which it is running
>>> ( http://viewvc.tigris.org/ds/viewMessage.do?dsForumId=4255&dsMessageId=2686631 ).
>>
>> If you set up viewvc's CGI host to run under the utf-8 code page, things
>> should work correctly.  On win32, all file names are unicode, and httpd and
>> dav then represent these as utf-8.
>>
> 
> Thank you William!
> 
> I don't know how to set the default Windows code page to UTF-8; there is no
> UTF-8 option in Control Panel--Locale/Language--Advanced. I tried changing
> the code page to 65001 (UTF-8) in a DOS prompt window and running httpd.exe
> from that window, but I got the same result.

Numerically you are right: 65001 is the UTF-8 code page.  Just to understand
what httpd does, it passes the entire environment table, CGI variables
included, as Unicode.  The Windows cmd.exe environment then translates that
into whatever code page you are running (so you should choose a code page
that covers all of your possible responses).  When you prepare results which
offer links, you might explicitly need to translate them to utf-8.
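
As a rough illustration of that last point, here is a minimal Python 3 sketch
of building a link whose path is percent-encoded from its UTF-8 bytes.  The
helper name link_for and the /viewvc/repo/ path are made up for the example;
this is not how viewvc itself does it.

```python
from urllib.parse import quote, unquote


def link_for(path: str) -> str:
    """Percent-encode a repository path as UTF-8 for use in a link.

    Hypothetical helper, for illustration only.
    """
    # safe="/" keeps the path separators as-is; every non-ASCII character
    # is written as %XX escapes of its UTF-8 bytes.
    return quote(path, safe="/", encoding="utf-8")


href = link_for("/viewvc/repo/中文")
print(href)
```

Decoding the result with unquote(href) gives back the original text, since
unquote assumes UTF-8 by default.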

If you run a unicode-aware language, there is no translation at all, or if there
is translation, it occurs based on the unicode program input from the environment.
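
For instance, Python 3 on Windows fills os.environ from the wide-character
environment, so a CGI script written this way should see PATH_INFO as text
with no code page step of its own.  A minimal sketch, assuming the script runs
under Python 3 (the function name path_info is just for the example):

```python
import os


def path_info(environ=os.environ) -> str:
    """Return PATH_INFO as text, or an empty string when it is not set."""
    # Values in os.environ are already str objects, so nothing here
    # encodes or decodes through the active code page.
    return environ.get("PATH_INFO", "")


if __name__ == "__main__":
    # repr() keeps the output printable even on a narrow console.
    print(repr(path_info()))
```

The environ parameter only exists so the function can be exercised with a
plain dict outside of a web server.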

> part of that answer:
> ---------
> ...
> However most byte-based tools using the C stdio (and I'm assuming this applies 
> to ColdFusion, as it does under Perl, Python 2, PHP etc.) then try to read the 
> environment variables as bytes, and the MS C runtime encodes the Unicode 
> contents again using the Windows default code page. So any characters that 
> don't fit in the default code page are lost for good. This would include your 
> Arabic characters when running on a Western Windows install.


Exactly.  Any time you pass through the command environment this happens,
unless the program entry points are the unicode-aware flavors.
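
The loss is easy to reproduce away from the server.  The sketch below only
approximates it by encoding a string into a code page and decoding it again
with Python's own codecs; the real C runtime conversion may substitute
characters differently, and the function name and sample path are invented
for the example.

```python
def through_code_page(text: str, code_page: str) -> str:
    """Narrow text to a code page and read it back, as a byte-based tool would."""
    # errors="replace" puts "?" in place of anything the code page cannot
    # represent, which stands in for the loss described above.
    return text.encode(code_page, errors="replace").decode(code_page)


original = "/repo/中文"
western = through_code_page(original, "cp1252")  # Western European code page
chinese = through_code_page(original, "cp936")   # Simplified Chinese code page

print(western)
print(chinese == original)
```

With cp1252 the two Chinese characters come back as question marks, while
cp936 can represent them and the path survives the round trip.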


---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.