RE: File handling and different character sets

> -----Original Message-----
> From: Per Eriksson [mailto:per.eriksson@xxxxxxxx]
> Sent: Friday, November 23, 2007 7:15 AM
> To: php-general@xxxxxxxxxxxxx
> Subject:  File handling and different character sets
> 
> Hi,
> 
> I would like to know how you work with the PHP Directory Functions and
> different character sets. If I am having a professional view,
> well-written code should be able to handle file systems in different
> character sets.
> 
> http://se.php.net/manual/sv/ref.dir.php
> 
> Is there a way to write code for listing files from a ISO-8859-1 on a
> UTF-8 page? I haven't succeeded with this.
> 
> 
> Thank you,
> 
> Best Regards
> 
> Per Eriksson
> per.eriksson A exist ! se
> 
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php

Hi Per,

I'm just curious: no matter which encoding I choose (IE and FF switch
automatically to UTF-8, as per the page's meta tag and Content-Type header),
I get funny characters at http://se.php.net/manual/sv/ref.dir.php. I don't
know whether the default browser font is to blame, although I've tried
several. My system is the Spanish version of Windows XP SP2, but I don't
think that's the cause either: it is up to date, and I have even installed
support for right-to-left scripts...
OK, I know I could just use wget, save the result and open it in a hex
editor to see what the actual bytes are and check the encoding (I
won't... I'm kind of lazy today :D ).

Regarding your question, here are some functions I copied from the user
notes in the extended CHM version of the PHP manual. They are on the
mb_convert_encoding reference page and should be in the online version of
the manual as well (I won't check... too lazy, I told you)...

[snip]
volker at machon dot biz (25-Sep-2007 05:05)

Hey guys. For everybody who's looking for a function that converts an
ISO string to UTF-8 or a UTF-8 string to ISO, here's your solution:
// These are class methods; drop "public" to use them as plain functions.
public function encodeToUtf8($string) {
    return mb_convert_encoding($string, "UTF-8",
        mb_detect_encoding($string, "UTF-8, ISO-8859-1, ISO-8859-15", true));
}
public function encodeToIso($string) {
    return mb_convert_encoding($string, "ISO-8859-1",
        mb_detect_encoding($string, "UTF-8, ISO-8859-1, ISO-8859-15", true));
}
For me these functions are working fine. Give it a try.
[/snip]

The first thing to test is whether the directory/filesystem functions
actually return data encoded in ISO-8859-1 (I guess it depends on the OS
and the filesystem, but you might know better). If the names are already
UTF-8, calling mb_convert_encoding on them would act like "double escaping"
or "double urlencoding" (a known issue for all of us, huh?). That's why
encodeToUtf8 calls mb_detect_encoding first... Still, I wonder whether
mb_detect_encoding can guarantee anything beyond the byte stream being
valid in the given character set(s). So... what do you think? Did you get
any further results on this? And do you have a code sample you are working
on that you could share?
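
In case it helps, here is a rough sketch of what I have in mind for
avoiding the double conversion. It is untested against your setup and rests
on assumptions: toUtf8IfNeeded and listDirUtf8 are names I made up, and
ISO-8859-1 as the fallback is only a guess about what your filesystem
returns. A name is converted only when it is not already valid UTF-8.

```php
<?php
// Hypothetical helper: return $name as UTF-8, converting only when needed.
// $assumedSource is an assumption about the filesystem's encoding.
function toUtf8IfNeeded($name, $assumedSource = 'ISO-8859-1')
{
    // Valid UTF-8 (plain ASCII included) is returned untouched,
    // which is what prevents "double encoding".
    if (mb_check_encoding($name, 'UTF-8')) {
        return $name;
    }
    // Otherwise treat the bytes as $assumedSource and convert them.
    return mb_convert_encoding($name, 'UTF-8', $assumedSource);
}

// Hypothetical helper: list the entries of $dir with UTF-8 names.
function listDirUtf8($dir, $assumedSource = 'ISO-8859-1')
{
    $entries = scandir($dir);
    if ($entries === false) {
        return array(); // directory could not be read
    }
    $names = array();
    foreach ($entries as $entry) {
        if ($entry === '.' || $entry === '..') {
            continue; // skip the current/parent directory entries
        }
        $names[] = toUtf8IfNeeded($entry, $assumedSource);
    }
    return $names;
}
```

On a UTF-8 page you would then echo each name through
htmlspecialchars($name, ENT_QUOTES, 'UTF-8'). The caveat is the same one I
mentioned above: the validity check only tells you the bytes could be UTF-8,
so an ISO-8859-1 name that happens to form a valid UTF-8 sequence would be
left unconverted.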

Regards,

Rob


Andrés Robinet | Lead Developer | BESTPLACE CORPORATION
5100 Bayview Drive 206, Royal Lauderdale Landings, Fort Lauderdale, FL 33308
| TEL 954-607-4207 | FAX 954-337-2695 | 
Email: info@xxxxxxxxxxxxx  | MSN Chat: best@xxxxxxxxxxxxx  |  SKYPE:
bestplace |  Web: bestplace.biz  | Web: seo-diy.com



