Re: a Debate here - How can you check if a file is UTF-8 without the BOM using PHP?

On Sat, May 21, 2011 at 12:10 PM, Eli Orr (Office) <eli.orr@xxxxxxxxxxxx> wrote:

>
> Dear PHP Gurus,
>
> I have a debate on the following; please let me know what is true and what is false.
>
> I'm using a PHP function *is_UTF_8_file ($file_name)* that I've found as
> part of my PHP 5.3 installation.
> This function checks whether the file starts with the 3 UTF-8 BOM bytes.
>
> However, another guy told me that there is a way to detect whether a file is
> UTF-8 without having the BOM at the start of the file.
> To me it sounds impossible, since without this indication you
> have a stream of bytes that you can never tell 100% is UTF-8 or
> something else.
>
> Who is right here?
> If there is a magical function that can detect whether files without a BOM
> are UTF-8 or not, please share your knowledge, unless this
> is a "NULL" or impossible function as I thought.
>

Here's a great write-up I've got bookmarked (he points out that Windows Notepad
automatically detects the encoding):
http://codesnipers.com/?q=node/68

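   - If it's an XML file, the structure allows you to determine the encoding.
   - For other files, you can try to interpret the bytes as UTF-8 and look
   for invalid byte sequences; if there are none, the file is very likely
   UTF-8 (keep in mind that plain ASCII is also valid UTF-8).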
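A minimal sketch of that second approach, assuming PHP 7 or later. The helper name classify_utf8_file is hypothetical (it is not the is_UTF_8_file mentioned above). It strips an optional BOM and then relies on the fact that preg_match() with the /u modifier returns false when the subject is not valid UTF-8:

```php
<?php
// The three bytes of the UTF-8 byte order mark (U+FEFF encoded as UTF-8).
const UTF8_BOM = "\xEF\xBB\xBF";

/**
 * Hypothetical helper: returns 'utf-8-bom', 'utf-8' or 'not-utf-8'.
 * Reads the whole file into memory, so it suits modest file sizes only.
 */
function classify_utf8_file(string $fileName): string
{
    $bytes = file_get_contents($fileName);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $fileName");
    }

    // Check for (and drop) a leading BOM, as a BOM-only check would.
    $hasBom = strncmp($bytes, UTF8_BOM, 3) === 0;
    if ($hasBom) {
        $bytes = substr($bytes, 3);
    }

    // With the /u modifier PCRE validates the subject: an empty pattern
    // matches (returns 1) on valid UTF-8 and returns false on invalid bytes.
    if (preg_match('//u', $bytes) !== 1) {
        return 'not-utf-8';
    }
    return $hasBom ? 'utf-8-bom' : 'utf-8';
}
```

This only tells you the bytes are consistent with UTF-8: a file with no bytes above 0x7F passes, and cannot be told apart from ASCII or Latin-1. In practice, though, non-trivial text in a legacy 8-bit encoding almost never happens to form valid UTF-8, which is why the check is useful despite not being a 100% proof.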


As for a PHP function that already does this, I'm not aware of one, but
you could make a system call to "file" if you're on Linux, as it tries to
automatically determine the encoding:
http://linux.die.net/man/1/file
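A rough sketch of that system call, assuming a Unix-like host with the `file` utility on the PATH and shell_exec() not disabled. The helper name guess_encoding_with_file_cmd is hypothetical; `-b` (brief) and `--mime-encoding` are documented `file` options:

```php
<?php
/**
 * Hypothetical helper: ask the external `file` command for its encoding
 * guess (e.g. "utf-8", "us-ascii", "iso-8859-1").
 * Returns null if the command is unavailable or prints nothing.
 */
function guess_encoding_with_file_cmd(string $fileName): ?string
{
    // -b omits the file name from the output; --mime-encoding prints only
    // the charset. escapeshellarg() guards against shell injection.
    $cmd = 'file -b --mime-encoding ' . escapeshellarg($fileName) . ' 2>/dev/null';
    $out = shell_exec($cmd);

    // shell_exec() yields null (or false on newer PHP) on failure/no output.
    if (!is_string($out)) {
        return null;
    }
    $out = trim($out);
    return $out === '' ? null : $out;
}
```

Like the check above, `file` is a heuristic: it inspects the bytes and reports its best guess, so it cannot settle the debate with 100% certainty either.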

Adam

-- 
Nephtali:  A simple, flexible, fast, and security-focused PHP framework
http://nephtaliproject.com
