On Sat, May 21, 2011 at 12:10 PM, Eli Orr (Office) <eli.orr@xxxxxxxxxxxx> wrote:

> Dear PHP Gurus,
>
> I have a debate on the following; please let me know what is true / false.
>
> I'm using a PHP function is_UTF_8_file($file_name) that I've found as
> part of my PHP 5.3 installation.
> This function checks if the file starts with the 3 UTF-8 BOM bytes.
>
> However, another guy told me that there is a way to detect if a file is
> UTF-8 without having the BOM at the file start.
> To me it sounds impossible, since if you do not have this indication you
> have a stream of bytes for which you can never tell 100% whether it is
> UTF-8 or something else.
>
> Who is right here?
> If there is a magical function that can detect whether files without a BOM
> are UTF-8 or not, please share your knowledge, if this
> is not a "NULL" or impossible function as I thought.

Here's a great write-up I've got bookmarked (he points out Windows Notepad automatically determines the encoding):

http://codesnipers.com/?q=node/68

- If it's an XML file, the structure allows you to determine the encoding.
- For other files, you can try to decode the contents as UTF-8 and look for improper (invalid) byte sequences.

As far as a PHP function that already does this, I'm not aware of one, but you could make a system call to "file" if you're on Linux, as it tries to automatically determine the encoding:

http://linux.die.net/man/1/file

Adam

--
Nephtali: A simple, flexible, fast, and security-focused PHP framework
http://nephtaliproject.com
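
P.S. The "look for improper encodings" idea above can be sketched in a few lines of PHP. This is only a minimal sketch, not the is_UTF_8_file() function from the question. The helper names (has_utf8_bom, looks_like_utf8, file_looks_like_utf8) are hypothetical and made up for illustration. It relies on the fact that preg_match() with the /u modifier validates the subject as UTF-8 and returns false when it hits a malformed sequence.

```php
<?php
// Hypothetical helpers for illustration; none of these ship with PHP.

// The 3-byte UTF-8 byte order mark.
define('UTF8_BOM', "\xEF\xBB\xBF");

// True if the byte string starts with the UTF-8 BOM.
function has_utf8_bom($bytes)
{
    return strncmp($bytes, UTF8_BOM, 3) === 0;
}

// True if $bytes is well-formed UTF-8 from start to finish.
// With /u, PCRE checks the whole subject before matching, and
// preg_match() returns false if it finds an invalid sequence.
function looks_like_utf8($bytes)
{
    return preg_match('//u', $bytes) === 1;
}

// Convenience wrapper: read a file and validate its contents.
function file_looks_like_utf8($path)
{
    $bytes = file_get_contents($path);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return looks_like_utf8($bytes);
}

var_dump(looks_like_utf8("caf\xC3\xA9")); // bool(true):  "é" as UTF-8
var_dump(looks_like_utf8("caf\xE9"));     // bool(false): "é" as ISO-8859-1
var_dump(looks_like_utf8("plain ASCII")); // bool(true):  ASCII is valid UTF-8
```

Note that this only proves the negative: a file that fails the check is definitely not UTF-8, while a file that passes is merely consistent with UTF-8. Pure ASCII always passes, and text in another encoding could in principle pass by coincidence, so the "never 100%" point in the question still holds. In practice, non-ASCII text in a legacy 8-bit encoding rarely happens to form valid UTF-8 sequences.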