Hi Adam,
I have proof that the XML advice does not hold in real cases I have run into.
We use XML files in our system, but when you edit the XML with a text
editor and add the UTF-8 XML declaration
<?xml version="1.0" encoding="UTF-8"?>
it DOES NOT guarantee that the text inside is actually encoded in UTF-8;
in many cases it is in some other encoding such as ISO-8859-x.
My question was about a function that scans the bytes of the file and
decides WITHOUT relying on a BOM,
I mean by checking the byte sequences in the file.
I claim that WITHOUT a BOM it might be impossible to be sure the file is
UTF-8, since UTF-8 is a variable-length encoding
that turns one character into one, two, three or four bytes.
Any advice on whether I'm right, or on a smart file-scan function that does this?
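
To make the "checking the byte sequences" idea concrete, here is a minimal sketch of such a scan, assuming the well-formedness rules of RFC 3629. The function name looks_like_utf8 is hypothetical and is not part of PHP. A "true" result only means the bytes are well-formed UTF-8; a pure-ASCII file, for example, passes this check and is equally valid ISO-8859-1, so this narrows the answer rather than proving it.

```php
<?php
// Hypothetical helper (not a PHP built-in): returns true when $bytes is
// well-formed UTF-8 according to the RFC 3629 rules.
function looks_like_utf8(string $bytes): bool
{
    $len = strlen($bytes);
    for ($i = 0; $i < $len; ) {
        $b = ord($bytes[$i]);
        if ($b <= 0x7F) {                       // 1 byte: plain ASCII
            $i++;
            continue;
        }
        if ($b >= 0xC2 && $b <= 0xDF) {         // lead byte of a 2-byte sequence
            $need = 1; $min = 0x80;
        } elseif ($b >= 0xE0 && $b <= 0xEF) {   // lead byte of a 3-byte sequence
            $need = 2; $min = 0x800;
        } elseif ($b >= 0xF0 && $b <= 0xF4) {   // lead byte of a 4-byte sequence
            $need = 3; $min = 0x10000;
        } else {
            return false;                       // stray continuation or invalid lead byte
        }
        if ($i + $need >= $len) {
            return false;                       // sequence is cut off at end of input
        }
        $cp = $b & (0xFF >> ($need + 2));       // payload bits of the lead byte
        for ($k = 1; $k <= $need; $k++) {
            $c = ord($bytes[$i + $k]);
            if (($c & 0xC0) !== 0x80) {
                return false;                   // continuation bytes must be 10xxxxxx
            }
            $cp = ($cp << 6) | ($c & 0x3F);
        }
        // Reject overlong forms, UTF-16 surrogates and values above U+10FFFF.
        if ($cp < $min || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
            return false;
        }
        $i += $need + 1;
    }
    return true;
}
```

For example, looks_like_utf8("caf\xC3\xA9") accepts the UTF-8 form of "café", while looks_like_utf8("caf\xE9") rejects the ISO-8859-1 form, because 0xE9 announces a 3-byte sequence that never arrives.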
Eli
On 21/05/2011 20:03, Adam Richardson wrote:
On Sat, May 21, 2011 at 12:10 PM, Eli Orr (Office)
<eli.orr@xxxxxxxxxxxx> wrote:
Dear PHP Gurus,
I have a debate on the following; please let me know what is true or
false.
I'm using a PHP function is_UTF_8_file($file_name) that I
found as part of my PHP 5.3 installation.
This function checks whether the file starts with the 3 UTF-8 BOM bytes.
However, another guy told me that there is a way to detect whether a file
is UTF-8 without having the BOM at the start of the file.
To me it sounds impossible, since without this
indication you have a stream of bytes and can never tell with 100% certainty
whether it is UTF-8 or something else.
Who is right here?
If there is a magical function that can detect whether files without a BOM
are UTF-8 or not, please share your knowledge, unless this
is a "NULL" or impossible function as I thought.
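
For reference, a BOM check of the kind described in the question could look roughly like the sketch below. The name has_utf8_bom is hypothetical; the source of is_UTF_8_file is not shown in this thread, so this is only an assumption about what such a check does (comparing the first three bytes with EF BB BF).

```php
<?php
// Hypothetical helper: true when the file starts with the UTF-8 BOM (EF BB BF).
// This is NOT the is_UTF_8_file() function from the message, only a sketch
// of the same idea.
function has_utf8_bom(string $path): bool
{
    // Read at most the first 3 bytes of the file.
    $head = file_get_contents($path, false, null, 0, 3);
    return $head === "\xEF\xBB\xBF";
}
```

A file without a BOM simply returns false here, which says nothing about whether its content is UTF-8.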
Here's a great write-up I've got bookmarked (he points out Windows
Notepad automatically determines the encoding):
http://codesnipers.com/?q=node/68
* If it's an XML file, the structure allows you to determine the
encoding.
* For other files, you can try to decode them as UTF-8 and look for
invalid byte sequences.
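
The two points above can be combined in a rough sketch: read the encoding named in the XML declaration, then check whether the bytes really decode as UTF-8. Both function names are hypothetical. The sketch assumes an ASCII-compatible encoding (it will not find a UTF-16 declaration) and uses preg_match with the "u" modifier as the validity test, which fails on input that is not valid UTF-8.

```php
<?php
// Hypothetical helper: returns the encoding named in the XML declaration
// (upper-cased), or null when no encoding is declared.
function xml_declared_encoding(string $bytes): ?string
{
    $pattern = '/^(?:\xEF\xBB\xBF)?<\?xml[^>]*?\bencoding\s*=\s*(["\'])([A-Za-z][A-Za-z0-9._-]*)\1/';
    if (preg_match($pattern, $bytes, $m)) {
        return strtoupper($m[2]);
    }
    return null;
}

// Hypothetical helper: true when the document claims (or defaults to) UTF-8
// but its bytes are not valid UTF-8.
function xml_encoding_mismatch(string $bytes): bool
{
    // Assumption: with no declared encoding, treat the document as UTF-8.
    $declared = xml_declared_encoding($bytes) ?? 'UTF-8';
    // With the "u" modifier, preg_match() returns false on invalid UTF-8.
    $validUtf8 = preg_match('//u', $bytes) === 1;
    return $declared === 'UTF-8' && !$validUtf8;
}
```

A document that declares UTF-8 but contains a lone 0xE9 byte (an ISO-8859-1 "é") is reported as a mismatch, whereas the same text with the two-byte sequence C3 A9 is not.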
As far as a PHP function that already does this goes, I'm not aware of one,
but you could make a system call to "file" if you're on Linux, as it
tries to automatically determine the encoding:
http://linux.die.net/man/1/file
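
A minimal sketch of that system call, assuming the "file" utility is installed and supports the --mime-encoding option (the function name detect_encoding_with_file is hypothetical):

```php
<?php
// Hypothetical helper: asks file(1) for its guess at the encoding.
// Returns a label such as "utf-8" or "iso-8859-1", or null when the
// command produced no output (for example, "file" is not installed).
function detect_encoding_with_file(string $path): ?string
{
    // escapeshellarg() keeps unusual file names from breaking the command.
    $cmd = 'file -b --mime-encoding ' . escapeshellarg($path) . ' 2>/dev/null';
    $out = shell_exec($cmd);
    if (!is_string($out)) {
        return null;
    }
    $out = trim($out);
    return $out === '' ? null : $out;
}
```

The result is a guess made by file(1) from the bytes it inspects, so it is best treated as a hint rather than a guarantee.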
Adam
--
Nephtali: A simple, flexible, fast, and security-focused PHP framework
http://nephtaliproject.com
--
Best Regards,
Eli Orr
CTO & Founder
LogoDial Ltd.
M: +972-54-7379604
O: +972-74-703-2034
F: +972-77-3379604
Plaut 10, Rehovot, Israel
Email: Eli.Orr@xxxxxxxxxxxxx
Skype: eliorr.com