On 22 May 2011 08:17, Eli Orr (Office) <eli.orr@xxxxxxxxxxxx> wrote:
> Hi Adam,
>
> I have proof that the XML advice does not work in real cases I had.
> We are using XMLs in our system, but when you edit the XML with a text
> editor and put the XML heading of UTF-8
> <?xml version="1.0" encoding="UTF-8"?>
>
> it DOES NOT assure the text inside is encoded in UTF-8; in many cases
> it may be in some other iso-xxx encoding.

The point of the header is to tell readers which encoding is used. Of course that means errors are possible: setting the header is not magic, and it doesn't change the rest of the file. When you create XML documents, you need to make sure the contents of the file match the encoding declared in the header.

Anyway, from your perspective, the header is an indication, not a foolproof way of figuring out the encoding.

> My question was for a function that scans the bytes of the file and decides
> WITHOUT the BOM heading.
> I mean by checking the byte sequence in the file.
>
> I claim that WITHOUT a BOM it might be impossible to be sure it is UTF-8
> encoding, which is a whole escape-sequence logic
> that may convert one character into one, two or three characters.

http://se.php.net/manual/en/function.mb-detect-encoding.php - the first comment should be interesting to you.

*****
If you try to use mb_detect_encoding to detect whether a string is valid UTF-8, use the strict mode; it is pretty worthless otherwise.

<?php
$str = 'áéóú'; // ISO-8859-1
mb_detect_encoding($str, 'UTF-8'); // 'UTF-8'
mb_detect_encoding($str, 'UTF-8', true); // false
?>
*****

Regards
Peter

--
<hype>
WWW: plphp.dk / plind.dk
LinkedIn: plind
BeWelcome/Couchsurfing: Fake51
Twitter: kafe15
</hype>

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
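For Eli's question about scanning the bytes themselves without a BOM, here is a minimal sketch in PHP. The function name looks_like_utf8 is hypothetical and not part of PHP; it walks the byte string and applies the UTF-8 well-formedness rules from RFC 3629 (valid lead bytes, continuation bytes in 0x80-0xBF, no overlong forms, no surrogates, nothing above U+10FFFF). Passing the check only shows the bytes are consistent with UTF-8: pure ASCII passes as well, so the result is strong evidence rather than proof, though ISO-8859-x text containing accented letters rarely forms valid UTF-8 by accident.

```php
<?php
// Hypothetical helper (not a PHP built-in): returns true only if $bytes is
// well-formed UTF-8 per RFC 3629. No BOM is required or inspected.
function looks_like_utf8(string $bytes): bool
{
    $len = strlen($bytes);
    $i = 0;
    while ($i < $len) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {            // single byte: plain ASCII
            $i++;
            continue;
        }
        // $n = number of continuation bytes; [$lo, $hi] = allowed 2nd byte.
        if ($b >= 0xC2 && $b <= 0xDF) {
            $n = 1; $lo = 0x80; $hi = 0xBF;
        } elseif ($b === 0xE0) {
            $n = 2; $lo = 0xA0; $hi = 0xBF; // reject overlong 3-byte forms
        } elseif ($b === 0xED) {
            $n = 2; $lo = 0x80; $hi = 0x9F; // reject UTF-16 surrogates
        } elseif ($b >= 0xE1 && $b <= 0xEF) {
            $n = 2; $lo = 0x80; $hi = 0xBF;
        } elseif ($b === 0xF0) {
            $n = 3; $lo = 0x90; $hi = 0xBF; // reject overlong 4-byte forms
        } elseif ($b >= 0xF1 && $b <= 0xF3) {
            $n = 3; $lo = 0x80; $hi = 0xBF;
        } elseif ($b === 0xF4) {
            $n = 3; $lo = 0x80; $hi = 0x8F; // nothing above U+10FFFF
        } else {
            return false; // 0x80-0xC1 and 0xF5-0xFF never start a sequence
        }
        if ($i + $n >= $len) {
            return false; // sequence truncated at end of input
        }
        $c = ord($bytes[$i + 1]);
        if ($c < $lo || $c > $hi) {
            return false;
        }
        for ($k = 2; $k <= $n; $k++) {
            $c = ord($bytes[$i + $k]);
            if ($c < 0x80 || $c > 0xBF) {
                return false; // every later byte must be a continuation byte
            }
        }
        $i += $n + 1;
    }
    return true;
}

$latin1 = "\xE1\xE9\xF3\xFA";                 // 'áéóú' as ISO-8859-1 bytes
$utf8   = "\xC3\xA1\xC3\xA9\xC3\xB3\xC3\xBA"; // 'áéóú' as UTF-8 bytes
var_dump(looks_like_utf8($latin1)); // bool(false)
var_dump(looks_like_utf8($utf8));   // bool(true)
```

If the mbstring extension is available, mb_check_encoding($str, 'UTF-8') performs a comparable validity check without a hand-written loop; the sketch is only meant to show what "checking the byte sequence" involves.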