On Wed, May 28, 2014 at 06:56:48AM -0300, Flavio Kenji Yanai wrote:
> I don't test it ...
>
> $utf8_str = utf8_decode($original_str);
>
> if (!substr_cmp($utf8_str,$original_str,length($original_str)){
>     echo "equal, valid utf8";
> }
> else {
>     echo "not equal , non valid utf8 input";
> }

Sorry, maybe I did not explain myself well enough. I want to be able to provide
feedback to the user to say where something is wrong, so it would be nice to
say, with the example that I gave, something like:

    Bad input detected, invalid character(s) replaced by '?':
    a bad angle bracket ? here

I suppose that you could take the point of view that bad character encoding is
a result of someone trying to break the PHP script & so you do not need to be
nice. But maybe it is as a result of an innocent error somewhere.

With a bit of work I can find the first difference & replace by '?', but as far
as I can see mb_convert_encoding() should make it easy.

> 2014-05-28 6:03 GMT-03:00 Alain Williams <addw@xxxxxxxxxxxx>:
>
> > I am trying to use this to validate input that is supposed to be UTF-8 and to
> > replace any bad characters with something - '?' would do.
> >
> > I have the test program below. No matter what I try to give as an argument to
> > mb_substitute_character() it always removes the bad input sequence, I would like
> > to replace it.
> >
> > Thanks in advance
> >
> > <?php
> > mb_internal_encoding("UTF-8");
> >
> > // I have tried many lines like the 2 below
> > // (comment out one or the other)
> > mb_substitute_character((int)0x3013);
> > mb_substitute_character((int)63); // '?' is ascii 63
> >
> > // \xC0\xBC is invalid UTF-8 - over long encoding, should be \x3C
> > $input = "a bad angle bracket \xC0\xBC here";
> > $valid = mb_convert_encoding($input, "UTF-8", "UTF-8");
> >
> > // I always find 2 spaces between 'bracket' and 'here'
> > echo "valid='$valid'\n";

-- 
Alain Williams
Linux/GNU Consultant - Mail systems, Web sites, Networking, Programmer, IT Lecturer.
+44 (0) 787 668 0256 http://www.phcomp.co.uk/
Parliament Hill Computers Ltd. Registration Information: http://www.phcomp.co.uk/contact.php
#include <std_disclaimer.h>

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
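
For the goal described in the message (replace invalid UTF-8 with '?' instead of silently dropping it, then tell the user), one possible approach that does not rely on mb_substitute_character() at all is a byte-level regular expression. The sketch below is only an illustration: replace_invalid_utf8() is a hypothetical helper name, not an mbstring function, and the pattern follows the well-formed byte ranges from RFC 3629. It substitutes one '?' per offending byte, so the two-byte overlong sequence in the test string comes out as two question marks rather than one.

```php
<?php
// Hypothetical helper (not part of mbstring): replaces every byte that is
// not part of a well-formed UTF-8 sequence with $sub.
function replace_invalid_utf8($input, $sub = '?')
{
    // Group 1 matches one well-formed UTF-8 sequence (no overlong forms,
    // no surrogates, nothing above U+10FFFF).
    // Group 2 matches any single byte that group 1 could not consume.
    // No "u" modifier, so PCRE works on raw bytes.
    $pattern = '/
        (
            [\x00-\x7F]
          | [\xC2-\xDF][\x80-\xBF]
          | \xE0[\xA0-\xBF][\x80-\xBF]
          | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}
          | \xED[\x80-\x9F][\x80-\xBF]
          | \xF0[\x90-\xBF][\x80-\xBF]{2}
          | [\xF1-\xF3][\x80-\xBF]{3}
          | \xF4[\x80-\x8F][\x80-\xBF]{2}
        )
      | (.)
    /xs';

    return preg_replace_callback($pattern, function ($m) use ($sub) {
        // $m[2] is only present when the "invalid byte" branch matched.
        return isset($m[2]) ? $sub : $m[1];
    }, $input);
}

// Same test string as in the quoted program.
$input = "a bad angle bracket \xC0\xBC here";
$clean = replace_invalid_utf8($input);

if ($clean !== $input) {
    echo "Bad input detected, invalid character(s) replaced by '?':\n";
}
echo $clean, "\n";
```

Comparing $clean with $input gives the "was anything wrong?" test for free, and valid multi-byte text passes through unchanged. If a single '?' per bad run is preferred, the second group could be adapted to swallow consecutive invalid bytes, at the cost of a more involved pattern.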