On Wed, May 28, 2014 at 02:12:32PM +0200, Christoph Becker wrote:
> Alain Williams wrote:
>
> > I am trying to use this to validate input that is supposed to be UTF-8 and to
> > replace any bad characters with something - '?' would do.
> >
> > I have the test program below. No matter what I try to give as an argument to
> > mb_substitute_character() it always removes the bad input sequence, I would like
> > to replace it.
>
> Have you considered using htmlspecialchars($input, ENT_SUBSTITUTE,
> 'UTF-8') instead of mb_substitute_character()?

OK-ish -- thanks.

* ENT_SUBSTITUTE is only available from PHP 5.4 - my production machine is
  PHP 5.3.3 (CentOS)

* It also munges & < > -- but I can undo that with htmlspecialchars_decode()

* I need to replace the Unicode Replacement Character ("\xEF\xBF\xBD") with
  a '?' (easy)

* If I give it an over long character encoding (I tested "\xC0\xBC") it
  replaces each byte with a '?' - so I get two of them.

It would be nice to get mb_substitute_character() working.

-- 
Alain Williams
Linux/GNU Consultant - Mail systems, Web sites, Networking, Programmer, IT Lecturer.
+44 (0) 787 668 0256  http://www.phcomp.co.uk/
Parliament Hill Computers Ltd. Registration Information: http://www.phcomp.co.uk/contact.php
#include <std_disclaimer.h>
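
For reference, the htmlspecialchars() workaround described in the points above could be sketched roughly as follows. This is only a sketch under assumptions: the function name replace_invalid_utf8() is made up for illustration, it needs PHP 5.4 or later because of ENT_SUBSTITUTE (so it does not help on 5.3.3), and it will also turn any genuine U+FFFD already present in the input into the replacement.

```php
<?php
// Hypothetical helper; the name is an assumption, not something from the thread.
// Requires PHP >= 5.4 because ENT_SUBSTITUTE does not exist before that.
function replace_invalid_utf8($input, $replacement = '?')
{
    // Invalid byte sequences become U+FFFD ("\xEF\xBF\xBD").
    // As a side effect & < > are escaped; quotes are left alone (ENT_NOQUOTES).
    $escaped = htmlspecialchars($input, ENT_NOQUOTES | ENT_SUBSTITUTE, 'UTF-8');

    // Undo the escaping of & < > so that valid text comes back unchanged.
    $decoded = htmlspecialchars_decode($escaped, ENT_NOQUOTES);

    // Swap the Unicode Replacement Character for the chosen replacement.
    // Note: this also replaces any U+FFFD that was legitimately in the input.
    return str_replace("\xEF\xBF\xBD", $replacement, $decoded);
}

// Usage: the over long sequence tested above, embedded in otherwise valid text.
echo replace_invalid_utf8("abc\xC0\xBCdef"), "\n";
```

Using ENT_NOQUOTES on both calls keeps the round trip symmetric, so text that already contains entities such as "&amp;" should come back as it went in.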