Alain Williams wrote:

> On Wed, May 28, 2014 at 02:12:32PM +0200, Christoph Becker wrote:
>> Alain Williams wrote:
>>
>>> I am trying to use this to validate input that is supposed to be UTF-8 and to
>>> replace any bad characters with something - '?' would do.
>>>
>>> I have the test program below. No matter what I try to give as an argument to
>>> mb_substitute_character() it always removes the bad input sequence, I would like
>>> to replace it.
>>
>> Have you considered using htmlspecialchars($input, ENT_SUBSTITUTE,
>> 'UTF-8') instead of mb_substitute_character()?
>
> OK-ish -- thanks.
>
> * ENT_SUBSTITUTE is only available from PHP 5.4 - my production machine is PHP 5.3.3 (CentOS)
>
> * It also munges & < > -- but I can undo that with htmlspecialchars_decode()
>
> * I need to replace the Unicode Replacement Character ("\xEF\xBF\xBD") with a '?' (easy)
>
> * If I give it an over long character encoding (I tested "\xC0\xBC") it replaces
> each byte with a '?' - so I get two of them.
>
> It would be nice to get mb_substitute_character() working.

You might be out of luck with PHP 5.3, see <http://3v4l.org/I3mEd>.

--
Christoph M. Becker

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
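
For anyone stuck on PHP 5.3, here is a minimal sketch of one possible workaround that avoids both mb_substitute_character() and ENT_SUBSTITUTE. It is not something proposed in the thread: it swaps in a byte-level regular expression that matches well-formed UTF-8 sequences and replaces every other byte with '?'. The function name replace_invalid_utf8() is hypothetical, and the pattern is assumed to follow the well-formed byte ranges from the UTF-8 definition (overlong forms and surrogates excluded). It uses only preg_replace_callback() and a closure, so it should run on 5.3 as well, though it has not been tested against that exact build.

```php
<?php
// Hypothetical helper (not from the thread): replace every byte that is not
// part of a well-formed UTF-8 sequence with $replacement.
function replace_invalid_utf8($input, $replacement = '?')
{
    // Group 1: one well-formed UTF-8 character (byte-level match, no /u flag).
    // Group 2: any single byte that did not start a well-formed character.
    $pattern = '/(
          [\x00-\x7F]                         # ASCII
        | [\xC2-\xDF][\x80-\xBF]              # 2-byte, excludes overlong C0 and C1
        | \xE0[\xA0-\xBF][\x80-\xBF]          # 3-byte, excludes overlong forms
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}   # 3-byte, general case
        | \xED[\x80-\x9F][\x80-\xBF]          # 3-byte, excludes surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}       # 4-byte, excludes overlong forms
        | [\xF1-\xF3][\x80-\xBF]{3}           # 4-byte, general case
        | \xF4[\x80-\x8F][\x80-\xBF]{2}       # 4-byte, up to U+10FFFF
    ) | (.)/xs';

    return preg_replace_callback(
        $pattern,
        function ($m) use ($replacement) {
            // Group 2 is only set when a stray byte was matched.
            return (isset($m[2]) && $m[2] !== '') ? $replacement : $m[1];
        },
        $input
    );
}

// Usage: valid text passes through untouched, each bad byte becomes '?'.
echo replace_invalid_utf8("caf\xC3\xA9 \xFF ok"), "\n";
```

Like the htmlspecialchars() approach, this sketch substitutes per byte, so the overlong sequence "\xC0\xBC" still comes out as two '?' characters. Unlike that approach, it leaves & < > alone and needs no decode step afterwards.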