At 8:01 AM +0200 10/1/07, Crab Hunt wrote:
Hi,
Is there a fix for removing the junk characters that appear when we copy and
paste some text from Microsoft Word into a PHP form? For example, the double
quotes "" turn into something like *â??*
thanks in advance.
Crab:
That's something I'm working on as well. Part of
the problem is that not only does M$ inject junk,
but the user may actually be putting in something
other than ASCII. So some of the strange stuff
may not be junk at all.
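One rough way to tell the two cases apart is to check whether the submitted bytes are well-formed UTF-8. The sketch below is only an illustration: looks_like_utf8() is a hypothetical helper name, and the sample strings assume Word's curly quotes arrive either as UTF-8 or as Windows-1252 bytes.

```php
<?php
// Hypothetical helper (not part of any library): reports whether a string
// is well-formed UTF-8. With the /u modifier, preg_match() returns false
// when the subject is not valid UTF-8, and 1 when the empty pattern matches.
function looks_like_utf8($text)
{
    return preg_match('//u', $text) === 1;
}

$utf8   = "\xE2\x80\x9CHello\xE2\x80\x9D"; // curly quotes as UTF-8 bytes
$cp1252 = "\x93Hello\x94";                 // the same quotes as Windows-1252 bytes

var_dump(looks_like_utf8($utf8));   // bool(true)
var_dump(looks_like_utf8($cp1252)); // bool(false)
```

Text that passes this check is probably legitimate non-ASCII input rather than junk, though the check says nothing about whether the form page declared the right charset.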
As such, I found this (use both):
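// The /e modifier was removed in PHP 7, so use callbacks instead.
// Two-byte sequences: 110xxxxx 10xxxxxx
$text = preg_replace_callback('/[\xc2-\xdf][\x80-\xbf]/', function ($m) {
    return '&#' . ((ord($m[0][0]) - 192) * 64 +
        (ord($m[0][1]) - 128)) . ';';
}, $text);
// Three-byte sequences: 1110xxxx 10xxxxxx 10xxxxxx
$text = preg_replace_callback('/[\xe0-\xef][\x80-\xbf]{2}/', function ($m) {
    return '&#' . ((ord($m[0][0]) - 224) * 4096 +
        (ord($m[0][1]) - 128) * 64 +
        (ord($m[0][2]) - 128)) . ';';
}, $text);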
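This is supposed to replace every two- and three-byte UTF-8 character with its
numeric HTML entity (for example, &#8220; for a left curly quote). Plain ASCII
is left alone, and four-byte sequences are not handled.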
This is untested by me, but shows promise.
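To see what the arithmetic is doing, here is a small worked check of the three-byte formula. It assumes the left curly double quote (U+201C) arrives as the UTF-8 bytes E2 80 9C; the variable names are just for illustration.

```php
<?php
// Worked check of the three-byte formula on the left curly double quote,
// U+201C, whose UTF-8 encoding is the bytes E2 80 9C.
$bytes = "\xE2\x80\x9C";

$codepoint = (ord($bytes[0]) - 224) * 4096  // 0xE2 - 224 = 2,  times 4096 = 8192
           + (ord($bytes[1]) - 128) * 64    // 0x80 - 128 = 0,  times 64   = 0
           + (ord($bytes[2]) - 128);        // 0x9C - 128 = 28

echo '&#' . $codepoint . ';'; // prints &#8220;
```

Subtracting 224 (or 192) strips the length marker from the lead byte, and subtracting 128 strips the marker from each continuation byte, leaving only the payload bits to be shifted into place.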
If you find a simpler solution, please keep me in the loop.
Cheers,
tedd
--
-------
http://sperling.com http://ancientstones.com http://earthstones.com