I am having a minor issue with preg_replace() not working as expected
on UTF-8 strings. So far I have found that \w does not seem to match
non-ASCII (UTF-8 encoded) characters, even with the u modifier set.
This is my test PHP file:
<?php
// ASCII test
$data = 'ooooooooooooooooooooooo';
echo 'Data before: ', $data, '<br />';
$data = preg_replace('~([\w\.]{6})~u', '$1 < >', $data);
echo 'Data After: ', $data;
// UTF-8 Test
$data = 'ффффффффффффффффффффффф';
echo '<hr />Data before: ', $data, '<br />';
$data = preg_replace('~([\w\.]{6})~u', '$1 < >', $data);
echo 'Data After: ', $data;
?>
I would expect it to be:
Data before: ooooooooooooooooooooooo
Data After: oooooo < >oooooo < >oooooo < >ooooo
---
Data before: ффффффффффффффффффффффф
Data After: фффффф < >фффффф < >фффффф < >ффффф
But what I get is:
Data before: ooooooooooooooooooooooo
Data After: oooooo < >oooooo < >oooooo < >ooooo
---
Data before: ффффффффффффффффффффффф
Data After: ффффффффффффффффффффффф
Did I go about this the wrong way, or is this a bug in PHP itself?
I tested this in PHP 5.3, 5.2.9 and 6.0 (a snapshot from a couple of
weeks ago) and got the same results in all three.
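One thing I have not ruled out is that \w stays ASCII-only unless PCRE's
Unicode character properties come into play. Below is a minimal sketch of
a possible workaround, assuming the PCRE library bundled with PHP was built
with Unicode property support. It spells out the character class with
\p{L} (letters) and \p{N} (numbers) instead of \w. The helper name
mark_every_six() is made up for illustration only.

```php
<?php
// Hypothetical helper: same replacement as in the test file above, but the
// character class uses explicit Unicode properties instead of \w.
// Assumes PCRE was compiled with Unicode property support (\p{...}).
function mark_every_six($text)
{
    // \p{L} = any letter, \p{N} = any number; "_" and "." are kept literal.
    // The u modifier makes PCRE treat pattern and subject as UTF-8.
    return preg_replace('~([\p{L}\p{N}_.]{6})~u', '$1 < >', $text);
}

echo mark_every_six('ooooooooooooooooooooooo'), "\n";
echo mark_every_six('ффффффффффффффффффффффф'), "\n";
```

If this behaves the same for both strings, the problem would be limited to
how \w is interpreted rather than to preg_replace() or the u modifier.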
--
PHP General Mailing List (http://www.php.net/)