preg_replace with UTF-8


 



I seem to be having a minor issue with preg_replace not working as expected on UTF-8 strings. So far I have found that \w doesn't seem to match UTF-8 (non-ASCII) characters, even with the /u modifier.

This is my test PHP file:
<?php
$data = 'ooooooooooooooooooooooo';
echo 'Data before: ', $data, '<br />';

$data = preg_replace('~([\w\.]{6})~u', '$1 < >', $data);
echo 'Data After: ', $data;

// UTF-8 Test
$data = 'ффффффффффффффффффффффф';
echo '<hr />Data before: ', $data, '<br />';

$data = preg_replace('~([\w\.]{6})~u', '$1 < >', $data);
echo 'Data After: ', $data;

?>


I would expect it to be:
Data before: ooooooooooooooooooooooo
Data After: oooooo < >oooooo < >oooooo < >ooooo
---
Data before: ффффффффффффффффффффффф
Data After: фффффф < >фффффф < >фффффф < >ффффф

But what I get is:
Data before: ooooooooooooooooooooooo
Data After: oooooo < >oooooo < >oooooo < >ooooo
---
Data before: ффффффффффффффффффффффф
Data After: ффффффффффффффффффффффф
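For comparison, here is a minimal sketch of one commonly suggested workaround. It assumes PHP's PCRE library was built with Unicode property support. The idea is to replace \w with explicit Unicode property classes: \p{L} for any letter and \p{N} for any number. The /u modifier is still needed so the subject is treated as UTF-8. The str_repeat() input is only a stand-in for the 23-character test string above.

```php
<?php
// Sketch: spell out "word characters" with Unicode properties instead of \w.
// \p{L} = any letter, \p{N} = any number; '_' and '.' are kept from the
// original [\w\.] class. Assumes PCRE has Unicode property support.
$pattern = '~([\p{L}\p{N}_\.]{6})~u';

// Stand-in for the test string: 23 Cyrillic 'ф' characters.
$data = str_repeat('ф', 23);
echo 'Data before: ', $data, '<br />';

// Each run of 6 characters gets ' < >' appended; the trailing 5 are left alone.
$result = preg_replace($pattern, '$1 < >', $data);
echo 'Data After: ', $result;
```

With this pattern, the Cyrillic string should be split the same way as the ASCII one.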

Am I going about this the wrong way, or is this a bug in PHP itself?
I tested this in PHP 5.3, 5.2.9 and 6.0 (a snapshot from a couple of weeks ago) and got the same results.
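Because the regex engine is the bundled PCRE library rather than PHP itself, its version may matter as much as the PHP version. The PCRE documentation states that, by default, \w matches only ASCII word characters even in UTF-8 mode. The exception is when Unicode properties (UCP) are enabled. The sketch below prints the linked PCRE version via the PCRE_VERSION constant. It then tries the (*UCP) pattern-start verb, which assumes PCRE 8.10 or later; older bundled versions may reject it with a compilation warning.

```php
<?php
// Report which PCRE library this PHP build is linked against.
echo 'PCRE version: ', PCRE_VERSION, '<br />';

// (*UCP) must appear at the very start of the pattern. It asks PCRE to use
// Unicode properties for \w, \d, \s and \b, so \w can match 'ф'.
// Assumption: PCRE 8.10+; on older versions preg_replace() returns null.
$data = str_repeat('ф', 23);
$result = preg_replace('~(*UCP)([\w\.]{6})~u', '$1 < >', $data);
echo 'Data After: ', $result;
```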


--
PHP General Mailing List (http://www.php.net/)



