Re: Internationalisation and MB strings

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Aug 1, 2008 at 9:50 AM, Yeti <yeti@xxxxxxxxxx> wrote:> <?php> *# Hello Community> # Internationalisation, a topic discussed more than enough and YES, I am> looking forward to PHP6.> # But in reality I still have to develop for PHP4 and that's where the dog> is burried ^^> # We have a customer here who is running a small site, but still in five> different languages.> # Lately he started complaining about some strange site behaviours:>> # He has a discussion board where people can post their ideas, comments etc.> Nothing special> # Every post has a maximum length of 2048 characters, which is checked by> JavaScript at the Browser> # and after submitting the form by PHP.>> # Our mistake was to use strlen();*> global $cc_strlen; global $cc_mb;> $cc_strlen = $cc_mb = 0;> if (array_key_exists('text', $_POST)) {>  $cc_strlen = strlen($_POST['text']);>  $cc_mb = mb_strlen($_POST['text'], 'UTF-8'); *// new code*>  if ($cc_strlen > 2048) { /* snip */ } // do something> }>> /* snip */ // do something>> *#this works fine as long as the user only submits single byte charachters,> but with UTF-8 the whole thing changes ..*> ?>> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "> http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd";>> <html xmlns="http://www.w3.org/1999/xhtml";>> <head>> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />> <title>test</title>> </head>> <body>> <p>You submitted <?php echo $cc_strlen; ?> characters (STRLEN).</p>> <p>You submitted <?php echo $cc_mb; ?> characters (MB_STRLEN).</p>> <p>Characters Left:<span id="remainder">2048</span></p>> <form action="" method="post" onsubmit="return false;" id="post_form">> <textarea id="post_text" name="text" onkeydown="check_length();"> onchange="check_length();" rows="10" cols="50">œŸŒ‡Ņ</textarea><br />> <input type="submit" value="Submit" id="post_button"> onclick="submit_form();" />> </form>> <script type="text/javascript">> <!--> var the_form = document.getElementById('post_form');> var textarea = document.getElementById('post_text');> var counter = document.getElementById('remainder');> function check_length() {>  var remainder = 2048 - textarea.value.length;>  var length_alert = false;>  if (remainder < 0) {>  remainder = 0;>  for (var count = textarea.value.length; (count >= 2048); (count -= 1)) {>  textarea.value = textarea.value.substr(0, 2047);>  counter.style.color = 'red'>  length_alert = true;>  }>  }>  if (length_alert) alert('You are already using 2048 characters.');>  if (document.all) {>  counter.innerText = remainder;>  } else {>  counter.textContent = remainder;>  }> }> function submit_form() {>  check_length();>  the_form.submit();>  alert ('You submitted ' + textarea.value.length + ' characters');>  return true;> }> -->> </script>> <?php> *# Now as soon as one is starting to submit UTF-8 characters strlen is not> working proberly any more> # So we had to work through thousands of lines of code, replacing strlen()> with mb_strlen();> # We also found mb_strlen to take about 8 times longer than strlen().*>> $s_t = microtime();> mb_strlen('œŸŒ‡Ņ', 'UTF-8');> $e_t = microtime();> echo '<p>MB_STRLEN took : '.(($e_t - $s_t)*1000).' milliseconds</p>';> $s_t = microtime();> strlen('œŸŒ‡Ņ');> $e_t = microtime();> echo '<p>STRLEN took : '.(($e_t - $s_t)*1000).' milliseconds</p>';>> *# So much for internationalisation.> # Just writing this as a reminder for everyone who is facing similar> situations.*> ?>> </body>> </html>>
You can't determine timing by simply calling each function one time. Ichanged your script to the following:
<?php
$iterations = 10000;
$s_t = microtime(true);for ($i = 0; $i < $iterations; ++$i) {    mb_strlen('œŸŒ‡Ņ', 'UTF-8');}$e_t = microtime(true);echo '<p>MB_STRLEN took : '.(($e_t - $s_t)*1000/$iterations).'milliseconds</p>';
$s_t = microtime(true);for ($i = 0; $i < $iterations; ++$i) {    strlen('œŸŒ‡Ņ');}$e_t = microtime(true);echo '<p>STRLEN took : '.(($e_t - $s_t)*1000/$iterations).' milliseconds</p>';
?>
I ran this script several times, and the results below are fairly typical:
MB_STRLEN took : 0.054733037948608 milliseconds
STRLEN took : 0.037568092346191 milliseconds

The multi-byte function is slower, but not even by a factor of 2 on mydevelopment machine.
Andrew

[Index of Archives]     [PHP Home]     [Apache Users]     [PHP on Windows]     [Kernel Newbies]     [PHP Install]     [PHP Classes]     [Pear]     [Postgresql]     [Postgresql PHP]     [PHP on Windows]     [PHP Database Programming]     [PHP SOAP]

  Powered by Linux