Re: Correcting contractions

Dotan Cohen wrote:
On 6/25/05, Robert Cummings <robert@xxxxxxxxxxxxx> wrote:

On Fri, 2005-06-24 at 21:02, Dotan Cohen wrote:

Hi friends, I've got a nice array of contractions (I've, I'd,
they'll,...). My intent is to take submitted data and replace, say,
every occurrence of 'theyd' with 'they'd'. So far, so good. The trick
is doing it if the first character is uppercase. I tried going
through the array, one by one, and doing the preg_replace twice, once
for each item, and once for each item with the first letter
capitalized. It wasn't very successful, so I've been doing this:
$the_lyrics=preg_replace('/\bid\b/', "I'd", $the_lyrics);
$the_lyrics=preg_replace('/\bi\'d\b/', "I'd", $the_lyrics);
$the_lyrics=preg_replace('/\bId\b/', "I'd", $the_lyrics);
$the_lyrics=preg_replace('/\bim\b/', "I'm", $the_lyrics);
$the_lyrics=preg_replace('/\bi\'m\b/', "I'm", $the_lyrics);
$the_lyrics=preg_replace('/\bIm\b/', "I'm", $the_lyrics);
$the_lyrics=preg_replace('/\bi\'ve\b/', "I've", $the_lyrics);
$the_lyrics=preg_replace('/\bive\b/', "I've", $the_lyrics);
$the_lyrics=preg_replace('/\bIve\b/', "I've", $the_lyrics);
$the_lyrics=preg_replace('/\bi\'ll\b/', "I'll", $the_lyrics);
$the_lyrics=preg_replace('/\bIll\b/', "I'll", $the_lyrics);
$the_lyrics=preg_replace('/\bi\b/', "I", $the_lyrics);
$the_lyrics=preg_replace('/\byoure\b/', "you're", $the_lyrics);
$the_lyrics=preg_replace('/\bYoure\b/', "You're", $the_lyrics);
$the_lyrics=preg_replace('/\byoull\b/', "you'll", $the_lyrics);
$the_lyrics=preg_replace('/\bYoull\b/', "You'll", $the_lyrics);
$the_lyrics=preg_replace('/\byouve\b/', "you've", $the_lyrics);
$the_lyrics=preg_replace('/\bYouve\b/', "You've", $the_lyrics);
$the_lyrics=preg_replace('/\bits\b/', "it's", $the_lyrics);
$the_lyrics=preg_replace('/\bIts\b/', "It's", $the_lyrics);
$the_lyrics=preg_replace('/\bwasnt\b/', "wasn't", $the_lyrics);
$the_lyrics=preg_replace('/\bWasnt\b/', "Wasn't", $the_lyrics);
$the_lyrics=preg_replace('/\bthats\b/', "that's", $the_lyrics);
$the_lyrics=preg_replace('/\bThats\b/', "That's", $the_lyrics);
$the_lyrics=preg_replace('/\btheyre\b/', "they're", $the_lyrics);
$the_lyrics=preg_replace('/\bTheyre\b/', "They're", $the_lyrics);
$the_lyrics=preg_replace('/\btheyll\b/', "they'll", $the_lyrics);
$the_lyrics=preg_replace('/\bTheyll\b/', "They'll", $the_lyrics);
$the_lyrics=preg_replace('/\bcant\b/', "can't", $the_lyrics);
$the_lyrics=preg_replace('/\bCant\b/', "Can't", $the_lyrics);
$the_lyrics=preg_replace('/\bdidnt\b/', "didn't", $the_lyrics);
$the_lyrics=preg_replace('/\bDidnt\b/', "Didn't", $the_lyrics);
$the_lyrics=preg_replace('/\bdont\b/', "don't", $the_lyrics);
$the_lyrics=preg_replace('/\bDont\b/', "Don't", $the_lyrics);
$the_lyrics=preg_replace('/\bdoesnt\b/', "doesn't", $the_lyrics);
$the_lyrics=preg_replace('/\bDoesnt\b/', "Doesn't", $the_lyrics);
$the_lyrics=preg_replace('/\bweve\b/', "we've", $the_lyrics);
$the_lyrics=preg_replace('/\bWeve\b/', "We've", $the_lyrics);

Which, as you can see, is not exactly optimized code. How would
someone more professional than myself go about this? I was thinking
about maybe a two-dimensional array, but stopped short to consult with
you guys first.

str_replace() supports taking two arrays from which to retrieve the
needles and the replacements, so that you only need to invoke the
function once (preg_replace() accepts arrays of patterns and
replacements the same way). One call instead of dozens will speed
things up considerably. Also, you have a couple of bugs...

   "its" is a valid word for possession (its woodwork is exquisite).

   "Ill" is also valid (Ill beset by fortune).
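For illustration, a minimal sketch of the single-call form described above, using a small, made-up subset of the word list. str_replace() walks the two arrays in parallel, replacing $needles[0] with $replacements[0], then $needles[1], and so on. It matches plain text only: it does not treat \b as a word boundary, and it will also match inside longer words.

```php
<?php
// Parallel arrays: $needles[$k] is replaced by $replacements[$k].
// This word list is an illustrative subset, not the full list from the thread.
$needles      = array("wasnt", "Wasnt", "thats", "Thats", "dont", "Dont");
$replacements = array("wasn't", "Wasn't", "that's", "That's", "don't", "Don't");

$the_lyrics = "Dont say thats all, it wasnt.";

// One call applies every needle/replacement pair, in array order.
$the_lyrics = str_replace($needles, $replacements, $the_lyrics);

echo $the_lyrics, "\n"; // Don't say that's all, it wasn't.
```

If the replacement array is shorter than the needle array, str_replace() uses an empty string for the missing replacements, so keep the two arrays the same length.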

Cheers,
Rob.




I knew about "Ill"; "its" I didn't. I didn't mean to put "Ill" in there...

Should I enter each contraction twice (once for each capitalization), or
should I try to do something smarter so that the capitalization
happens automatically? The 'I' contractions are special; I will deal
with those separately.
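A sketch of the "smarter" option, assuming a map of lower-case bare forms to corrected forms is enough: match all bare forms case-insensitively in one whole-word regex with preg_replace_callback(), then restore an initial capital with ucfirst() when the matched word started with one. The helper name fix_contractions() and the word list are hypothetical; "its" and "ill" are deliberately left out, as are the 'I' forms, which need their own handling.

```php
<?php
// Hypothetical helper: $map holds lower-case bare forms => corrected forms.
function fix_contractions(array $map, $text)
{
    // Build one whole-word alternation, e.g. /\b(?:wasnt|theyll|cant)\b/i
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, array_keys($map));
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';

    return preg_replace_callback($pattern, function ($m) use ($map) {
        $found = $m[0];
        $fixed = $map[strtolower($found)];
        // Preserve an initial capital: "Theyll" -> "They'll".
        if (ctype_upper($found[0])) {
            $fixed = ucfirst($fixed);
        }
        return $fixed;
    }, $text);
}

$map = array(
    "wasnt"  => "wasn't",
    "theyll" => "they'll",
    "cant"   => "can't",
    "didnt"  => "didn't",
);

echo fix_contractions($map, "I wasnt there, Theyll tell you I cant and Didnt."), "\n";
// I wasn't there, They'll tell you I can't and Didn't.
```

Each contraction is entered once, and the \b anchors keep words such as "scant" untouched. Only the first letter's case is carried over, so an all-caps "CANT" becomes "Can't".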

Dotan,

Your task intrigued me, so I put together a function that will help process your data:

<?php

// This is the array of correct spellings for the target words, all lower case first letters.
$list = array();
$list[] = "wasn't";
$list[] = "that's";
$list[] = "they're";
$list[] = "they'll";
$list[] = "can't";
$list[] = "didn't";

// my sample text that needs correction
$string = "I wasnt there, Theyll tell you I cant and Didnt.";



function addApos($list, $string) {
  // I am assuming that you will make sure $list is in the correct format, and handle any other error checking.
  // Here I am creating two arrays with matching keys and values with both lower & upper case first letters
  // and with & without the correct apostrophe.
  // Then I am using the two new arrays to process the string for correction.
  $list_case = array();
  $list_case_strip = array();
  $i = 0;
  foreach($list as $value) {
     $list_case[$i] = $value;
     $list_case_strip[$i] = str_replace("'", "", $value);
     $i++;
     $list_case[$i] = ucfirst($value); // same word with an upper-case first letter ($value{0} syntax was removed in PHP 8)
     $list_case_strip[$i] = str_replace("'", "", $list_case[$i]);
     $i++;
  }

  $string_fixed = str_replace($list_case_strip, $list_case, $string);

  return $string_fixed;
}

$result = addApos($list, $string);

print "Original string: $string<br /><br />";
print "Corrected string: $result<br />";

?>
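One caveat with the function above: str_replace() matches substrings, so "cant" inside "decant" or "scant" would also be rewritten (to "decan't"). The sketch below is a hypothetical variant, addAposWords(), that keeps the same $list format but builds one word-boundary regex per form and hands the pattern and replacement arrays to a single preg_replace() call.

```php
<?php
// Hypothetical variant of addApos(): replaces whole words only.
function addAposWords(array $list, $string)
{
    $patterns = array();
    $replacements = array();
    foreach ($list as $value) {
        // Lower-case and capitalized forms, e.g. "can't" and "Can't".
        foreach (array($value, ucfirst($value)) as $form) {
            // Bare form without the apostrophe, anchored at word boundaries.
            $bare = str_replace("'", "", $form);
            $patterns[] = '/\b' . preg_quote($bare, '/') . '\b/';
            $replacements[] = $form;
        }
    }
    return preg_replace($patterns, $replacements, $string);
}

$list = array("wasn't", "that's", "they're", "they'll", "can't", "didn't");

echo addAposWords($list, "Theyll say I cant decant it."), "\n";
// They'll say I can't decant it.
```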

--Bob

--
PHP General Mailing List (http://www.php.net/)

