Re: case and accent - insensitive regular expression?

On Tue, Jul 15, 2008 at 5:38 AM, Yeti <yeti@xxxxxxxxxx> wrote:
> I dont think using all these regular expressions is a very efficient way to
> do so. As i previously pointed out there are many users who had a similar
> problem, which can be viewed at:
>
> http://it.php.net/manual/en/function.strtr.php
>
> One of my favourites is what derernst at gmx dot ch used.
>
> derernst at gmx dot ch
> wrote on 20-Sep-2005 07:29
> This works for me to remove accents for some characters of Latin-1, Latin-2
> and Turkish in a UTF-8 environment, where the htmlentities-based solutions
> fail:
>
>> <?php
>> function remove_accents($string, $german=false) {
>>   // Single letters
>>   $single_fr = explode(" ", "À Á Â Ã Ä Å Ą Ă Ç Ć Č Ď Đ Ð È É Ê Ë Ę Ě Ğ Ì Í Î Ï İ Ł Ľ Ĺ Ñ Ń Ň Ò Ó Ô Õ Ö Ø Ő Ŕ Ř Š Ś Ş Ť Ţ Ù Ú Û Ü Ů Ű Ý Ž Ź Ż à á â ã ä å ą ă ç ć č ď đ è é ê ë ę ě ğ ì í î ï ı ł ľ ĺ ñ ń ň ð ò ó ô õ ö ø ő ŕ ř ś š ş ť ţ ù ú û ü ů ű ý ÿ ž ź ż");
>>   $single_to = explode(" ", "A A A A A A A A C C C D D D E E E E E E G I I I I I L L L N N N O O O O O O O R R S S S T T U U U U U U Y Z Z Z a a a a a a a a c c c d d e e e e e e g i i i i i l l l n n n o o o o o o o o r r s s s t t u u u u u u y y z z z");
>>   $single = array();
>>   for ($i=0; $i<count($single_fr); $i++) {
>>   $single[$single_fr[$i]] = $single_to[$i];
>>   }
>>   // Ligatures
>>   $ligatures = array("Æ"=>"Ae", "æ"=>"ae", "Œ"=>"Oe", "œ"=>"oe", "ß"=>"ss");
>>   // German umlauts
>>   $umlauts = array("Ä"=>"Ae", "ä"=>"ae", "Ö"=>"Oe", "ö"=>"oe", "Ü"=>"Ue", "ü"=>"ue");
>>   // Replace
>>   $replacements = array_merge($single, $ligatures);
>>   if ($german) $replacements = array_merge($replacements, $umlauts);
>>   $string = strtr($string, $replacements);
>>   return $string;
>> }
>> ?>
>
> I would
> change this function a bit ...
>
> <?php
> //echo rawurlencode("áàéèíìóòúùÁÀÉÈÍÌÓÒÚÙ"); // RFC 1738 codes; NOTE: One might use UTF-8 as this documents encoding
> function remove_accents($string) {
>  $string = rawurlencode($string);
>  $replacements = array(
>  '%C3%A1' => 'a',
>  '%C3%A0' => 'a',
>  '%C3%A9' => 'e',
>  '%C3%A8' => 'e',
>  '%C3%AD' => 'i',
>  '%C3%AC' => 'i',
>  '%C3%B3' => 'o',
>  '%C3%B2' => 'o',
>  '%C3%BA' => 'u',
>  '%C3%B9' => 'u',
>  '%C3%81' => 'A',
>  '%C3%80' => 'A',
>  '%C3%89' => 'E',
>  '%C3%88' => 'E',
>  '%C3%8D' => 'I',
>  '%C3%8C' => 'I',
>  '%C3%93' => 'O',
>  '%C3%92' => 'O',
>  '%C3%9A' => 'U',
>  '%C3%99' => 'U'
>  );
>  return strtr($string, $replacements);
> }
> //echo remove_accents("CÀfé"); // I know it's not spelled right
> echo remove_accents("áàéèíìóòúùÁÀÉÈÍÌÓÒÚÙ"); //OUTPUT (again: i used UTF-8 for document): aaeeiioouuAAEEIIOOUU
> ?>
>
> Ciao
>
> Yeti
>
> On Mon, Jul 14, 2008 at 8:20 PM, Andrew Ballard <aballard@xxxxxxxxx> wrote:
>>
>> On Mon, Jul 14, 2008 at 1:35 PM, Giulio Mastrosanti
>> <giulio@xxxxxxxxxxxxx> wrote:
>> >
>> > Brilliant !!!
>> >
>> > so you replace every occurence of every accent variation with all the
>> > accent variations...
>> >
>> > OK, that's it!
>> >
>> > only some more doubts ( regex are still an headhache for me... )
>> >
>> > preg_replace('/[iìíîïĩīĭįı]/iu',...  -- what's the meaning of iu after
>> > the match string?
>>
>> This page explains them both.
>> http://us.php.net/manual/en/reference.pcre.pattern.modifiers.php
>>
>> > preg_replace('/[aàáâãäåǻāăą](?!e)/iu',... whats (?!e)  for? -- every
>> > occurence of aàáâãäåǻāăą NOT followed by e?
>>
>> Yes. It matches any character based on the latin 'a' that is not
>> followed by an 'e'. It keeps the pattern from matching the 'a' when it
>> immediately precedes an 'e' for the character 'ae' for words like
>> these:
>>
>> http://en.wikipedia.org/wiki/List_of_words_that_may_be_spelled_with_a_ligature
>> (However, that may cause problems with words that have other variants
>> of 'ae' in them.
>> I'll leave that to you to resolve.)
>> http://us.php.net/manual/en/regexp.reference.php
>>
>> > Many thanks again for your effort,
>> >
>> > I'm definitely on the good way
>> >
>> >      Giulio
>> >
>> >> I was intrigued by your example, so I played around with it some more
>> >> this morning. My own quick web search yielded a lot of results for
>> >> highlighting search terms, but none that I found did what you're
>> >> after. (I admit I didn't look very deep.) I was up to something like
>> >> this before your reply came in. It's still by no means complete. It
>> >> even handles simple English plurals (words ending in 's' or 'es'), but
>> >> not variations that require changing the word base (like 'daisy' to
>> >> 'daisies').
>> >>
>> >> <?php
>> >> function highlight_search_terms($phrase, $string) {
>> >>   $non_letter_chars = '/[^\pL]/iu';
>> >>   $words = preg_split($non_letter_chars, $phrase);
>> >>
>> >>   $search_words = array();
>> >>   foreach ($words as $word) {
>> >>       if (strlen($word) > 2 && !preg_match($non_letter_chars, $word)) {
>> >>           $search_words[] = $word;
>> >>       }
>> >>   }
>> >>
>> >>   $search_words = array_unique($search_words);
>> >>
>> >>   foreach ($search_words as $word) {
>> >>       $search = preg_quote($word);
>> >>
>> >>       /* repeat for each possible accented character */
>> >>       $search = preg_replace('/(ae|æ|ǽ)/iu', '(ae|æ|ǽ)', $search);
>> >>       $search = preg_replace('/(oe|œ)/iu', '(oe|œ)', $search);
>> >>       $search = preg_replace('/[aàáâãäåǻāăą](?!e)/iu', '[aàáâãäåǻāăą]', $search);
>> >>       $search = preg_replace('/[cçćĉċč]/iu', '[cçćĉċč]', $search);
>> >>       $search = preg_replace('/[dďđ]/iu', '[dďđ]', $search);
>> >>       $search = preg_replace('/(?<![ao])[eèéêëēĕėęě]/iu', '[eèéêëēĕėęě]', $search);
>> >>       $search = preg_replace('/[gĝğġģ]/iu', '[gĝğġģ]', $search);
>> >>       $search = preg_replace('/[hĥħ]/iu', '[hĥħ]', $search);
>> >>       $search = preg_replace('/[iìíîïĩīĭįı]/iu', '[iìíîïĩīĭįı]', $search);
>> >>       $search = preg_replace('/[jĵ]/iu', '[jĵ]', $search);
>> >>       $search = preg_replace('/[kķĸ]/iu', '[kķĸ]', $search);
>> >>       $search = preg_replace('/[lĺļľŀł]/iu', '[lĺļľŀł]', $search);
>> >>       $search = preg_replace('/[nñńņňŉŋ]/iu', '[nñńņňŉŋ]', $search);
>> >>       $search = preg_replace('/[oòóôõöōŏőǿơ](?!e)/iu', '[oòóôõöōŏőǿơ]', $search);
>> >>       $search = preg_replace('/[rŕŗř]/iu', '[rŕŗř]', $search);
>> >>       $search = preg_replace('/[sśŝşš]/iu', '[sśŝşš]', $search);
>> >>       $search = preg_replace('/[tţťŧ]/iu', '[tţťŧ]', $search);
>> >>       $search = preg_replace('/[uùúûüũūŭůűųǔǖǘǚǜ]/iu', '[uùúûüũūŭůűųǔǖǘǚǜ]', $search);
>> >>       $search = preg_replace('/[wŵ]/iu', '[wŵ]', $search);
>> >>       $search = preg_replace('/[yýÿŷ]/iu', '[yýÿŷ]', $search);
>> >>       $search = preg_replace('/[zźżž]/iu', '[zźżž]', $search);
>> >>
>> >>       $string = preg_replace('/\b' . $search . '(e?s)?\b/iu', '<span class="keysearch">$0</span>', $string);
>> >>   }
>> >>
>> >>   return $string;
>> >>
>> >> }
>> >> ?>
>> >>
>> >> I still can't help feeling there must be some better way, though.
>> >>
>> >>> well, i think I'm on the good way now, unfortunately I have some other
>> >>> urgent work and can't try it immediately, but I'll let you know    :)
>> >>>
>> >>> thank you!
>> >>>
>> >>>   Giulio
>> >>
>> >> Andrew
>
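The two regex points raised in the quoted exchange (what the trailing `iu` does, and what `(?!e)` and `(?<![ao])` are for) can be checked with a short script. This is only an illustrative sketch: it assumes the file is saved as UTF-8 and that PHP's PCRE was built with Unicode support, and the sample strings are made up for the demonstration.

```php
<?php
// Assumes this file is saved as UTF-8 and PCRE has Unicode support.

// /u: treat pattern and subject as UTF-8, so "." consumes one character.
$bytes = preg_match('/^.$/', 'é');   // 0: without /u, "é" is two bytes
$chars = preg_match('/^.$/u', 'é');  // 1: with /u, "é" is one character

// /i together with /u: case-insensitive matching beyond ASCII.
$caseless = preg_match('/É/iu', 'é'); // 1

// (?!e): match "a" only when the next character is not "e".
$ahead = preg_replace('/a(?!e)/u', '[aá]', 'aether data');
// $ahead is "aether d[aá]t[aá]"

// (?<![ao]): match "e" only when the previous character is not "a" or "o".
$behind = preg_replace('/(?<![ao])e/u', '[eé]', 'aether');
// $behind is "aeth[eé]r"

var_dump($bytes, $chars, $caseless, $ahead, $behind);
```

The lookarounds consume nothing, which is why the "ae" in "aether" survives untouched and stays available for the separate `(ae|æ|ǽ)` rule.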
I agree it doesn't seem very efficient to me, but I haven't come up
with anything better. The problem with what you posted is that the OP
was looking to preserve the accented characters, NOT replace them. All
he wants to do is wrap some tags around the search terms so that they
are highlighted. I guess he could use your function to replace all the
accented characters with regular ones in a copy of the original
string, and then search that copy using strpos() or similar to find
the index of each occurrence that needs to be wrapped in the original
string. This seems even less efficient than the regular expressions,
to me.
Andrew
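
The copy-and-locate idea described above can be sketched roughly as follows. This is a minimal sketch, not the code anyone in the thread actually used: `fold_accents()` and `highlight_folded()` are hypothetical names, the accent map is deliberately tiny, and it assumes the mbstring extension and UTF-8 input. It only works because every mapping is one character to one character, so a character offset found in the folded copy is also valid in the original. Ligature expansions such as "æ" to "ae" would break that alignment.

```php
<?php
// Hypothetical helper: fold a small, illustrative set of accented letters
// to ASCII. Each mapping is one character to one character, so character
// offsets in the folded copy line up with the original string.
function fold_accents(string $s): string {
    $map = [
        'à' => 'a', 'á' => 'a', 'â' => 'a', 'ä' => 'a',
        'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
        'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i',
        'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'ö' => 'o',
        'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u',
    ];
    // Lowercase first so the map only needs lowercase keys. Assumption:
    // lowercasing keeps the character count, which holds for these letters
    // but not for every Unicode character.
    return strtr(mb_strtolower($s, 'UTF-8'), $map);
}

// Hypothetical helper: find $needle in a folded copy of $haystack and wrap
// the corresponding slice of the ORIGINAL string, accents intact.
function highlight_folded(string $needle, string $haystack): string {
    $foldedNeedle = fold_accents($needle);
    $foldedHaystack = fold_accents($haystack);
    $len = mb_strlen($foldedNeedle, 'UTF-8');
    if ($len === 0) {
        return $haystack;
    }

    $out = '';
    $pos = 0; // character offset, valid in both the copy and the original
    while (($hit = mb_strpos($foldedHaystack, $foldedNeedle, $pos, 'UTF-8')) !== false) {
        // Copy the untouched text before the hit, then the wrapped hit.
        $out .= mb_substr($haystack, $pos, $hit - $pos, 'UTF-8');
        $out .= '<span class="keysearch">'
              . mb_substr($haystack, $hit, $len, 'UTF-8')
              . '</span>';
        $pos = $hit + $len;
    }
    return $out . mb_substr($haystack, $pos, null, 'UTF-8');
}

echo highlight_folded('cafe', 'Un café, due Caffè, tre CAFÉ'), "\n";
// Un <span class="keysearch">café</span>, due Caffè, tre <span class="keysearch">CAFÉ</span>
```

Unlike the regex version, this does no word-boundary or plural handling, and it folds the whole string on every call, so it illustrates the offset mapping only.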
