Re: case and accent - insensitive regular expression?

On Mon, Jul 14, 2008 at 11:06 AM, Giulio Mastrosanti <giulio@xxxxxxxxxxxxx> wrote:
> First of all, thank you all for your answers, and thank you for your time.
>
> And yes, Tedd, my question was quite ambiguous on that point.
>
> Andrew is right: I don't want to change in any way the list of keys I show
> in the result, I just want to find a way to highlight the matching words,
> regardless of their accent variations.
>
> So I think Andrew's suggestion could be a good solution, and I'll try it
> ASAP...
>
> Let me see if I understood correctly:
>
> $search = preg_quote($word); -- quotes chars that could be interpreted as
> regex special chars
>
> $search = str_replace('e', '[eèéêë]', $search); -- transforms e.g. cafe into
> caf[eèéêë], so it matches all the accented variations
>
> return preg_replace('/\b' ... -- replaces all the occurrences, adding the
> tags; you use \b as word boundary, right?

Yes, yes, and yes. :-)
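
For reference, here is a minimal sketch of those three steps put together. The function name highlight_simple is made up for this illustration, and it assumes the script and the text are both UTF-8 and that only the letter 'e' needs accent handling:

```php
<?php
// Hypothetical name, for illustration only; handles just the letter 'e'.
function highlight_simple(string $word, string $text): string
{
    // Step 1: escape anything PCRE would treat as special ('/' is our delimiter).
    $search = preg_quote($word, '/');

    // Step 2: let every plain 'e' also match its accented variants.
    $search = str_replace('e', '[eèéêë]', $search);

    // Step 3: wrap whole-word matches in the highlight tag.
    // The 'u' modifier makes PCRE treat pattern and subject as UTF-8.
    return preg_replace(
        '/\b' . $search . '\b/iu',
        '<span class="keysearch">$0</span>',
        $text
    );
}

echo highlight_simple('cafe', 'A café and a cafe.'), "\n";
```

Both spellings in the sample sentence should end up wrapped in the span, since the final 'e' of the search word becomes a character class.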

> It seems a fine solution to the problem!
>
> The only thing I must add is, before calling highlight_search_terms, to
> 'normalize' the word string (the word used for the search) by replacing
> the accented versions of the chars:
>
> $word = preg_replace('/[èé]/u', 'e', $word);
> $word = preg_replace('/[à]/u', 'a', $word);
>
> That is because the search string could also contain an accented char, and
> this way I avoid performing str_replace in the highlight_search_terms
> function for every combination of accented chars.

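
A rough sketch of that normalization step, using strtr() with a lookup table so the whole word is folded in one pass. The helper name normalize_search_word and the (deliberately short) table are assumptions made for this example, not something from the thread:

```php
<?php
// Hypothetical helper: fold accented letters in the search word to their
// base letter before the regex is built. Assumes UTF-8 input.
function normalize_search_word(string $word): string
{
    // Illustrative subset only; extend it with the characters your data uses.
    $map = [
        'à' => 'a', 'á' => 'a', 'â' => 'a', 'ä' => 'a',
        'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
        'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i',
        'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'ö' => 'o',
        'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u',
    ];

    // With an array argument, strtr() replaces each key by its value and
    // never rescans text it has already replaced.
    return strtr($word, $map);
}

echo normalize_search_word('caffè'), "\n";
```

The table only covers lowercase letters, so an uppercase accented letter in the search word would pass through unchanged.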
I was intrigued by your example, so I played around with it some more this morning. My own quick web search yielded a lot of results for highlighting search terms, but none that I found did what you're after. (I admit I didn't look very deep.) I was up to something like this before your reply came in. It's still by no means complete. It even handles simple English plurals (words ending in 's' or 'es'), but not variations that require changing the word base (like 'daisy' to 'daisies').

<?php
function highlight_search_terms($phrase, $string) {
    $non_letter_chars = '/[^\pL]/iu';
    $words = preg_split($non_letter_chars, $phrase);

    $search_words = array();
    foreach ($words as $word) {
        if (strlen($word) > 2 && !preg_match($non_letter_chars, $word)) {
            $search_words[] = $word;
        }
    }

    $search_words = array_unique($search_words);

    foreach ($search_words as $word) {
        $search = preg_quote($word);

        /* repeat for each possible accented character */
        $search = preg_replace('/(ae|æ|ǽ)/iu', '(ae|æ|ǽ)', $search);
        $search = preg_replace('/(oe|œ)/iu', '(oe|œ)', $search);
        $search = preg_replace('/[aàáâãäåǻāăą](?!e)/iu', '[aàáâãäåǻāăą]', $search);
        $search = preg_replace('/[cçćĉċč]/iu', '[cçćĉċč]', $search);
        $search = preg_replace('/[dďđ]/iu', '[dďđ]', $search);
        $search = preg_replace('/(?<![ao])[eèéêëēĕėęě]/iu', '[eèéêëēĕėęě]', $search);
        $search = preg_replace('/[gĝğġģ]/iu', '[gĝğġģ]', $search);
        $search = preg_replace('/[hĥħ]/iu', '[hĥħ]', $search);
        $search = preg_replace('/[iìíîïĩīĭįı]/iu', '[iìíîïĩīĭįı]', $search);
        $search = preg_replace('/[jĵ]/iu', '[jĵ]', $search);
        $search = preg_replace('/[kķĸ]/iu', '[kķĸ]', $search);
        $search = preg_replace('/[lĺļľŀł]/iu', '[lĺļľŀł]', $search);
        $search = preg_replace('/[nñńņňŉŋ]/iu', '[nñńņňŉŋ]', $search);
        $search = preg_replace('/[oòóôõöōŏőǿơ](?!e)/iu', '[oòóôõöōŏőǿơ]', $search);
        $search = preg_replace('/[rŕŗř]/iu', '[rŕŗř]', $search);
        $search = preg_replace('/[sśŝşš]/iu', '[sśŝşš]', $search);
        $search = preg_replace('/[tţťŧ]/iu', '[tţťŧ]', $search);
        $search = preg_replace('/[uùúûüũūŭůűųǔǖǘǚǜ]/iu', '[uùúûüũūŭůűųǔǖǘǚǜ]', $search);
        $search = preg_replace('/[wŵ]/iu', '[wŵ]', $search);
        $search = preg_replace('/[yýÿŷ]/iu', '[yýÿŷ]', $search);
        $search = preg_replace('/[zźżž]/iu', '[zźżž]', $search);

        $string = preg_replace('/\b' . $search . '(e?s)?\b/iu', '<span class="keysearch">$0</span>', $string);
    }

    return $string;
}
?>

I still can't help feeling there must be some better way, though.
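
One possibly tidier variant (not necessarily the "better way") is to drive the same idea from a single table instead of one preg_replace call per letter. The sketch below is only an illustration: the function name accent_insensitive_pattern and the shortened table are invented for this example, it skips the ae/oe ligature handling, and it assumes a lowercase UTF-8 search word.

```php
<?php
// Hypothetical helper: builds an accent-insensitive regex fragment for one
// word from a table of base letters and their variants.
function accent_insensitive_pattern(string $word): string
{
    // Illustrative subset of the character classes; extend as needed.
    $variants = [
        'aàáâãäå',
        'cç',
        'eèéêë',
        'iìíîï',
        'nñ',
        'oòóôõö',
        'uùúûü',
    ];

    // Map every listed character to the class it belongs to.
    $lookup = [];
    foreach ($variants as $chars) {
        foreach (preg_split('//u', $chars, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
            $lookup[$ch] = '[' . $chars . ']';
        }
    }

    // Walk the word one UTF-8 character at a time: use the class if there is
    // one, otherwise escape the character for use inside /.../ delimiters.
    $pattern = '';
    foreach (preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        $pattern .= $lookup[$ch] ?? preg_quote($ch, '/');
    }
    return $pattern;
}

$pattern = '/\b' . accent_insensitive_pattern('cafe') . '(e?s)?\b/iu';
echo preg_replace($pattern, '<span class="keysearch">$0</span>', 'Two cafés'), "\n";
```

Because the word is scanned once, character by character, there is no need for the lookahead and lookbehind guards that protect earlier replacements in the longer version, and an accented letter in the search word maps to the same class as its base letter.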

> Well, I think I'm on the right track now. Unfortunately I have some other
> urgent work and can't try it immediately, but I'll let you know :)
>
> Thank you!
>
>     Giulio

Andrew
