Re: Replacing accented characters by non-accented characters

On Monday 12 May 2008 at 19:07 +0300, Dotan Cohen wrote:
> 2008/5/12 Yannick Warnier <ywarnier@xxxxxxxxxxxx>:
> > Hello,
> >
> > I've been trying to find something nice to transform an accented
> > string into a non-accented string. Obviously, I'm mostly working
> > with European languages, but any method that could also transform
> > Arabic or Asian characters into plain unaccented characters would be
> > perfect.
> >
> > I have found a number of solutions, ranging from str_replace() for every
> > known accented character, to strtr(), to a preg_replace() applied after
> > converting the string to HTML entities, which removes the "&" and
> > the "alteration" suffix (acute, grave, circ, ...).
> >
> > I must say the last one seems to work better because it's less affected
> > by charset changes, but it still seems awfully slow to me and I would
> > like to know if there is any function that exists that could do that for
> > me?
> >
> > Yannick
> >
> 
> Why are you removing the accents? Why not store/process the data as
> UTF-8, which supports all the accents in all the languages, and even
> non-Latin languages? You mention Arabic, which does not use accented
> Latin characters (maybe you are thinking of Turkish, Uzbek or Tajik).
> UTF-8 supports Arabic, Russian, Greek, Latin including modified
> accented letters, and everything else in Unicode, CJK included.
> 
> What is your end goal? Why are you removing the accents?

Hi Dotan,

I'm trying to give a universally manageable directory name to an item,
based on a free-text title. I want to avoid every kind of accented
character, and everything else outside pure ASCII, to make the name as
portable as possible.
Generating a random hash is not acceptable, as we want to be as
user-friendly as possible.

I mention Arabic not because I want to remove accents from it, but in
case there is a transliteration function that can turn an Arabic
character into an ASCII approximation of its pronunciation.

So the goal is to create a directory name that is both intuitive *and*
portable for the user and the admin. The title is kept for the user, but
a generic shortened code is generated from the given title.
We used to ask for a title in a web form, but realised our users liked it
much better when we let them generate the code themselves, while still
generating one for them by default.
I just realised that the developer who did it seems to have built the
code from the HTML entity names directly, so we end up with codes like
"EACUTETEACUTE" for an item called "été", while "ETE" would be far better.
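
To illustrate the kind of conversion being asked for, here is a minimal
sketch in Python (not PHP) that relies on Unicode decomposition rather
than on HTML entity names. The function name make_dir_code and the
20-character limit are assumptions made for the example, not anything
taken from the existing application. PHP's intl extension exposes the same
normalization through Normalizer::normalize(), so a similar approach
should be possible there.

```python
import re
import unicodedata


def make_dir_code(title: str, max_len: int = 20) -> str:
    """Hypothetical helper: derive an uppercase ASCII code from a title."""
    # NFKD splits "é" into "e" plus a combining acute accent (U+0301).
    decomposed = unicodedata.normalize("NFKD", title)
    # Drop the combining marks and keep only the base characters.
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Discard whatever is still outside ASCII (Arabic or CJK letters are
    # dropped here, not transliterated).
    ascii_only = stripped.encode("ascii", "ignore").decode("ascii")
    # Keep letters and digits only, then uppercase and truncate
    # (max_len is an assumed limit).
    return re.sub(r"[^A-Za-z0-9]+", "", ascii_only).upper()[:max_len]


print(make_dir_code("été"))  # ETE
```

This only handles characters that decompose into an ASCII base letter
plus combining marks. Letters such as "ø" or "ß" have no such
decomposition and are simply dropped, and non-Latin scripts would still
need a real transliteration table or library.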

Yannick


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php


