Re: Unaccent characters

On Fri, 2012-04-20 at 09:15 +0100, Thom Brown wrote:
> I had a look at the unaccent.rules file and noticed the following
> characters aren't properly converted:
> 
> ß (U+00DF)  An eszett represents a double-s "SS" but this replaces it
> with one "S".  Shouldn't this be replaced with "SS"?

Probably, but it certainly shouldn't be upper case.
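
To illustrate the case point, here is a minimal sketch in Python.  It uses
Python's built-in string methods, not the unaccent module, and the names
ESZETT_RULE and replace_eszett are made up for this example.  It only shows
that ß is a lower-case letter, so a lower-case replacement is the natural
choice:

```python
# Hypothetical rule (not taken from unaccent.rules): lower-case ß -> "ss".
ESZETT_RULE = {"ß": "ss"}

def replace_eszett(text: str) -> str:
    """Replace each ß with "ss" and leave every other character alone."""
    return text.translate(str.maketrans(ESZETT_RULE))

print(replace_eszett("straße"))  # strasse
print("ß".islower())             # True: ß is a lower-case letter
print("ß".upper())               # SS: only upper-casing yields capitals
```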

> Æ (U+00C6) and æ (U+00E6) These don't have an accent, diacritic or
> anything added to a single latin character.  It's simply a ligature of
> "A" and "E" or "a" and "e".  If someone has the text "æther", I would
> imagine they'd be surprised at it being converted to "ather" instead
> of "aether".

It depends on what the point of this module is supposed to be.  Doing
"unaccenting" usefully depends on language and context.  For example, it
would be very reasonable to map æ to ae, but in a Scandinavian context,
æ is equivalent to ä, which is mapped to a, which is itself
questionable.
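
The language dependence can be sketched as two alternative rule tables.
This is only an illustration in Python: both tables and the function name
are assumptions for the example, and neither table is copied from
unaccent.rules.

```python
# Hypothetical rule tables, one per "context".
GENERIC_RULES = {"æ": "ae", "ä": "a"}        # treat æ as an a+e ligature
SCANDINAVIAN_RULES = {"æ": "a", "ä": "a"}    # treat æ like ä, then drop the mark

def unaccent_with(text, rules):
    """Apply a character -> replacement-string table to text."""
    return text.translate(str.maketrans(rules))

print(unaccent_with("æther", GENERIC_RULES))       # aether
print(unaccent_with("æther", SCANDINAVIAN_RULES))  # ather
```

Neither result is wrong in itself; which one is useful depends on what the
text is and what the search is for.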

> Œ (U+0152) and œ (U+0153). Same as above.  This is a ligature of "O"
> and "E" or "o" and "e".  Except this time the unaccent module chooses
> the 2nd character instead of the 1st which is confusing.

That certainly seems wrong.  It's also worth noting that while æ is in
some languages considered a separate letter, œ is generally just a
typographical ligature.
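
One way to see why these characters need explicit rules at all: Unicode
gives ä a decomposition into a base letter plus a combining mark, but gives
none for æ or œ, so simply stripping marks leaves them untouched.  The
sketch below uses Python's standard unicodedata module; LIGATURE_RULES and
the function names are assumptions for the example, not how the unaccent
module works internally.

```python
import unicodedata

def strip_marks(text):
    """Decompose to NFD, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Hypothetical explicit rules for letters that have no decomposition.
LIGATURE_RULES = {"æ": "ae", "Æ": "AE", "œ": "oe", "Œ": "OE"}

def unaccent_sketch(text):
    """Expand the ligatures first, then strip the remaining marks."""
    return strip_marks(text.translate(str.maketrans(LIGATURE_RULES)))

print(repr(unicodedata.decomposition("ä")))  # '0061 0308'
print(repr(unicodedata.decomposition("œ")))  # ''
print(strip_marks("cœur"))                   # cœur (unchanged)
print(unaccent_sketch("cœur"))               # coeur
```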


-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

