On Fri, 2012-04-20 at 09:15 +0100, Thom Brown wrote:
> I had a look at the unaccent.rules file and noticed the following
> characters aren't properly converted:
>
> ß (U+00DF) An eszett represents a double-s "SS", but this replaces it
> with one "S". Shouldn't this be replaced with "SS"?

Probably, but it certainly shouldn't be upper case.

> Æ (U+00C6) and æ (U+00E6) These don't have an accent, diacritic or
> anything else added to a single Latin character. It's simply a ligature
> of "A" and "E" or "a" and "e". If someone has the text "æther", I would
> imagine they'd be surprised at it being converted to "ather" instead
> of "aether".

It depends on what the point of this module is supposed to be. Doing
"unaccenting" usefully depends on language and context. For example, it
would be very reasonable to map æ to ae, but in a Scandinavian context,
æ is equivalent to ä, which is mapped to a, which is itself
questionable.

> Œ (U+0152) and œ (U+0153). Same as above. This is a ligature of "O"
> and "E" or "o" and "e". Except this time the unaccent module chooses
> the 2nd character instead of the 1st, which is confusing.

That certainly seems wrong. It's also worth noting that while æ is in
some languages considered a separate letter, œ is generally just a
typographical ligature.

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general