In article <20060327094829.GA30791@xxxxxxxxx>, kleptog@xxxxxxxxx says...
> On Mon, Mar 27, 2006 at 11:31:17AM +0200, SunWuKung wrote:
> > I need to do a case-insensitive match against a field that contains
> > text in different languages (Greek, Hungarian, Arabic, etc.).
> > The db encoding is UTF8.
> >
> > So far I have found no way to achieve that. I tried converting both
> > strings to the same case and using ~*, but neither worked.
>
> Oh, tricky. Firstly, case-insensitive means different things in
> different locales. For example, in Turkish 'i' is not the lowercase
> version of 'I'. Maybe you've chosen a locale that doesn't do UTF-8? You
> don't specify a platform either. Locale support varies wildly by
> platform.
>
> What you probably want is some kind of accent-insensitive match, meaning
> that é, è, ë, e, É, È, E and Ë are all considered to match each other.
> The way you do that is by converting Unicode to a particular normal
> form and then comparing. Unfortunately, I don't think PostgreSQL
> supplies such a function right now.
>
> However, some server-side procedural languages can do this. If you can
> find one (possibly Perl) that can do the conversion, you can create a
> function to do the mapping.
>
> Have a nice day,

This sounds like a very interesting concept. It wouldn't be 'case
insensitive', just insensitive.

The way I imagine it now is as a special case of the ~ operator. I would
create match groups in a table and check whether each character belongs
to a group. If it does, I replace the character with its group, e.g.
[éÉE] or [oóOÓ??], and run a regexp with that.

What do you think?

B.
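The normal-form approach suggested in the quoted reply can be sketched with Python's standard `unicodedata` module. This is a minimal sketch, assuming the comparison happens in application code (a server-side language such as PL/Python could host similar logic, but that is not shown here). It decomposes text to NFD so that accented letters split into a base letter plus combining marks, drops the combining marks, and then applies `casefold()`. The helper names `fold_key` and `loose_equal` are hypothetical. Note that `casefold()` is locale-independent, so the Turkish dotted/dotless i rule mentioned above is not applied.

```python
import unicodedata


def fold_key(text: str) -> str:
    """Return a comparison key that ignores case and combining accents.

    NFD splits e.g. 'é' into 'e' + U+0301 (combining acute accent);
    characters with a non-zero combining class are then dropped, and
    casefold() performs locale-independent case folding.
    """
    decomposed = unicodedata.normalize("NFD", text)
    base_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base_only.casefold()


def loose_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings by their folded keys."""
    return fold_key(a) == fold_key(b)


if __name__ == "__main__":
    print(loose_equal("Éva", "eva"))  # True
    print(fold_key("ÁRVÍZTŰRŐ"))      # arvizturo
```

Storing `fold_key(column)` alongside the original text (or computing it on both sides of the comparison) turns the problem into an ordinary equality or pattern match on plain base letters.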