Reuven M. Lerner wrote: >> <p>Hi, everyone. I'm working on a project on PostgreSQL 9.0 (soon >> to be upgraded to 9.1, given that we haven't yet launched). The >> project will involve numerous text fields containing English, >> Spanish, and Portuguese. Some of those text fields will be >> searchable by the user. That's easy enough to do; for our >> purposes, I was planning to use some combination of LIKE searches; >> the database is small enough that this doesn't take very much >> time, and we don't expect the number of searchable records (or >> columns within those records) to be all that large.</p> >> <p>The thing is, the people running the site want searches to work >> on what I'm calling (for lack of a better term) "bare" letters. >> That is, if the user searches for "n", then the search should also >> match Spanish words containing "ñ". I'm told by Spanish-speaking >> members of the team that this is how they would expect searches to >> work. However, when I just did a quick test using a UTF-8 encoded >> 9.0 database, I found that PostgreSQL didn't see the two >> characters as identical. (I must say, this is the behavior that I >> would have expected, had the Spanish-speaking team member not said >> anything on the subject.)</p> >> <p>So my question is whether I can somehow wrangle PostgreSQL into >> thinking that "n" and "ñ" are the same character for search >> purposes, or if I need to do something else -- use regexps, keep a >> "naked," searchable version of each column alongside the native >> one, or something else entirely -- to get this to work.</p> >> <p>Any ideas?</p> >> <p>Thanks,</p> >> <p>Reuven<br> I had the same problem with german (there is ä ö ü) I ended up with a normalized version of the database (for many purposes, this could be just an extra column) plus preprocessing the input. There is one difficulty with german searches: these letters are commonly transliterated into ue etc, like in "Muenchen". So depending on culture, some people would expect a "u" search term to match, and others the "ue". So preprocessing query means replacing bare u (not followed by e) with a ue? regex BTW: if your search form does not explicitly tell the browser to use utf8 to encode the search field, you might expect a small proportion of iso-latin1 requests Regards Wolfgang -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general