Re: Searching for "bare" letters

Reuven M. Lerner wrote:

>> Hi, everyone.  I'm working on a project on PostgreSQL 9.0 (soon
>> to be upgraded to 9.1, given that we haven't yet launched).  The
>> project will involve numerous text fields containing English,
>> Spanish, and Portuguese.  Some of those text fields will be
>> searchable by the user.  That's easy enough to do; for our
>> purposes, I was planning to use some combination of LIKE searches;
>> the database is small enough that this doesn't take very much
>> time, and we don't expect the number of searchable records (or
>> columns within those records) to be all that large.
>>
>> The thing is, the people running the site want searches to work
>> on what I'm calling (for lack of a better term) "bare" letters.
>> That is, if the user searches for "n", then the search should also
>> match Spanish words containing "ñ".  I'm told by Spanish-speaking
>> members of the team that this is how they would expect searches to
>> work.  However, when I just did a quick test using a UTF-8 encoded
>> 9.0 database, I found that PostgreSQL didn't see the two
>> characters as identical.  (I must say, this is the behavior that I
>> would have expected, had the Spanish-speaking team member not said
>> anything on the subject.)
>>
>> So my question is whether I can somehow wrangle PostgreSQL into
>> thinking that "n" and "ñ" are the same character for search
>> purposes, or if I need to do something else -- use regexps, keep a
>> "naked," searchable version of each column alongside the native
>> one, or something else entirely -- to get this to work.
>>
>> Any ideas?
>>
>> Thanks,
>>
>> Reuven

I had the same problem with German (which has ä, ö, and ü).
I ended up with a normalized version of the database (for many purposes, this could
be just an extra column), plus preprocessing of the search input.
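
For the Spanish/Portuguese case, the normalized column could be filled by something like the following. This is only a rough sketch in Python, assuming the normalization happens on the application side; the function names strip_diacritics and search_key are made up for illustration. The same function would be applied to the user's search term before comparing.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks so that e.g. 'ñ' becomes 'n'."""
    # NFD splits an accented letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search_key(text: str) -> str:
    """Hypothetical helper: the value stored in the extra 'bare' column."""
    return strip_diacritics(text).casefold()


print(search_key("Mañana"))     # manana
print(search_key("São Paulo"))  # sao paulo
```

Note that this treats every accented letter as its base letter, which is what the "bare letters" request asks for, but it is not right for every language.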
There is one difficulty with German searches: these letters are commonly transliterated as
"ae", "oe", and "ue", as in "Muenchen". So, depending on culture, some people would expect a "u" search
term to match, and others a "ue". Preprocessing the query therefore means replacing each bare "u"
(one not followed by "e") with the regex "ue?".
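
A rough Python sketch of that query preprocessing, under two assumptions of mine: the normalized column stores umlauts in transliterated form ("ue" for "ü"), and the same treatment is wanted for "a" and "o". The names normalize_german and query_to_regex are only illustrative.

```python
import re

# Assumption: the normalized column stores umlauts transliterated.
TRANSLIT = str.maketrans(
    {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"}
)


def normalize_german(text: str) -> str:
    """Hypothetical normalization used for the extra column."""
    return text.translate(TRANSLIT).lower()


def query_to_regex(term: str) -> str:
    """Turn a search term into a regex matching both spellings."""
    escaped = re.escape(normalize_german(term))
    # A bare a/o/u (not followed by "e") may stand for an umlaut,
    # so allow an optional "e" after it.
    return re.sub(r"([aou])(?!e)", r"\g<1>e?", escaped)


stored = normalize_german("München")       # "muenchen"
pattern = query_to_regex("Munchen")        # "mue?nchen"
print(bool(re.search(pattern, stored)))    # True
```

A term typed as "Muenchen" or "München" comes out as the plain pattern "muenchen", so all three spellings find the same row.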

BTW: if your search form does not explicitly tell the browser to use UTF-8 to encode the search field,
you should expect a small proportion of requests to arrive encoded as ISO-8859-1 (Latin-1).
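
One way to cope with that on the server side is to try UTF-8 first and fall back to Latin-1. A minimal Python sketch, assuming the raw bytes of the search field are available; decode_search_term is a made-up name.

```python
def decode_search_term(raw: bytes) -> str:
    """Decode a search field as UTF-8, falling back to ISO-8859-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte sequence is valid ISO-8859-1, so this cannot fail.
        return raw.decode("iso-8859-1")


print(decode_search_term("ñ".encode("utf-8")))       # ñ
print(decode_search_term("ñ".encode("iso-8859-1")))  # ñ
```

This is a heuristic: Latin-1 bytes that happen to form valid UTF-8 would be decoded as UTF-8, though that is rare for ordinary search words.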

Regards
Wolfgang




-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

