Search Postgresql Archives

how to improve this similarity query?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

I have a table of names, and searches are usually performed on prefix match.
This could nicely translate into an index search
Suppose first name is stored as either 'Jim' or 'Jimmy', searching 
... where firstname ~* '^jim';
gets proper result. I had hoped that creating a functional
index on lower(firstname) and using a query like
.... where lower(firstname) ~ '^jim'
would improve the search, but it does not.
I ended up with adding a lowercased column for matching

Now a few names (a few percent of the entire lot) have alternates, like 'James'.
These could be nicknames, informal variants, language variants, alternate spellings

I have already split off these few percent into a separate table and can query that like
... where 'jim' ~* firstname_pattern;'jim' ~* firstname_pattern;'jim' ~* firstname_pattern;

There are two problems with this approach: when I use 'Jimbo' for the plain query,
I do not get any 'Jim' or 'Jimmy', as expected. To achieve the same effect on the pattern
query, I need to "decorate" the pattern somewhat. Actually, when I specify 'J(im|ames)' for
the pattern, it gets preprocessed - and is stored in the database as - 'J(im|am($|e($|s)))$'
Unfortunately there are regex patterns which the preprocessing script cannot handle, so
I might have to try a different regex.
The other, bigger, problem: the search cannot make use of  an index, and it has to compile
a regex for every entry in the table. I am considering a change to that part: in the Jim/James
case it is obvious that I could speed up the query with
.... where firstname_pattern ~* '^j' and 'jim' ~* firstname_pattern;
If the pattern was 'Bob|Robert' instead, I would have to change the preprocessing so the
'B' and 'R' parts would be separate.

So, I wonder whether there is any better way of doing these. I have looked into tsquery
resp. fulltext, but they do not seem to support prefix matches, only exact ones.

Regards
Wolfgang Hamann

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux