Search Postgresql Archives

Re: how to improve this similarity query?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Look at doc chapter II.12

2011/9/30, hamann.w@xxxxxxxxxxx <hamann.w@xxxxxxxxxxx>:
> Hi,
>
> I have a table of names, and searches are usually performed on prefix match.
> This could nicely translate into an index search
> Suppose first name is stored as either 'Jim' or 'Jimmy', searching
> ... where firstname ~* '^jim';
> gets proper result. I had hoped that creating a functional
> index on lower(firstname) and using a query like
> .... where lower(firstname) ~ '^jim'
> would improve the search, but it does not.
> I ended up with adding a lowercased column for matching
>
> Now a few names (a few percent of the entire lot) have alternates, like
> 'James'.
> These could be nicknames, informal variants, language variants, alternate
> spellings
>
> I have already split off these few percent into a separate table and can
> query that like
> ... where 'jim' ~* firstname_pattern;'jim' ~* firstname_pattern;'jim' ~*
> firstname_pattern;
>
> There are two problems with this approach: when I use 'Jimbo' for the plain
> query,
> I do not get any 'Jim' or 'Jimmy', as expected. To achieve the same effect
> on the pattern
> query, I need to "decorate" the pattern somewhat. Actually, when I specify
> 'J(im|ames)' for
> the pattern, it gets preprocessed - and is stored in the database as -
> 'J(im|am($|e($|s)))$'
> Unfortunately there are regex patterns which the preprocessing script cannot
> handle, so
> I might have to try a different regex.
> The other, bigger, problem: the search cannot make use of  an index, and it
> has to compile
> a regex for every entry in the table. I am considering a change to that
> part: in the Jim/James
> case it is obvious that I could speed up the query with
> .... where firstname_pattern ~* '^j' and 'jim' ~* firstname_pattern;
> If the pattern was 'Bob|Robert' instead, I would have to change the
> preprocessing so the
> 'B' and 'R' parts would be separate.
>
> So, I wonder whether there is any better way of doing these. I have looked
> into tsquery
> resp. fulltext, but they do not seem to support prefix matches, only exact
> ones.
>
> Regards
> Wolfgang Hamann
>
> --
> Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


-- 
------------
pasman

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux