Re: Relative performance of prefix and suffix string matching

"Stéphane A. Schildknecht" <stephane.schildknecht@xxxxxxxxxxxxx> · Fri, 23 Sep 2011 12:53:55 +0200

Le 23/09/2011 12:30, Alban Hertroys a écrit :
> On 23 September 2011 11:47, Andrew Rose <andrew.rose@xxxxxxxxxxxxxx
> <mailto:andrew.rose@xxxxxxxxxxxxxx>> wrote:
> 
>     Basic Question: In text fields, is prefix matching significantly faster
>     than suffix matching?
> 
> 
> It does depend on what type of index you use. BTrees split off text strings,
> from left to right, halving the number of records you need to scan at every
> branch. For a suffix match, that's exactly the wrong way around.
> Hash indexes probably don't fare any better.
> I don't know enough about GIST or GIN indexes to comment on their suitability
> for suffix matches, but presumably they're better at those.
> 
> I recall doing suffix matches used to be a problem in at least earlier versions
> of Postgres, but it's quite possible that the query planner is smart enough to
> do the reverse match by itself nowadays (I doubt it, seeing that it would also
> need to reverse the way the index is organised).
>  
> 
>     2. Alternatively, I could store column 'rev_str' as a reversed version of
>     column 'str' and have the client produce a reversed version of x on each
>     query (call it r).  Then the client would issue...
> 
>     SELECT * FROM tbl WHERE str LIKE 'x%' OR rev_str LIKE 'r%'
> 
>     ...which would use prefix matches only instead of requiring suffix matches.
>      Since I've seen this form used by others, I was wondering if it's
>     necessary - i.e. if databases really do perform prefix matching faster?
> 
>     3. Is there a solution I'm unaware of with even better performance?
> 
>  
> You can create a functional index on the reverse of the string, that way
> omitting the need for an extra column (that needs updating as well).
> 
> CREATE INDEX tbl_str_rev ON tbl (reverse(str));
> SELECT * FROM tbl WHERE str LIKE 'x%' OR reverse(str) LIKE 'x%';
> 
> See: http://www.postgresql.org/docs/9.0/static/indexes-expressional.html

You can use the pg_trgm extension which lets you create gin or gist indexes on
your text field.

There is also wildspeed (see http://www.sai.msu.su/~megera/wiki/wildspeed).

Didn't try the latter solution, but the first one gives really great result for
searching partial strings.

a propos, there's one thing I'd like to know, is how to set the similarity
limit within pg_trgm on a server side (I'd like to have it settled to 0.2 for
every new session, for instance).

Regards,
-- 
Stéphane Schildknecht
http://www.loxodata.com
Contact régional PostgreSQL

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general