Re: How to boost performance of queries containing pattern matching characters

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



>How can we boost performance of queries containing pattern matching 
>characters?  In my case, we're using a percent sign (%) that matches any
string of  zero or more characters.
>
> QUERY:  DELETE FROM MYTABLE WHERE EMAIL ILIKE '%domain.com%'
>
> EMAIL column is VARCHAR(256).
> 
> As it is clear from the above query, email is matched "partially and
case-insensitively", which my application requirement demands.  
> 
> In case, if it were a full match, I could easily define a functional 
> INDEX on EMAIL column (lower(EMAIL)) and I could rewrite my DELETE where
criteria like lower(EMAIL) = 'someemail@xxxxxxxxxx'.
> 
> MYTABLE currently contains 2 million records and grows consistently.

I had almost the same problem.
To resolve it, I created my own text search parser (myftscfg) which divides
text in column into three letters parts, for example:

someemail@xxxxxxxxxx is divided to som, ome,mee,eem,ema,mai,ail,il@,
l@d,@do,dom,oma,mai,ain,in.,n.c,.co,com

There should be also index on email column:

CREATE INDEX "email _fts" on mytable using gin
(to_tsvector('myftscfg'::regconfig, email))

Every query like email ilike '%domain.com%' should be rewrited to:

WHERE
to_tsvector('myftscfg',email) @@ to_tsquery('dom') AND
to_tsvector('myftscfg',email) @@ to_tsquery('oma') AND
to_tsvector('myftscfg',email) @@ to_tsquery('mai') AND
to_tsvector('myftscfg',email) @@ to_tsquery('ain') AND
to_tsvector('myftscfg',email) @@ to_tsquery('in.') AND
to_tsvector('myftscfg',email) @@ to_tsquery('n.c') AND
to_tsvector('myftscfg',email) @@ to_tsquery('.co') AND
to_tsvector('myftscfg',email) @@ to_tsquery('com') AND email ILIKE
'%domain.com%';

Index is reducing number of records and clause email ILIKE '%domain.com%' is
selecting only valid records.

I didn't found better solution.

-------------------------------------------
Artur Zajac


-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux