Re: LIKE and REGEX optimization

Chris Browne <cbbrowne@xxxxxxx> · Tue, 15 Jan 2008 15:29:49 -0500

Kico Zaninetti <kicozaninetti@xxxxxxxxx> writes:
> Hi all.
>
> I have a database with 62 million registers and I have to make a
> SELECT using LIKE.
>
> This is my select:
> SELECT * FROM phone WHERE name LIKE = '%ZANINETTI%' AND city = 'SAO
> PAULO' AND state = 'SP'
>
> I have an index created like this:
> CREATE INDEX "telefones_idx2" ON "public"."phone"
>   USING btree ("name" varchar_pattern_ops, "city", "state");
>
> When I explain the query I have this:
> QUERY PLAN
> Bitmap Heap Scan on telefones  (cost=1031528.27..2726942.75 rows=4
> width=145)
>   Recheck Cond: (((city)::text = 'SAO PAULO'::text) AND ((state)::text
> = 'SP'::text))
>   Filter: ((name)::text ~~ '%ZANINETTI%'::text)
>   ->  Bitmap Index Scan on telefones_idx2  (cost=0.00..1031528.27
> rows=1712760 width=0)
>         Index Cond: (((city)::text = 'SAO PAULO'::text) AND
> ((state)::text = 'SP'::text))
>
>
> The cost is over than 1 million! It's to high and I have to reduce it.
> Does someone know how can I make it?

You may be misreading what the cost is...

It looks to me like the planner is estimating that there are 1712760
rows where city = 'SAO PAULO' and state = 'SP'; is that estimate way
off?

If it is, then maybe you need to ANALYZE the table, perhaps combined
with altering the histogram size for city/state.

(e.g. - alter table phone alter column city set statistics 100;
        alter table phone alter column state set statistics 100;)

If altering statistics, then re-ANALYZEing the table helps, that may
resolve things.

Unfortunately, the LIKE clause isn't going to readily take advantage
of regular indices, because you have put in '%' at beginning and end,
which means that regular indices will not be usable.

I wouldn't be too surprised if that is the case; Sao Paulo is about
the most populous South American city, with ~11M people, so having a
database with 1.7M phone numbers in that city does not seem "out
there."

The only idea that comes to mind to follow that is to look into
tsearch.  On PostgreSQL 8.3, it's built in; on 8.2 and earlier, it's a
"contrib" module.  It allows indexing on words inside columns, which
would seem to fit your requirement.
-- 
select 'cbbrowne' || '@' || 'linuxfinances.info';
http://cbbrowne.com/info/sap.html
Signs  of  a  Klingon  Programmer #3:  "By   filing this  TPR you have
challenged the honor of my family. Prepare to die!"

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend