Re: Increasing pattern index query speed

"Andrus" <kobruleht2@xxxxxx> · Wed, 26 Nov 2008 21:24:48 +0200

Scott,

My first thought on the query where a pattern being faster than the query 
with an exact value is that the planner does not have good enough 
statistics on that column.  Without looking at the explain plans further, I 
would suggest trying something simple.  The fact that it is fasster on 8.3 
but slower on 8.1 may have to do with changes between versions, or may 
simply be due to luck in the statistics sampling.
See if increasing the statistics target on that column significantly does 
anything:
EXPLAIN (your query);
ALTER TABLE orders_products ALTER COLUMN product_id SET STATISTICS 2000;
ANALYZE orders_products;
EXPLAIN (your query);
2000 is simply a guess of mine for a value much larger than the default. 
This will generally make query planning slower but the system will have a 
lot more data about that column and the distribution of data in it.  This 
should help stabilize the query performance.
If this has an effect, the query plans will change.
Your question below really boils down to something more simple:
--Why is the most optimal query plan not chosen?  This is usually due to 
either insufficient statistics or quirks in how the query planner works on 
a specific data >set or with certain configuration options.

Thank you very much.
I found that  AND dok.kuupaev = date'2008-11-21' runs fast but
AND dok.kuupaev BETWEEN date'2008-11-21' AND date'2008-11-21'  runs very 
slow.

explain SELECT sum(1)
  FROM dok JOIN rid USING (dokumnr)
  JOIN toode USING (toode)
  WHERE rid.toode like '99000010%'

plan with default statistics:

"Aggregate  (cost=17.86..17.87 rows=1 width=0)"
"  ->  Nested Loop  (cost=0.00..17.85 rows=1 width=0)"
"        ->  Nested Loop  (cost=0.00..11.84 rows=1 width=24)"
"              Join Filter: ("outer".dokumnr = "inner".dokumnr)"
"              ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..5.81 
rows=1 width=4)"
"                    Index Cond: ((kuupaev >= '2008-11-21'::date) AND 
(kuupaev <= '2008-11-21'::date))"
"              ->  Index Scan using rid_toode_pattern_idx on rid 
(cost=0.00..6.01 rows=1 width=28)"
"                    Index Cond: ((toode ~>=~ '99000010'::bpchar) AND (toode 
~<~ '99000011'::bpchar))"
"                    Filter: (toode ~~ '99000010%'::text)"
"        ->  Index Scan using toode_pkey on toode  (cost=0.00..6.00 rows=1 
width=24)"
"              Index Cond: ("outer".toode = toode.toode)"

after statistics is changed query runs fast ( 70 ... 1000 ms)

ALTER TABLE rid ALTER COLUMN toode SET STATISTICS 1000;
analyze rid;
explain analyze SELECT sum(1)
  FROM dok JOIN rid USING (dokumnr)
  JOIN toode USING (toode)
  WHERE rid.toode like '99000010%'
  AND dok.kuupaev BETWEEN date'2008-11-21' AND date'2008-11-21'
"Aggregate  (cost=27.04..27.05 rows=1 width=0) (actual time=44.830..44.834 
rows=1 loops=1)"
"  ->  Nested Loop  (cost=0.00..27.04 rows=1 width=0) (actual 
time=0.727..44.370 rows=108 loops=1)"
"        ->  Nested Loop  (cost=0.00..21.02 rows=1 width=24) (actual 
time=0.688..40.519 rows=108 loops=1)"
"              ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..5.81 
rows=1 width=4) (actual time=0.027..8.094 rows=1678 loops=1)"
"                    Index Cond: ((kuupaev >= '2008-11-21'::date) AND 
(kuupaev <= '2008-11-21'::date))"
"              ->  Index Scan using rid_dokumnr_idx on rid 
(cost=0.00..15.20 rows=1 width=28) (actual time=0.011..0.011 rows=0 
loops=1678)"
"                    Index Cond: ("outer".dokumnr = rid.dokumnr)"
"                    Filter: (toode ~~ '99000010%'::text)"
"        ->  Index Scan using toode_pkey on toode  (cost=0.00..6.00 rows=1 
width=24) (actual time=0.016..0.020 rows=1 loops=108)"
"              Index Cond: ("outer".toode = toode.toode)"
"Total runtime: 45.050 ms"

It seems that you are genius.

I used 1000 since doc wrote that max value is 1000

Rid table contains 3.5millions rows, will increase 1 millions of rows per 
year and is updated frequently, mostly by adding.

Is it OK to leave

SET STATISTICS 1000;

setting for this table this column or should  I try to decrease it ?

Andrus. 

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance