Scott,
My first thought on the query where a pattern being faster than the query
with an exact value is that the planner does not have good enough
statistics on that column. Without looking at the explain plans further, I
would suggest trying something simple. The fact that it is fasster on 8.3
but slower on 8.1 may have to do with changes between versions, or may
simply be due to luck in the statistics sampling.
See if increasing the statistics target on that column significantly does
anything:
EXPLAIN (your query);
ALTER TABLE orders_products ALTER COLUMN product_id SET STATISTICS 2000;
ANALYZE orders_products;
EXPLAIN (your query);
2000 is simply a guess of mine for a value much larger than the default.
This will generally make query planning slower but the system will have a
lot more data about that column and the distribution of data in it. This
should help stabilize the query performance.
If this has an effect, the query plans will change.
Your question below really boils down to something more simple:
--Why is the most optimal query plan not chosen? This is usually due to
either insufficient statistics or quirks in how the query planner works on
a specific data >set or with certain configuration options.
Thank you very much.
I found that AND dok.kuupaev = date'2008-11-21' runs fast but
AND dok.kuupaev BETWEEN date'2008-11-21' AND date'2008-11-21' runs very
slow.
explain SELECT sum(1)
FROM dok JOIN rid USING (dokumnr)
JOIN toode USING (toode)
WHERE rid.toode like '99000010%'
plan with default statistics:
"Aggregate (cost=17.86..17.87 rows=1 width=0)"
" -> Nested Loop (cost=0.00..17.85 rows=1 width=0)"
" -> Nested Loop (cost=0.00..11.84 rows=1 width=24)"
" Join Filter: ("outer".dokumnr = "inner".dokumnr)"
" -> Index Scan using dok_kuupaev_idx on dok (cost=0.00..5.81
rows=1 width=4)"
" Index Cond: ((kuupaev >= '2008-11-21'::date) AND
(kuupaev <= '2008-11-21'::date))"
" -> Index Scan using rid_toode_pattern_idx on rid
(cost=0.00..6.01 rows=1 width=28)"
" Index Cond: ((toode ~>=~ '99000010'::bpchar) AND (toode
~<~ '99000011'::bpchar))"
" Filter: (toode ~~ '99000010%'::text)"
" -> Index Scan using toode_pkey on toode (cost=0.00..6.00 rows=1
width=24)"
" Index Cond: ("outer".toode = toode.toode)"
after statistics is changed query runs fast ( 70 ... 1000 ms)
ALTER TABLE rid ALTER COLUMN toode SET STATISTICS 1000;
analyze rid;
explain analyze SELECT sum(1)
FROM dok JOIN rid USING (dokumnr)
JOIN toode USING (toode)
WHERE rid.toode like '99000010%'
AND dok.kuupaev BETWEEN date'2008-11-21' AND date'2008-11-21'
"Aggregate (cost=27.04..27.05 rows=1 width=0) (actual time=44.830..44.834
rows=1 loops=1)"
" -> Nested Loop (cost=0.00..27.04 rows=1 width=0) (actual
time=0.727..44.370 rows=108 loops=1)"
" -> Nested Loop (cost=0.00..21.02 rows=1 width=24) (actual
time=0.688..40.519 rows=108 loops=1)"
" -> Index Scan using dok_kuupaev_idx on dok (cost=0.00..5.81
rows=1 width=4) (actual time=0.027..8.094 rows=1678 loops=1)"
" Index Cond: ((kuupaev >= '2008-11-21'::date) AND
(kuupaev <= '2008-11-21'::date))"
" -> Index Scan using rid_dokumnr_idx on rid
(cost=0.00..15.20 rows=1 width=28) (actual time=0.011..0.011 rows=0
loops=1678)"
" Index Cond: ("outer".dokumnr = rid.dokumnr)"
" Filter: (toode ~~ '99000010%'::text)"
" -> Index Scan using toode_pkey on toode (cost=0.00..6.00 rows=1
width=24) (actual time=0.016..0.020 rows=1 loops=108)"
" Index Cond: ("outer".toode = toode.toode)"
"Total runtime: 45.050 ms"
It seems that you are genius.
I used 1000 since doc wrote that max value is 1000
Rid table contains 3.5millions rows, will increase 1 millions of rows per
year and is updated frequently, mostly by adding.
Is it OK to leave
SET STATISTICS 1000;
setting for this table this column or should I try to decrease it ?
Andrus.
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance