24 jul 2007 kl. 16:10 skrev Tom Lane:
Henrik Zagerholm <henke@xxxxxx> writes:
I'm using pgsql 8.2.4 and I have this query which is sooo slow but I
can seem to figure out why.
Seems the core of the problem is the misestimation here:
" -> Index Scan using
tbl_file_idx on tbl_file (cost=0.01..8.34 rows=1 width=39) (actual
time=0.283..3339.003 rows=25039 loops=1)"
" Index Cond:
((lower
((file_name)::text) ~>=~ 'awstats'::character varying) AND (lower
((file_name)::text) ~<~ 'awstatt'::character varying))"
" Filter: (lower
((file_name)::text) ~~ 'awstats%'::text)"
Looks like either your stats are not up to date on this table, or you
need to increase the stats target for file_name. Anytime a scan row
estimate is off by a factor of 25000, you're going to get a bad
plan :-(
Thanks for the tips on getting this query faster.
What I did was.
Changed statistics on both file_name and tbl_file_idx.pg_expression_1
which made the stats look much better. A simple query test on
tbl_file showed that the the new stats worked.
EXPLAIN ANALYZE SELECT file_name FROM tbl_file WHERE lower(file_name)
LIKE lower('awstats%');
"Bitmap Heap Scan on tbl_file (cost=1157.12..11999.34 rows=39868
width=19) (actual time=49.751..258.550 rows=39959 loops=1)"
" Filter: (lower((file_name)::text) ~~ 'awstats%'::text)"
" -> Bitmap Index Scan on tbl_file_idx (cost=0.00..1147.15
rows=35881 width=0) (actual time=47.791..47.791 rows=39959 loops=1)"
" Index Cond: ((lower((file_name)::text) ~>=~
'awstats'::character varying) AND (lower((file_name)::text) ~<~
'awstatt'::character varying))"
"Total runtime: 308.610 ms"
I then added another table which as a many to one relationship with
tbl_file called tbl_file_structure. This is the one I want to do the
count on.
Surprisingly this query is about 100 times slower due to the SEQ scan
on tbl_file_structure. The foreign key fk_file_id is indexed and it
has statistics set to 200.
How come it makes an seq scan on this table? The count retrieved is
75 000 so its way lower than the total 3 834 059 rows.
Should I raise the statistics more? Is there a rule of thumb how much
the statistics should be reagards to the number of rows in the table?
Can I make my database adjust the statistics dynamically? I don't
want to go around to my customers changing statistics every time the
tables starts to fill up.
Anyway here is the slow query.
EXPLAIN ANALYZE SELECT COUNT(pk_file_structure_id) FROM
tbl_file_structure INNER JOIN tbl_file ON fk_file_id = pk_file_id
WHERE lower(file_name) like lower('awstats%');
"Aggregate (cost=172512.17..172512.18 rows=1 width=8) (actual
time=30316.316..30316.317 rows=1 loops=1)"
" -> Hash Join (cost=12673.69..171634.39 rows=351110 width=8)
(actual time=1927.730..30191.260 rows=75262 loops=1)"
" Hash Cond: (tbl_file_structure.fk_file_id =
tbl_file.pk_file_id)"
" -> Seq Scan on tbl_file_structure (cost=0.00..80537.59
rows=3834059 width=16) (actual time=10.056..14419.662 rows=3834059
loops=1)"
" -> Hash (cost=11999.34..11999.34 rows=39868 width=8)
(actual time=1896.859..1896.859 rows=39959 loops=1)"
" -> Bitmap Heap Scan on tbl_file
(cost=1157.12..11999.34 rows=39868 width=8) (actual
time=457.867..1779.792 rows=39959 loops=1)"
" Filter: (lower((file_name)::text) ~~ 'awstats
%'::text)"
" -> Bitmap Index Scan on tbl_file_idx
(cost=0.00..1147.15 rows=35881 width=0) (actual time=450.469..450.469
rows=39959 loops=1)"
" Index Cond: ((lower((file_name)::text)
~>=~ 'awstats'::character varying) AND (lower((file_name)::text) ~<~
'awstatt'::character varying))"
"Total runtime: 30316.739 ms"
Thanks for your help,
Henrik
regards, tom lane
---------------------------(end of
broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match