Re: Query not using index pgsql 8.2.3

Henrik Zagerholm <henke@xxxxxx> · Fri, 23 Mar 2007 14:13:57 +0100

Never mind my last post regarding the Bitmap heap scan. That is all  
in order.

Now it works perfectly. Went from 30+ seconds to 179ms. =)

I'm just amazed that setting the statistics threshold on one column  
mad all the difference. IS there any guidelines on what columns I  
should change the statistics on?

Thanks again!!!

23 mar 2007 kl. 14:04 skrev Henrik Zagerholm:

23 mar 2007 kl. 13:34 skrev Michael Fuhr:

On Fri, Mar 23, 2007 at 12:41:58PM +0100, Henrik Zagerholm wrote:
23 mar 2007 kl. 12:33 skrev Michael Fuhr:
The row count estimate for fk_filetype_id = 83 is high by an order
of magnitude:

Bitmap Index Scan on tbl_file_idx6  (cost=0.00..25.65 rows=1251
width=0) (actual time=21.958..21.958 rows=112 loops=1)

Index Cond: (fk_filetype_id = 83)

Have you run ANALYZE or VACUUM ANALYZE on these tables recently?
If so then you might try increasing the statistics target for
tbl_file.fk_filetype_id and perhaps some of the columns in the join
conditions.

I did a vacuum full and reindex on all tables.

VACUUM FULL is seldom (if ever) necessary if you're running plain
VACUUM (without FULL) often enough, either manually or via  
autovacuum.

Now I also did a vacuum analyze on tbl_acl (the biggest table with
about 4.5 millin rows)

Same result.

I'd suggest analyzing all tables.  The bad estimate I mentioned
appears to be for a column in tbl_file so if you didn't analyze
that table then the query plan probably won't improve.

But do you mean if the row_count estimate is big it can't use any
index on any other table within the JOINs?

High row count estimates make the planner think that scanning entire
tables would be faster than using indexes.  The more of a table a
query must fetch the less efficient an index scan becomes, to the
point that a sequential scan is faster than an index scan.

Any specific parameters I should adjust?

If analyzing the tables doesn't improve the row count estimates
then try increasing some columns' statistics targets and re-analyze
the table or just that column.  Example:

ALTER TABLE tbl_file ALTER fk_filetype_id SET STATISTICS 100;
ANALYZE tbl_file (fk_filetype_id);

I analyzed all tables and found no difference but after changing  
the statistics for fk_filetype_id I got way better results.
Still there is a big cost on table tbl_file_structure_id it is  
making a Bitmap heap scan.
Could I do something to improve this?

Thanks for all your help so far!

Limit  (cost=115168.22..115169.12 rows=20 width=173) (actual  
time=5138.401..5138.982 rows=20 loops=1)
  ->  GroupAggregate  (cost=115168.22..115341.60 rows=3853  
width=173) (actual time=5138.394..5138.929 rows=20 loops=1)
        ->  Sort  (cost=115168.22..115177.85 rows=3853 width=173)  
(actual time=5138.339..5138.430 rows=73 loops=1)
              Sort Key: tbl_file.file_name, tbl_file.file_ctime,  
tbl_structure.structure_path, tbl_computer.pk_computer_id,  
tbl_filetype.filetype_icon, tbl_computer.computer_name,  
tbl_share.share_name, tbl_share.share_path,  
tbl_file_structure.fk_file_id
              ->  Nested Loop  (cost=8.48..114938.74 rows=3853  
width=173) (actual time=185.588..5118.684 rows=730 loops=1)
                    ->  Seq Scan on tbl_filetype  (cost=0.00..1.25  
rows=1 width=18) (actual time=0.036..0.053 rows=1 loops=1)
                          Filter: (83 = pk_filetype_id)
                    ->  Hash Join  (cost=8.48..114898.96 rows=3853  
width=171) (actual time=185.538..5116.063 rows=730 loops=1)
                          Hash Cond: (tbl_archive.fk_share_id =  
tbl_share.pk_share_id)
                          ->  Nested Loop  (cost=6.27..114843.77  
rows=3853 width=146) (actual time=167.608..5093.255 rows=730 loops=1)
                                ->  Nested Loop   
(cost=6.27..32079.13 rows=1167 width=146) (actual  
time=130.594..3448.927 rows=330 loops=1)
                                      Join Filter:  
(tbl_archive.pk_archive_id = tbl_structure.fk_archive_id)
                                      ->  Nested Loop   
(cost=4.89..31185.00 rows=1167 width=138) (actual  
time=119.861..3402.527 rows=330 loops=1)
                                            ->  Nested Loop   
(cost=4.89..24413.49 rows=1167 width=56) (actual  
time=82.203..2097.142 rows=330 loops=1)
                                                  ->  Index Scan  
using tbl_file_idx6 on tbl_file  (cost=0.00..344.56 rows=114  
width=40) (actual time=13.164..15.475 rows=106 loops=1)
                                                        Index Cond:  
(fk_filetype_id = 83)
                                                  ->  Bitmap Heap  
Scan on tbl_file_structure  (cost=4.89..210.27 rows=69 width=24)  
(actual time=8.736..19.606 rows=3 loops=106)
                                                        Recheck  
Cond: (tbl_file.pk_file_id = tbl_file_structure.fk_file_id)
                                                        ->  Bitmap  
Index Scan on tbl_file_structure_idx  (cost=0.00..4.88 rows=69  
width=0) (actual time=2.574..2.574 rows=3 loops=106)
                                                              Index  
Cond: (tbl_file.pk_file_id = tbl_file_structure.fk_file_id)
                                            ->  Index Scan using  
tbl_structure_pkey on tbl_structure  (cost=0.00..5.79 rows=1  
width=98) (actual time=3.942..3.945 rows=1 loops=330)
                                                  Index Cond:  
(tbl_structure.pk_structure_id = tbl_file_structure.fk_structure_id)
                                      ->  Materialize   
(cost=1.37..1.71 rows=34 width=16) (actual time=0.034..0.077  
rows=34 loops=330)
                                            ->  Seq Scan on  
tbl_archive  (cost=0.00..1.34 rows=34 width=16) (actual  
time=10.606..10.693 rows=34 loops=1)
                                                  Filter:  
archive_complete
                                ->  Index Scan using tbl_acl_idx on  
tbl_acl  (cost=0.00..70.47 rows=36 width=8) (actual  
time=4.964..4.971 rows=2 loops=330)
                                      Index Cond:  
(tbl_acl.fk_file_structure_id =  
tbl_file_structure.pk_file_structure_id)
                          ->  Hash  (cost=2.16..2.16 rows=4  
width=41) (actual time=17.890..17.890 rows=4 loops=1)
                                ->  Hash Join  (cost=1.07..2.16  
rows=4 width=41) (actual time=17.848..17.872 rows=4 loops=1)
                                      Hash Cond:  
(tbl_share.fk_computer_id = tbl_computer.pk_computer_id)
                                      ->  Seq Scan on tbl_share   
(cost=0.00..1.04 rows=4 width=29) (actual time=0.093..0.099 rows=4  
loops=1)
                                      ->  Hash  (cost=1.03..1.03  
rows=3 width=20) (actual time=17.724..17.724 rows=3 loops=1)
                                            ->  Seq Scan on  
tbl_computer  (cost=0.00..1.03 rows=3 width=20) (actual  
time=17.697..17.706 rows=3 loops=1)
Total runtime: 5139.608 ms

The default statistics target is 10; the maximum value is 1000.

On some systems I've found that reducing random_page_cost from 4
(the default) to 2 gives more realistic cost estimates for index
scans.

--
Michael Fuhr

---------------------------(end of  
broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings