Re: Forcing seq_scan off for large table joined with tiny table yeilds improved performance

Mario Splivalo <mario.splivalo@xxxxxxxxxx> · Mon, 30 Mar 2009 17:34:31 +0200

Tom Lane wrote:
Mario Splivalo <mario.splivalo@xxxxxxxxxx> writes:
    ->  Bitmap Heap Scan on photo_info_data u  (cost=39134.84..63740.08 
rows=109024 width=50) (actual time=270.464..270.469 rows=3 loops=2)
          Recheck Cond: ((u.field_name)::text = (t.key)::text)
          ->  Bitmap Index Scan on photo_info_data_pk 
(cost=0.00..39107.59 rows=109024 width=0) (actual time=270.435..270.435 
rows=3 loops=2)
                Index Cond: ((u.field_name)::text = (t.key)::text)

You need to figure out why that rowcount estimate is off by more than
four orders of magnitude :-(

Huh, thnx! :) Could you give me some starting points, what do I do?

Could it be because table is quite large, and there are only 3 columns 
that match join condition?

Now, after I finished writing above lines, index creation on 
photo_info_data(field_name) was done. When I rerun above query, here is 
what I get:

phototest=# explain analyze select field_name, count(*) from 
t_query_data t join photo_info_data u on t.key = u.field_name group by 
field_name;

 QUERY PLAN 

----------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=57414.33..57414.61 rows=22 width=9) (actual 
time=0.135..0.139 rows=2 loops=1)
   ->  Nested Loop  (cost=2193.50..56324.09 rows=218048 width=9) 
(actual time=0.063..0.114 rows=6 loops=1)
         ->  Seq Scan on t_query_data t  (cost=0.00..1.02 rows=2 
width=6) (actual time=0.019..0.022 rows=2 loops=1)
         ->  Bitmap Heap Scan on photo_info_data u 
(cost=2193.50..26798.74 rows=109024 width=9) (actual time=0.025..0.030 
rows=3 loops=2)
               Recheck Cond: ((u.field_name)::text = (t.key)::text)
               ->  Bitmap Index Scan on photo_info_data_ix__field_name 
 (cost=0.00..2166.24 rows=109024 width=0) (actual time=0.019..0.019 
rows=3 loops=2)
                     Index Cond: ((u.field_name)::text = (t.key)::text)
 Total runtime: 0.200 ms
(8 rows)

So, I guess I solved my problem! :) The explain analyze still shows that 
row estimate is 'quite off' (109024 estimated vs only 3 actuall), but 
the query is light-speeded :)

I tought that having primary key (and auto-index because of primary key) 
on (photo_id, field_name) should be enough. Now I have two indexes on 
field_name, but that seems to do good.

	Mike

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance