Subquery flattening causing sequential scan

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



PostgreSQL 9.0.2
Mac OS X Server 10.6.8
Autovacuum is on, and I have a script that runs vacuum analyze verbose every night along with the backup.

I have a situation where I'm experiencing a seq scan on a table with almost 3M rows when my condition is based on a subquery.  A google search turned up a way to prevent flattening the subquery into a join using OFFSET 0.  This does work, reducing the query from around 1s to around 250ms, most of which is the subquery.   

My question is why does it do a seq scan when it flattens this subquery into a JOIN?  Is it because the emsg_messages table is around 1M rows?  Are there some guidelines to when the planner will prefer not to use an available index?  I just had a look through postgresql.conf and noticed that I forgot to set effective_cache_size to something reasonable for a machine with 16GB of memory.  Would the default setting of 128MB cause this behavior?  I can't bounce the production server midday to test that change.



EXPLAIN ANALYZE
SELECT ema.message_id, ema.email_address_id, ema.address_type
FROM emsg_message_addresses ema
WHERE ema.message_id IN (
	SELECT id 
	FROM emsg_messages msg 
	WHERE msg.account_id = 314 AND msg.outgoing = FALSE 
	  AND msg.message_type = 1 AND msg.spam_level < 2 
	  AND msg.deleted_at IS NULL  
	  AND msg.id NOT IN (
			SELECT emf.message_id 
			FROM emsg_message_folders emf 
			where emf.account_id = 314
	)
)


QUERY PLAN	
Hash Semi Join  (cost=84522.74..147516.35 rows=49545 width=12) (actual time=677.058..1083.685 rows=2 loops=1)	
  Hash Cond: (ema.message_id = msg.id)	
  ->  Seq Scan on emsg_message_addresses ema  (cost=0.00..53654.78 rows=2873478 width=12) (actual time=0.020..424.241 rows=2875437 loops=1)	
  ->  Hash  (cost=84475.45..84475.45 rows=3783 width=4) (actual time=273.392..273.392 rows=1 loops=1)	
        Buckets: 1024  Batches: 1  Memory Usage: 1kB	
        ->  Bitmap Heap Scan on emsg_messages msg  (cost=7979.35..84475.45 rows=3783 width=4) (actual time=273.224..273.387 rows=1 loops=1)	
              Recheck Cond: (account_id = 314)	
              Filter: ((NOT outgoing) AND (deleted_at IS NULL) AND (spam_level < 2) AND (NOT (hashed SubPlan 1)) AND (message_type = 1))	
              ->  Bitmap Index Scan on index_emsg_messages_on_account_id  (cost=0.00..867.98 rows=34611 width=0) (actual time=9.633..9.633 rows=34997 loops=1)	
                    Index Cond: (account_id = 314)	
              SubPlan 1	
                ->  Bitmap Heap Scan on emsg_message_folders emf  (cost=704.90..7022.51 rows=35169 width=4) (actual time=5.684..38.016 rows=34594 loops=1)	
                      Recheck Cond: (account_id = 314)	
                      ->  Bitmap Index Scan on index_emsg_message_folders_on_account_id  (cost=0.00..696.10 rows=35169 width=0) (actual time=5.175..5.175 rows=34594 loops=1)	
                            Index Cond: (account_id = 314)	
Total runtime: 1083.890 ms	




EXPLAIN ANALYZE
SELECT ema.message_id, ema.email_address_id, ema.address_type
FROM emsg_message_addresses ema
WHERE ema.message_id IN (
	SELECT id 
	FROM emsg_messages msg 
	WHERE msg.account_id = 314 AND msg.outgoing = FALSE 
	  AND msg.message_type = 1 AND msg.spam_level < 2 
	  AND msg.deleted_at IS NULL  
	  AND msg.id NOT IN (
			SELECT emf.message_id 
			FROM emsg_message_folders emf 
			where emf.account_id = 314
	)
	OFFSET 0
)


QUERY PLAN	
Nested Loop  (cost=84524.89..87496.74 rows=2619 width=12) (actual time=273.409..273.412 rows=2 loops=1)	
  ->  HashAggregate  (cost=84524.89..84526.89 rows=200 width=4) (actual time=273.345..273.346 rows=1 loops=1)	
        ->  Limit  (cost=7979.36..84477.60 rows=3783 width=4) (actual time=273.171..273.335 rows=1 loops=1)	
              ->  Bitmap Heap Scan on emsg_messages msg  (cost=7979.36..84477.60 rows=3783 width=4) (actual time=273.169..273.333 rows=1 loops=1)	
                    Recheck Cond: (account_id = 314)	
                    Filter: ((NOT outgoing) AND (deleted_at IS NULL) AND (spam_level < 2) AND (NOT (hashed SubPlan 1)) AND (message_type = 1))	
                    ->  Bitmap Index Scan on index_emsg_messages_on_account_id  (cost=0.00..867.99 rows=34612 width=0) (actual time=9.693..9.693 rows=34998 loops=1)	
                          Index Cond: (account_id = 314)	
                    SubPlan 1	
                      ->  Bitmap Heap Scan on emsg_message_folders emf  (cost=704.90..7022.51 rows=35169 width=4) (actual time=5.795..39.420 rows=34594 loops=1)	
                            Recheck Cond: (account_id = 314)	
                            ->  Bitmap Index Scan on index_emsg_message_folders_on_account_id  (cost=0.00..696.10 rows=35169 width=0) (actual time=5.266..5.266 rows=34594 loops=1)	
                                  Index Cond: (account_id = 314)	
  ->  Index Scan using index_emsg_message_addresses_on_message_id on emsg_message_addresses ema  (cost=0.00..14.69 rows=13 width=12) (actual time=0.056..0.058 rows=2 loops=1)	
        Index Cond: (ema.message_id = msg.id)	
Total runtime: 273.679 ms	


Jim Crate


-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux