Apparently useless bitmap scans

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



There's another odd thing about this plan from yesterday.

Query:

        SELECT
                eh_subj.header_body AS subject,
                count(distinct eh_from.header_body)
        FROM
                email JOIN mime_part USING (email_id)
                JOIN email_header eh_subj USING (email_id, mime_part_id)
                JOIN email_header eh_from USING (email_id, mime_part_id)
        WHERE
                eh_subj.header_name = 'subject'
                AND eh_from.header_name = 'from'
                AND mime_part_id = 0
                AND (time >= timestamp '2007-05-05 17:01:59' AND time < timestamp '2007-05-05 17:01:59' + interval '60 min')
        GROUP BY
                eh_subj.header_body;

Plan:

                                                                                        QUERY PLAN                                                                                         
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=1920309.81..1920534.21 rows=11220 width=184) (actual time=5349.493..5587.536 rows=35000 loops=1)
   ->  Sort  (cost=1920309.81..1920337.86 rows=11220 width=184) (actual time=5349.427..5392.110 rows=35000 loops=1)
         Sort Key: eh_subj.header_body
         ->  Nested Loop  (cost=15576.58..1919555.05 rows=11220 width=184) (actual time=537.938..5094.377 rows=35000 loops=1)
               ->  Nested Loop  (cost=15576.58..475387.23 rows=11020 width=120) (actual time=537.858..4404.330 rows=35000 loops=1)
                     ->  Nested Loop  (cost=15576.58..430265.44 rows=11092 width=112) (actual time=537.768..4024.184 rows=35000 loops=1)
                           ->  Bitmap Heap Scan on email_header eh_from  (cost=15576.58..16041.55 rows=107156 width=104) (actual time=537.621..1801.032 rows=280990 loops=1)
                                 Recheck Cond: ((mime_part_id = 0) AND (header_name = 'from'::text))
                                 ->  BitmapAnd  (cost=15576.58..15576.58 rows=160 width=0) (actual time=500.006..500.006 rows=0 loops=1)
                                       ->  Bitmap Index Scan on dummy_index  (cost=0.00..3724.22 rows=107156 width=0) (actual time=85.025..85.025 rows=280990 loops=1)
                                       ->  Bitmap Index Scan on idx__email_header__from_local  (cost=0.00..5779.24 rows=107156 width=0) (actual time=173.006..173.006 rows=280990 loops=1)
                                       ->  Bitmap Index Scan on dummy2_index  (cost=0.00..5992.25 rows=107156 width=0) (actual time=174.463..174.463 rows=280990 loops=1)
                           ->  Index Scan using email_pkey on email  (cost=0.00..3.85 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=280990)
                                 Index Cond: (email.email_id = eh_from.email_id)
                                 Filter: (("time" >= '2007-05-05 17:01:59'::timestamp without time zone) AND ("time" < '2007-05-05 18:01:59'::timestamp without time zone))
                     ->  Index Scan using mime_part_pkey on mime_part  (cost=0.00..4.06 rows=1 width=12) (actual time=0.005..0.006 rows=1 loops=35000)
                           Index Cond: ((email.email_id = mime_part.email_id) AND (mime_part.mime_part_id = 0))
               ->  Index Scan using idx__email_header__email_id__mime_part_id on email_header eh_subj  (cost=0.00..130.89 rows=13 width=104) (actual time=0.009..0.015 rows=1 loops=35000)
                     Index Cond: ((email.email_id = eh_subj.email_id) AND (0 = eh_subj.mime_part_id))
                     Filter: (header_name = 'subject'::text)
 Total runtime: 5625.024 ms


I'm wondering what it wants to achieve with these three index scans:

    ->  Bitmap Index Scan on dummy_index  (cost=0.00..3724.22 rows=107156 width=0) (actual time=85.025..85.025 rows=280990 loops=1)
    ->  Bitmap Index Scan on idx__email_header__from_local  (cost=0.00..5779.24 rows=107156 width=0) (actual time=173.006..173.006 rows=280990 loops=1)
    ->  Bitmap Index Scan on dummy2_index  (cost=0.00..5992.25 rows=107156 width=0) (actual time=174.463..174.463 rows=280990 loops=1)

The indexes in question are:

CREATE INDEX dummy_index ON email_header ((555)) WHERE mime_part_id = 0 AND header_name = 'from';
CREATE INDEX dummy2_index ON email_header (substr(header_body,5)) WHERE mime_part_id = 0 AND header_name = 'from';
CREATE INDEX idx__email_header__from_local ON email_header (get_localpart(header_body)) WHERE mime_part_id = 0 AND header_name = 'from';

It appears to want to use these indexes to get the restriction

                AND eh_from.header_name = 'from'
                AND mime_part_id = 0

from the query, but why does it need three of them to do it, when all
of them have the same predicate and none of them has an indexed
expression that appears in the query?

There are more partial indexes with the same predicate, but it appears
to always use three.  (The two "dummy" indexes are just leftovers from
these experiments.)

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux