Query Plan - Bitmap Index Scan and Views

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

Is there any inherent benefit of using a the IN operator versus joining a temporary table? Should they offer near equal performance? It appears bitmap scan's aren't done when matching across a small temporary table.

I have a temporary table with 5 integers in it that I'm matching against mildly complex view that has 5 joins. I've analyzed the database after the temporary table was created.

Matching against the temporary table takes: 36492.836 ms.
Matching using the IN operator with the same content takes: 2.732 ms.

These measurements are after the query has been run a few times, so the data should be in cache.

It would appear that the temporary table's join isn't evaluated deep enough in the query plan to prevent the more expensive joins from running, is there a way for force it? Could some setting be wrong that telling the planner to make this decision? The same thing happens when I perform the join without the view.

select * from foo;
  oid
--------
161007
161008
161000
161009
161002
(5 rows)


Plan for IN match:

=# explain analyze select * from crawled_url_full_view where crawled_url_full_view.oid in (161007, 161008, 161000, 161009, 161002); QUERY PLAN ------------------------------------------------------------------------ ------------------------------------------------------------------------ ------------------- Hash Left Join (cost=14.50..94.11 rows=5 width=538) (actual time=1.025..1.522 rows=5 loops=1)
   Hash Cond: ("outer".classification_set_id = "inner".id)
   Join Filter: ("outer".classification_set_id IS NOT NULL)
-> Hash Left Join (cost=13.30..92.86 rows=5 width=526) (actual time=0.794..1.251 rows=5 loops=1)
         Hash Cond: ("outer".charset_id = "inner".id)
         Join Filter: ("outer".charset_id IS NOT NULL)
-> Hash Left Join (cost=12.21..91.70 rows=5 width=515) (actual time=0.631..1.048 rows=5 loops=1) Hash Cond: ("outer".http_error_description_id = "inner".id) Join Filter: ("outer".http_error_description_id IS NOT NULL) -> Hash Left Join (cost=11.13..90.59 rows=5 width=472) (actual time=0.488..0.868 rows=5 loops=1)
                     Hash Cond: ("outer".content_type_id = "inner".id)
                     Join Filter: ("outer".content_type_id IS NOT NULL)
-> Nested Loop Left Join (cost=10.02..89.41 rows=5 width=443) (actual time=0.244..0.578 rows=5 loops=1) Join Filter: ("outer".redirect_url_id IS NOT NULL) -> Nested Loop Left Join (cost=10.02..59.56 rows=5 width=339) (actual time=0.225..0.488 rows=5 loops=1) -> Bitmap Heap Scan on crawled_url (cost=10.02..29.71 rows=5 width=235) (actual time=0.170..0.217 rows=5 loops=1) Recheck Cond: ((oid = 161007) OR (oid = 161008) OR (oid = 161000) OR (oid = 161009) OR (oid = 161002)) -> BitmapOr (cost=10.02..10.02 rows=5 width=0) (actual time=0.137..0.137 rows=0 loops=1) -> Bitmap Index Scan on crawled_url_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.061..0.061 rows=1 loops=1) Index Cond: (oid = 161007) -> Bitmap Index Scan on crawled_url_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.013..0.013 rows=1 loops=1) Index Cond: (oid = 161008) -> Bitmap Index Scan on crawled_url_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.013..0.013 rows=1 loops=1) Index Cond: (oid = 161000) -> Bitmap Index Scan on crawled_url_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.014..0.014 rows=1 loops=1) Index Cond: (oid = 161009) -> Bitmap Index Scan on crawled_url_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.012..0.012 rows=1 loops=1) Index Cond: (oid = 161002) -> Index Scan using url_pkey on url (cost=0.00..5.96 rows=1 width=108) (actual time=0.031..0.036 rows=1 loops=5) Index Cond: (url.url_id = "outer".url_id) -> Index Scan using url_pkey on url r1 (cost=0.00..5.96 rows=1 width=108) (actual time=0.004..0.004 rows=0 loops=5) Index Cond: (r1.url_id = "outer".redirect_url_id) -> Hash (cost=1.09..1.09 rows=9 width=33) (actual time=0.130..0.130 rows=9 loops=1) -> Seq Scan on content_types (cost=0.00..1.09 rows=9 width=33) (actual time=0.017..0.062 rows=9 loops=1) -> Hash (cost=1.06..1.06 rows=6 width=47) (actual time=0.088..0.088 rows=6 loops=1) -> Seq Scan on http_error_descriptions (cost=0.00..1.06 rows=6 width=47) (actual time=0.010..0.040 rows=6 loops=1) -> Hash (cost=1.08..1.08 rows=8 width=15) (actual time=0.103..0.103 rows=8 loops=1) -> Seq Scan on charsets (cost=0.00..1.08 rows=8 width=15) (actual time=0.011..0.048 rows=8 loops=1) -> Hash (cost=1.16..1.16 rows=16 width=16) (actual time=0.175..0.175 rows=16 loops=1) -> Seq Scan on classification_sets (cost=0.00..1.16 rows=16 width=16) (actual time=0.012..0.088 rows=16 loops=1)
Total runtime: 2.743 ms
(41 rows)



Plan for temp table match:


=# explain analyze select * from foo, crawled_url_full_view where crawled_url_full_view.oid = foo.oid; QUERY PLAN ------------------------------------------------------------------------ ------------------------------------------------------------------------ ------------------ Hash IN Join (cost=35667.15..145600.71 rows=5 width=538) (actual time=22371.445..36482.823 rows=5 loops=1)
   Hash Cond: ("outer".oid = "inner".oid)
-> Hash Left Join (cost=35666.09..143698.61 rows=380198 width=538) (actual time=9901.782..35218.758 rows=360531 loops=1)
         Hash Cond: ("outer".classification_set_id = "inner".id)
         Join Filter: ("outer".classification_set_id IS NOT NULL)
-> Hash Left Join (cost=35664.89..140493.61 rows=380198 width=526) (actual time=9901.456..32363.212 rows=360531 loops=1)
               Hash Cond: ("outer".charset_id = "inner".id)
               Join Filter: ("outer".charset_id IS NOT NULL)
-> Hash Left Join (cost=35663.79..135684.27 rows=380198 width=515) (actual time=9901.257..29400.189 rows=360531 loops=1) Hash Cond: ("outer".http_error_description_id = "inner".id) Join Filter: ("outer".http_error_description_id IS NOT NULL) -> Hash Left Join (cost=35662.71..133782.19 rows=380198 width=472) (actual time=9901.080..26691.473 rows=360531 loops=1) Hash Cond: ("outer".content_type_id = "inner".id) Join Filter: ("outer".content_type_id IS NOT NULL) -> Hash Left Join (cost=35661.60..128972.84 rows=380198 width=443) (actual time=9900.802..23743.323 rows=360531 loops=1) Hash Cond: ("outer".redirect_url_id = "inner".url_id) Join Filter: ("outer".redirect_url_id IS NOT NULL) -> Hash Left Join (cost=17830.80..66680.80 rows=380198 width=339) (actual time=4592.701..14466.994 rows=360531 loops=1) Hash Cond: ("outer".url_id = "inner".url_id) -> Seq Scan on crawled_url (cost=0.00..10509.98 rows=380198 width=235) (actual time=0.026..2976.911 rows=360531 loops=1) -> Hash (cost=10627.04..10627.04 rows=377104 width=108) (actual time=4591.703..4591.703 rows=382149 loops=1) -> Seq Scan on url (cost=0.00..10627.04 rows=377104 width=108) (actual time=0.041..2142.702 rows=382149 loops=1) -> Hash (cost=10627.04..10627.04 rows=377104 width=108) (actual time=5307.540..5307.540 rows=382149 loops=1) -> Seq Scan on url r1 (cost=0.00..10627.04 rows=377104 width=108) (actual time=0.138..2503.577 rows=382149 loops=1) -> Hash (cost=1.09..1.09 rows=9 width=33) (actual time=0.144..0.144 rows=9 loops=1) -> Seq Scan on content_types (cost=0.00..1.09 rows=9 width=33) (actual time=0.020..0.068 rows=9 loops=1) -> Hash (cost=1.06..1.06 rows=6 width=47) (actual time=0.108..0.108 rows=6 loops=1) -> Seq Scan on http_error_descriptions (cost=0.00..1.06 rows=6 width=47) (actual time=0.015..0.049 rows=6 loops=1) -> Hash (cost=1.08..1.08 rows=8 width=15) (actual time=0.129..0.129 rows=8 loops=1) -> Seq Scan on charsets (cost=0.00..1.08 rows=8 width=15) (actual time=0.014..0.058 rows=8 loops=1) -> Hash (cost=1.16..1.16 rows=16 width=16) (actual time=0.234..0.234 rows=16 loops=1) -> Seq Scan on classification_sets (cost=0.00..1.16 rows=16 width=16) (actual time=0.014..0.107 rows=16 loops=1) -> Hash (cost=1.05..1.05 rows=5 width=4) (actual time=0.092..0.092 rows=5 loops=1) -> Seq Scan on foo (cost=0.00..1.05 rows=5 width=4) (actual time=0.022..0.044 rows=5 loops=1)
Total runtime: 36492.836 ms
(35 rows)


Definition of the view:

create view crawled_url_full_view as
select crawled_url.*,
url.url,
r1.url as redirect_url,
content_types.type as content_type,
http_error_descriptions.error as http_error_description,
charsets.name as charset,
classification_sets.name as classification_set
from crawled_url left join url on url.url_id = crawled_url.url_id
left join url as r1 on (r1.url_id = crawled_url.redirect_url_id and crawled_url.redirect_url_id is not null) left join content_types on (content_types.id = crawled_url.content_type_id and crawled_url.content_type_id is not null) left join http_error_descriptions on (http_error_descriptions.id = crawled_url.http_error_description_id and crawled_url.http_error_description_id is not null) left join charsets on (charsets.id = crawled_url.charset_id and crawled_url.charset_id is not null) left join classification_sets on (classification_sets.id = crawled_url.classification_set_id and crawled_url.classification_set_id is not null);


Version is: PostgreSQL 8.1.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.0.1 20050727 (Red Hat 4.0.1-5)

work_mem=30000
shared_buffers=5000
effective_cache_size=15000

Thanks for any help,

Rusty
--
Rusty Conover
InfoGears Inc.
Web: http://www.infogears.com





[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux