Search Postgresql Archives

Merge Joins and Views

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello,

I have a scenario with two tables, one with 5M rows and the other with about 3.7M (a subset taken from the first table). Each is clustered using its primary key (a single bigint column), and pg_stats shows that the id's correlation is 1 for both tables. In addition, I have a view over the 3.7M table that coalesces some columns that allow nulls.

So I'm running a simple query that does a left outer join from the 5M to the 3.7M, which basically combines the information between the two (and results in 5M rows, of course). It seems to me that the best plan should involve two index scans and a merge join. However, I get different plans depending on whether I use the view or the underlying table directly, and even the use of ORDER BY -- see examples below for details.

I don't know if this is a bug (I'm using version 8.3.0), but can anyone please explain why the optimizer (or rule system?) behaves this way?

Thank you,
--Chris


-----Example 1:-----
SELECT * FROM a LEFT OUTER JOIN b ON (a.id = b.id);

Merge Left Join (cost=43.99..353657.21 rows=5001671 width=106) (actual time=0.653..32529.319 rows=5000000 loops=1)
  Merge Cond: (a.id = b.id)
-> Index Scan using a_pkey on a (cost=0.00..173752.86 rows=5001671 width=81) (actual time=0.353..9754.375 rows=5000000 loops=1) -> Index Scan using b_pkey on b (cost=0.00..120980.85 rows=3713546 width=25) (actual time=0.279..8120.104 rows=3711523 loops=1) Total runtime: 33836.167 ms


-----Example 2:-----
-- v is a view that does SELECT ... FROM b;
SELECT * FROM a LEFT OUTER JOIN v ON (a.id = v.id);

Merge Left Join (cost=580217.86..822178.09 rows=5001671 width=100) (actual time=34260.004..67869.059 rows=5000000 loops=1)
  Merge Cond: (a.id = b.id)
-> Index Scan using a_pkey on a (cost=0.00..173752.86 rows=5001671 width=81) (actual time=0.270..10104.528 rows=5000000 loops=1) -> Materialize (cost=580217.86..626637.18 rows=3713546 width=19) (actual time=34259.696..43199.389 rows=3711523 loops=1) -> Sort (cost=580217.86..589501.72 rows=3713546 width=19) (actual time=34259.679..39448.310 rows=3711523 loops=1)
              Sort Key: b.id
              Sort Method:  external sort  Disk: 136632kB
-> Seq Scan on b (cost=0.00..61693.46 rows=3713546 width=25) (actual time=0.094..10224.402 rows=3711523 loops=1) Total runtime: 69202.529 ms


-----Example 3:-----
SELECT * FROM a LEFT OUTER JOIN (
  SELECT * FROM v ORDER BY id
) sub ON (a.id = sub.id);

Merge Right Join (cost=0.00..390792.67 rows=5001671 width=100) (actual time=0.497..38120.694 rows=5000000 loops=1)
  Merge Cond: (b.id = a.id)
-> Index Scan using b_pkey on b (cost=0.00..120980.85 rows=3713546 width=25) (actual time=0.262..13686.064 rows=3711523 loops=1) -> Index Scan using a_pkey on a (cost=0.00..173752.86 rows=5001671 width=81) (actual time=0.219..11233.746 rows=5000000 loops=1) Total runtime: 39467.843 ms

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux