Re: Multiple-Table-Spanning Joins with ORs in WHERE Clause

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Now I found time to investigate all proposed queries side by side. Here are the results (warmup + multiple executions). TL;DR - Jeff's proposed answer performs significantly faster with our data than any other solution (both planning and execution time).


I have no real idea how PostgreSQL does query rewriting but I guess the following steps (and reverse ones) are necessary:

1) detect "DISTINCT+LEFT OUTER JOIN" and rewrite to "SUBQUERY"

2) detect "MUTUAL JOIN ON KEY + OR" and rewrite to "UNION"

3) detect "MUTUAL IN KEY+ OR" and rewrite to "UNION"

4) detect "UNION + MUTUAL JOIN ON KEY" and rewrite to "SUBQUERY + UNION"


Doing (1+2) or (3+4) would result in the optimal query. To (1+2) seems easier to do, although a "common SELECT lift up"/"UNION push down" (if that's even the correct name) would also be great to have (that's 4)). Is this somehow correct?


Regarding cost estimation: it seems like PostgreSQL is clever enough here. So, I tend to agree with Jeff that this is not an issue with cost estimation.


---- DISTINCT + LEFT OUTER JOIN


explain analyze
SELECT distinct <columns of big_table>
FROM "big_table"
LEFT OUTER JOIN "table_a" ON ("big_table"."id" = "table_a"."big_table_id") LEFT OUTER JOIN "table_b" ON ("big_table"."id" = "table_b"."big_table_id")
WHERE
    ("table_a"."item_id" IN (<handful of items>)
    OR
    "table_b"."item_id" IN (<handful of items>));



HashAggregate (cost=206268.67..206269.46 rows=79 width=185) (actual time=904.859..904.860 rows=5 loops=1)
   Group Key: <columns of big_table>
-> Merge Left Join (cost=1.26..206265.11 rows=79 width=185) (actual time=904.835..904.846 rows=6 loops=1)
         Merge Cond: (big_table.id = table_a.big_table_id)
Filter: (((table_a.item_id)::text = ANY ('<handful of items>'::text[])) OR ((table_b.item_id)::text = ANY ('<handful of items>'::text[])))
         Rows Removed by Filter: 901355
-> Merge Left Join (cost=0.85..196703.22 rows=858293 width=243) (actual time=0.009..745.736 rows=858690 loops=1)
               Merge Cond: (big_table.id = table_b.big_table_id)
-> Index Scan using big_table_pkey on big_table (cost=0.42..180776.64 rows=858293 width=185) (actual time=0.005..399.102 rows=858690 loops=1) -> Index Scan using table_b_pkey on table_b (cost=0.42..10343.86 rows=274959 width=62) (actual time=0.003..60.961 rows=274958 loops=1) -> Index Scan using table_a_big_table_id on table_a (cost=0.42..4445.35 rows=118836 width=57) (actual time=0.003..25.456 rows=118833 loops=1)
 Planning time: 0.934 ms
 Execution time: 904.936 ms




---- SUBQUERY

explain analyze
SELECT <columns of big_table>
FROM "big_table"
WHERE
"big_table"."id" in (SELECT "table_a"."big_table_id" FROM "table_a" WHERE "table_a"."item_id" in (<handful of items>))
    OR
"big_table"."id" in (SELECT "table_b"."big_table_id" FROM "table_b" WHERE "table_b"."item_id" IN (<handful of items>));



Seq Scan on big_table (cost=100.41..115110.80 rows=643720 width=185) (actual time=229.819..229.825 rows=5 loops=1)
   Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))
   Rows Removed by Filter: 858685
   SubPlan 1
-> Index Scan using table_a_item_id_211f18d89c25bc21_uniq on table_a (cost=0.42..58.22 rows=9 width=4) (actual time=0.026..0.043 rows=5 loops=1) Index Cond: ((item_id)::text = ANY ('<handful of items>'::text[]))
   SubPlan 2
-> Index Scan using table_b_item_id_611f9f519d835e89_uniq on table_b (cost=0.42..42.15 rows=5 width=4) (actual time=0.007..0.040 rows=5 loops=1) Index Cond: ((item_id)::text = ANY ('<handful of items>'::text[]))
 Planning time: 0.261 ms
 Execution time: 229.901 ms



---- UNION

explain analyze
SELECT <columns of big_table>
FROM "big_table"
    INNER JOIN "table_a" ON ("big_table"."id" = "table_a"."big_table_id")
WHERE
    "table_a"."item_id" IN (<handful of items>)
UNION
SELECT <columns of big_table>
FROM "big_table"
    INNER JOIN "table_b" ON ("big_table"."id" = "table_b"."big_table_id")
WHERE
    "table_b"."item_id" IN (<handful of items>);

HashAggregate (cost=216.84..216.98 rows=14 width=185) (actual time=0.092..0.093 rows=5 loops=1)
   Group Key: <columns of big_table>
-> Append (cost=22.59..216.21 rows=14 width=185) (actual time=0.035..0.080 rows=10 loops=1) -> Nested Loop (cost=22.59..132.17 rows=9 width=185) (actual time=0.034..0.044 rows=5 loops=1) -> Bitmap Heap Scan on table_a (cost=22.16..56.10 rows=9 width=4) (actual time=0.029..0.029 rows=5 loops=1) Recheck Cond: ((item_id)::text = ANY ('<handful of items>'::text[]))
                     Heap Blocks: exact=1
-> Bitmap Index Scan on table_a_item_id_211f18d89c25bc21_uniq (cost=0.00..22.16 rows=9 width=0) (actual time=0.027..0.027 rows=5 loops=1) Index Cond: ((item_id)::text = ANY ('<handful of items>'::text[])) -> Index Scan using big_table_pkey on big_table (cost=0.42..8.44 rows=1 width=185) (actual time=0.002..0.002 rows=1 loops=5)
                     Index Cond: (id = table_a.big_table_id)
-> Nested Loop (cost=22.58..83.90 rows=5 width=185) (actual time=0.029..0.035 rows=5 loops=1) -> Bitmap Heap Scan on table_b (cost=22.15..41.64 rows=5 width=4) (actual time=0.026..0.026 rows=5 loops=1) Recheck Cond: ((item_id)::text = ANY ('<handful of items>'::text[]))
                     Heap Blocks: exact=1
-> Bitmap Index Scan on table_b_item_id_611f9f519d835e89_uniq (cost=0.00..22.15 rows=5 width=0) (actual time=0.025..0.025 rows=5 loops=1) Index Cond: ((item_id)::text = ANY ('<handful of items>'::text[])) -> Index Scan using big_table_pkey on big_table big_table_1 (cost=0.42..8.44 rows=1 width=185) (actual time=0.001..0.001 rows=1 loops=5)
                     Index Cond: (id = table_b.big_table_id)
 Planning time: 0.594 ms
 Execution time: 0.177 ms




---- SUBQUERY + UNION


On 29.09.2016 20:03, Jeff Janes wrote:
SELECT * FROM big_table
WHERE
id in (SELECT big_table_id FROM table_a WHERE "table_a"."item_id" IN (<handful of items>) union SELECT big_table_id FROM table_a WHERE "table_b"."item_id" IN (<handful of items>)
  );


Nested Loop (cost=98.34..216.53 rows=14 width=185) (actual time=0.061..0.069 rows=5 loops=1) -> HashAggregate (cost=97.91..98.05 rows=14 width=4) (actual time=0.057..0.058 rows=5 loops=1)
         Group Key: table_a.big_table_id
-> Append (cost=22.16..97.88 rows=14 width=4) (actual time=0.028..0.054 rows=10 loops=1) -> Bitmap Heap Scan on table_a (cost=22.16..56.10 rows=9 width=4) (actual time=0.028..0.029 rows=5 loops=1) Recheck Cond: ((item_id)::text = ANY ('<handful of items>'::text[]))
                     Heap Blocks: exact=1
-> Bitmap Index Scan on table_a_item_id_211f18d89c25bc21_uniq (cost=0.00..22.16 rows=9 width=0) (actual time=0.026..0.026 rows=5 loops=1) Index Cond: ((item_id)::text = ANY ('<handful of items>'::text[])) -> Bitmap Heap Scan on table_b (cost=22.15..41.64 rows=5 width=4) (actual time=0.024..0.024 rows=5 loops=1) Recheck Cond: ((item_id)::text = ANY ('<handful of items>'::text[]))
                     Heap Blocks: exact=1
-> Bitmap Index Scan on table_b_item_id_611f9f519d835e89_uniq (cost=0.00..22.15 rows=5 width=0) (actual time=0.023..0.023 rows=5 loops=1) Index Cond: ((item_id)::text = ANY ('<handful of items>'::text[])) -> Index Scan using big_table_pkey on big_table (cost=0.42..8.44 rows=1 width=185) (actual time=0.001..0.001 rows=1 loops=5)
         Index Cond: (id = table_a.big_table_id)
 Planning time: 0.250 ms
 Execution time: 0.104 ms






Cheers,
Sven


NOTE: I added a file with the results for better readability.
---- DISTINCT + LEFT OUTER JOINs

explain analyze
SELECT distinct <columns of big_table>
FROM "big_table"
    LEFT OUTER JOIN "table_a" ON ("big_table"."id" = "table_a"."big_table_id") 
    LEFT OUTER JOIN "table_b" ON ("big_table"."id" = "table_b"."big_table_id") 
WHERE 
    ("table_a"."item_id" IN (<handful of items>)
    OR
    "table_b"."item_id" IN (<handful of items>));


 HashAggregate  (cost=206268.67..206269.46 rows=79 width=185) (actual time=904.859..904.860 rows=5 loops=1)
   Group Key: <columns of big_table>
   ->  Merge Left Join  (cost=1.26..206265.11 rows=79 width=185) (actual time=904.835..904.846 rows=6 loops=1)
         Merge Cond: (big_table.id = table_a.big_table_id)
         Filter: (((table_a.item_id)::text = ANY ('<handful of items>'::text[])) OR ((table_b.item_id)::text = ANY ('<handful of items>'::text[])))
         Rows Removed by Filter: 901355
         ->  Merge Left Join  (cost=0.85..196703.22 rows=858293 width=243) (actual time=0.009..745.736 rows=858690 loops=1)
               Merge Cond: (big_table.id = table_b.big_table_id)
               ->  Index Scan using big_table_pkey on big_table  (cost=0.42..180776.64 rows=858293 width=185) (actual time=0.005..399.102 rows=858690 loops=1)
               ->  Index Scan using table_b_pkey on table_b  (cost=0.42..10343.86 rows=274959 width=62) (actual time=0.003..60.961 rows=274958 loops=1)
         ->  Index Scan using table_a_big_table_id on table_a  (cost=0.42..4445.35 rows=118836 width=57) (actual time=0.003..25.456 rows=118833 loops=1)
 Planning time: 0.934 ms
 Execution time: 904.936 ms




---- SUBQUERY

explain analyze
SELECT <columns of big_table>
FROM "big_table"
WHERE
    "big_table"."id" in (SELECT "table_a"."big_table_id" FROM "table_a" WHERE "table_a"."item_id" in (<handful of items>))
    OR
    "big_table"."id" in (SELECT "table_b"."big_table_id" FROM "table_b" WHERE "table_b"."item_id" IN (<handful of items>));


 Seq Scan on big_table  (cost=100.41..115110.80 rows=643720 width=185) (actual time=229.819..229.825 rows=5 loops=1)
   Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))
   Rows Removed by Filter: 858685
   SubPlan 1
     ->  Index Scan using table_a_item_id_211f18d89c25bc21_uniq on table_a  (cost=0.42..58.22 rows=9 width=4) (actual time=0.026..0.043 rows=5 loops=1)
           Index Cond: ((item_id)::text = ANY ('<handful of items>'::text[]))
   SubPlan 2
     ->  Index Scan using table_b_item_id_611f9f519d835e89_uniq on table_b  (cost=0.42..42.15 rows=5 width=4) (actual time=0.007..0.040 rows=5 loops=1)
           Index Cond: ((item_id)::text = ANY ('<handful of items>'::text[]))
 Planning time: 0.261 ms
 Execution time: 229.901 ms




---- UNION

explain analyze
SELECT <columns of big_table>
FROM "big_table"
    INNER JOIN "table_a" ON ("big_table"."id" = "table_a"."big_table_id") 
WHERE 
    "table_a"."item_id" IN (<handful of items>)
UNION
SELECT <columns of big_table>
FROM "big_table"
    INNER JOIN "table_b" ON ("big_table"."id" = "table_b"."big_table_id") 
WHERE 
    "table_b"."item_id" IN (<handful of items>);


 HashAggregate  (cost=216.84..216.98 rows=14 width=185) (actual time=0.092..0.093 rows=5 loops=1)
   Group Key: <columns of big_table>
   ->  Append  (cost=22.59..216.21 rows=14 width=185) (actual time=0.035..0.080 rows=10 loops=1)
         ->  Nested Loop  (cost=22.59..132.17 rows=9 width=185) (actual time=0.034..0.044 rows=5 loops=1)
               ->  Bitmap Heap Scan on table_a  (cost=22.16..56.10 rows=9 width=4) (actual time=0.029..0.029 rows=5 loops=1)
                     Recheck Cond: ((item_id)::text = ANY ('<handful of items>'::text[]))
                     Heap Blocks: exact=1
                     ->  Bitmap Index Scan on table_a_item_id_211f18d89c25bc21_uniq  (cost=0.00..22.16 rows=9 width=0) (actual time=0.027..0.027 rows=5 loops=1)
                           Index Cond: ((item_id)::text = ANY ('<handful of items>'::text[]))
               ->  Index Scan using big_table_pkey on big_table  (cost=0.42..8.44 rows=1 width=185) (actual time=0.002..0.002 rows=1 loops=5)
                     Index Cond: (id = table_a.big_table_id)
         ->  Nested Loop  (cost=22.58..83.90 rows=5 width=185) (actual time=0.029..0.035 rows=5 loops=1)
               ->  Bitmap Heap Scan on table_b  (cost=22.15..41.64 rows=5 width=4) (actual time=0.026..0.026 rows=5 loops=1)
                     Recheck Cond: ((item_id)::text = ANY ('<handful of items>'::text[]))
                     Heap Blocks: exact=1
                     ->  Bitmap Index Scan on table_b_item_id_611f9f519d835e89_uniq  (cost=0.00..22.15 rows=5 width=0) (actual time=0.025..0.025 rows=5 loops=1)
                           Index Cond: ((item_id)::text = ANY ('<handful of items>'::text[]))
               ->  Index Scan using big_table_pkey on big_table big_table_1  (cost=0.42..8.44 rows=1 width=185) (actual time=0.001..0.001 rows=1 loops=5)
                     Index Cond: (id = table_b.big_table_id)
 Planning time: 0.594 ms
 Execution time: 0.177 ms



---- SUBQUERY + UNION

explain analyze
SELECT <columns of big_table>
FROM "big_table"
WHERE
    "big_table"."id" in (SELECT "table_a"."big_table_id" FROM "table_a" WHERE "table_a"."item_id" in (<handful of items>)
                         UNION
                         SELECT "table_b"."big_table_id" FROM "table_b" WHERE "table_b"."item_id" IN (<handful of items>));


 Nested Loop  (cost=98.34..216.53 rows=14 width=185) (actual time=0.061..0.069 rows=5 loops=1)
   ->  HashAggregate  (cost=97.91..98.05 rows=14 width=4) (actual time=0.057..0.058 rows=5 loops=1)
         Group Key: table_a.big_table_id
         ->  Append  (cost=22.16..97.88 rows=14 width=4) (actual time=0.028..0.054 rows=10 loops=1)
               ->  Bitmap Heap Scan on table_a  (cost=22.16..56.10 rows=9 width=4) (actual time=0.028..0.029 rows=5 loops=1)
                     Recheck Cond: ((item_id)::text = ANY ('<handful of items>'::text[]))
                     Heap Blocks: exact=1
                     ->  Bitmap Index Scan on table_a_item_id_211f18d89c25bc21_uniq  (cost=0.00..22.16 rows=9 width=0) (actual time=0.026..0.026 rows=5 loops=1)
                           Index Cond: ((item_id)::text = ANY ('<handful of items>'::text[]))
               ->  Bitmap Heap Scan on table_b  (cost=22.15..41.64 rows=5 width=4) (actual time=0.024..0.024 rows=5 loops=1)
                     Recheck Cond: ((item_id)::text = ANY ('<handful of items>'::text[]))
                     Heap Blocks: exact=1
                     ->  Bitmap Index Scan on table_b_item_id_611f9f519d835e89_uniq  (cost=0.00..22.15 rows=5 width=0) (actual time=0.023..0.023 rows=5 loops=1)
                           Index Cond: ((item_id)::text = ANY ('<handful of items>'::text[]))
   ->  Index Scan using big_table_pkey on big_table  (cost=0.42..8.44 rows=1 width=185) (actual time=0.001..0.001 rows=1 loops=5)
         Index Cond: (id = table_a.big_table_id)
 Planning time: 0.250 ms
 Execution time: 0.104 ms

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux