Title: Nested Loops vs. Hash Joins or Merge Joins
I am attempting to learn more about the way Pg decides what operators to use in its query planning and executions. I have moderately complicated table layout, but it is mostly normalized I recently created a query:
select account.acct_name as "Customer Name", NULL as "Facility", account.acct_number as "LDC Acct#", account.svc_address as "Service Address",
account.svc_address as "Service Address", account.svc_city as "Service City", account.svc_state as "Service State", account.svc_city as "Service City",
account.svc_zip as "Service Zip", product.ldc_name as "LDC", NULL as "ESI Rate", NULL as "LDC Rate",
account.billing_address as "Mailing Address1", account.billing_address_2 as "Mailing Address2",
account.billing_city || ', ' || account.billing_state as "City, State", account.billing_zip as "Zip", customer.first_name || ' ' || customer.last_name
as "Contact", customer.phone as "Phone", customer.class as "Customer Class", NULL as "Tax Exempt", NULL as "Exempt%",
marketer_divisions.channel_partner_code as "Channel Partner", NULL as "AE", NULL as "Annual Use MCF", account.rate as "Trigger Price",
marketer_divisions.channel_partner_fee as "Channel Partner Fee"
from naes.reconciliation
inner join naes.application
inner join naes.account
inner join naes.marketer_product
inner join naes.marketer_divisions
inner join naes.cities
on marketer_divisions.city_id = cities.city_id
on marketer_product.division_id = marketer_divisions.division_id
inner join naes.product
on marketer_product.ldc_id = product.ldc_id
on account.marketer_product_id = marketer_product.marketer_product_id
inner join naes.customer
on account.customer_id = customer.customer_id
on account.app_id = application.app_id and account.acct_id = application.acct_id
on reconciliation.app_id = application.app_id and reconciliation.transferred_date is NULL;
The query runs fine I have no performance issues with it, but here are two query plans for the above query, one with nested loops on, the other with them off:
Nested Loops on:
Nested Loop (cost=3.33..11.37 rows=1 width=268) (actual time=2.166..2.982 rows=3 loops=1)
Join Filter: ("outer".city_id = "inner".city_id)
-> Nested Loop (cost=3.33..10.32 rows=1 width=272) (actual time=2.136..2.863 rows=3 loops=1)
Join Filter: ("outer".division_id = "inner".division_id)plication.app_id and reco=
-> Nested Loop (cost=3.33..9.27 rows=1 width=231) (actual time=2.119..2.763 rows=3 loops=1)
Join Filter: ("outer".ldc_id = "inner".ldc_id)
-> Nested Loop (cost=3.33..8.23 rows=1 width=218) (actual time=2.101..2.659 rows=3 loops=1)
-> Nested Loop (cost=3.33..5.15 rows=1 width=151) (actual time=2.068..2.559 rows=3 loops=1)
Join Filter: ("inner".app_id = "outer".app_id)
-> Merge Join (cost=3.33..4.11 rows=1 width=159) (actual time=1.096..1.477 rows=31 loops=1)
Merge Cond: ("outer".marketer_product_id = "inner".marketer_product_id)
-> Index Scan using "PK_marketer_product_id" on marketer_product (cost=0.00..3.04 rows=4 width=12) (actual time=0.017..0.033 rows=4 loops=1)
-> Sort (cost=3.33..3.33 rows=1 width=155) (actual time=1.065..1.180 rows=31 loops=1)
Sort Key: account.marketer_product_id
-> Hash Join (cost=1.75..3.32 rows=1 width=155) (actual time=0.457..0.848 rows=31 loops=1)
Hash Cond: (("outer".app_id = "inner".app_id) AND ("outer".acct_id = "inner".acct_id))
-> Seq Scan on account (cost=0.00..1.28 rows=28 width=155) (actual time=0.007..0.160 rows=34 loops=1)
-> Hash (cost=1.50..1.50 rows=50 width=8) (actual time=0.413..0.413 rows=50 loops=1)
-> Seq Scan on application (cost=0.00..1.50 rows=50 width=8) (actual time=0.006..0.209 rows=50 loops=1)
-> Seq Scan on reconciliation (cost=0.00..1.03 rows=1 width=4) (actual time=0.005..0.016 rows=3 loops=31)
Filter: (transferred_date IS NULL)
-> Index Scan using customer_pkey on customer (cost=0.00..3.06 rows=1 width=75) (actual time=0.011..0.015 rows=1 loops=3)
Index Cond: ("outer".customer_id = customer.customer_id)
-> Seq Scan on product (cost=0.00..1.02 rows=2 width=21) (actual time=0.005..0.013 rows=2 loops=3)
-> Seq Scan on marketer_divisions (cost=0.00..1.02 rows=2 width=49) (actual time=0.005..0.013 rows=2 loops=3)
-> Seq Scan on cities (cost=0.00..1.02 rows=2 width=4) (actual time=0.005..0.013 rows=2 loops=3)
Total runtime: 3.288 ms
Nested Loops off:
Hash Join (cost=8.27..11.78 rows=1 width=268) (actual time=1.701..1.765 rows=3 loops=1)
Hash Cond: ("outer".city_id = "inner".city_id)
-> Hash Join (cost=7.24..10.73 rows=1 width=272) (actual time=1.629..1.667 rows=3 loops=1)
Hash Cond: ("outer".customer_id = "inner".customer_id)
-> Seq Scan on customer (cost=0.00..3.32 rows=32 width=75) (actual time=0.006..0.136 rows=33 loops=1)
-> Hash (cost=7.24..7.24 rows=1 width=205) (actual time=1.366..1.366 rows=3 loops=1)
-> Hash Join (cost=6.43..7.24 rows=1 width=205) (actual time=1.243..1.333 rows=3 loops=1)
Hash Cond: ("outer".division_id = "inner".division_id)
-> Hash Join (cost=5.40..6.20 rows=1 width=164) (actual time=1.184..1.252 rows=3 loops=1)
Hash Cond: ("outer".ldc_id = "inner".ldc_id)
-> Merge Join (cost=4.38..5.16 rows=1 width=151) (actual time=1.124..1.169 rows=3 loops=1)
Merge Cond: ("outer".marketer_product_id = "inner".marketer_product_id)
-> Index Scan using "PK_marketer_product_id" on marketer_product (cost=0.00..3.04 rows=4 width=12) (actual time=0.012..0.019 rows=2 loops=1)
-> Sort (cost=4.38..4.38 rows=1 width=147) (actual time=1.098..1.109 rows=3 loops=1)
Sort Key: account.marketer_product_id
-> Hash Join (cost=2.78..4.37 rows=1 width=147) (actual time=1.007..1.064 rows=3 loops=1)
Hash Cond: ("outer".app_id = "inner".app_id)
-> Hash Join (cost=1.75..3.32 rows=1 width=155) (actual time=0.494..0.875 rows=31 loops=1)
Hash Cond: (("outer".app_id = "inner".app_id) AND ("outer".acct_id = "inner".acct_id))
-> Seq Scan on account (cost=0.00..1.28 rows=28 width=155) (actual time=0.007..0.154 rows=34 loops=1)
-> Hash (cost=1.50..1.50 rows=50 width=8) (actual time=0.451..0.451 rows=50 loops=1)
-> Seq Scan on application (cost=0.00..1.50 rows=50 width=8) (actual time=0.006..0.223 rows=50 loops=1)
-> Hash (cost=1.03..1.03 rows=1 width=4) (actual time=0.042..0.042 rows=3 loops=1)
-> Seq Scan on reconciliation (cost=0.00..1.03 rows=1 width=4) (actual time=0.007..0.019 rows=3 loops=1)
Filter: (transferred_date IS NULL)
-> Hash (cost=1.02..1.02 rows=2 width=21) (actual time=0.036..0.036 rows=2 loops=1)
-> Seq Scan on product (cost=0.00..1.02 rows=2 width=21) (actual time=0.005..0.014 rows=2 loops=1)
-> Hash (cost=1.02..1.02 rows=2 width=49) (actual time=0.036..0.036 rows=2 loops=1)
-> Seq Scan on marketer_divisions (cost=0.00..1.02 rows=2 width=49) (actual time=0.007..0.016 rows=2 loops=1)
-> Hash (cost=1.02..1.02 rows=2 width=4) (actual time=0.039..0.039 rows=2 loops=1)
-> Seq Scan on cities (cost=0.00..1.02 rows=2 width=4) (actual time=0.009..0.017 rows=2 loops=1)
Total runtime: 2.084 ms
With nested loops enabled does it choose to use them because it sees the estimated start up cost with loops as less? Does it not know that the total query would be faster with the Hash Joins? This query is in development right now, and as such there are not many rows. When it goes to production the reconciliation table will grow by about 50 – 100 rows per day where the transferred_date is NULL (this is the driving criteria behind this query.) As the table grows can I expect Pg to realize the the nested loops will be slower and will it switch to the Hash Joins? If not how would I force it to use the Hash Joins without just turning off nested loops completely? Is it a good idea to turn off nested loops completely?
Statistics collecting and auto vacuum is enabled btw. I have an erd diagram showing the table structures if anyone is interested in looking at it, just let me know.
Thanks,
Ketema