I have this schema:
CREATE TABLE onp_crm_person(
id serial PRIMARY KEY,
onp_user_id bigint references onp_user(id) deferrable initially deferred,
is_resource boolean not null default false,
UNIQUE(onp_user_id)
);
CREATE TABLE onp_crm_activity_log(
id bigserial PRIMARY KEY,
relation_id integer REFERENCES onp_crm_relation(entity_id),
logged_for int references onp_crm_person(onp_user_id),
duration bigint
);
CREATE TABLE onp_crm_invoice(
entity_id bigint PRIMARY KEY REFERENCES onp_crm_entity(entity_id),
status_key VARCHAR NOT NULL,
credit_against bigint REFERENCES onp_crm_invoice(entity_id),
sent_date DATE,
UNIQUE(credit_against) deferrable INITIALLY DEFERRED
-- invoice_print_template_id is added after creation of origo_invoice_print_template
);
CREATE TABLE onp_crm_invoice_line (
id SERIAL PRIMARY KEY,
invoice_id INTEGER NOT NULL REFERENCES onp_crm_invoice (entity_id)
);
CREATE TABLE onp_crm_calendarentry_invoice_membership(
invoice_line_id INTEGER NOT NULL REFERENCES onp_crm_invoice_line(id) ON DELETE CASCADE,
calendar_entry_id INTEGER NOT NULL REFERENCES onp_crm_activity_log(id),
unique(invoice_line_id, calendar_entry_id) DEFERRABLE INITIALLY DEFERRED
);
This query performs terribly slow ( ~26 minutes, 1561346.597ms):
explain analyze
SELECT
log.relation_id as company_id
, sum(log.duration) AS duration
FROM onp_crm_activity_log log
JOIN onp_crm_person logfor ON logfor._onp_user_id_ = log.logged_for AND logfor.is_resource = FALSE
WHERE 1 = 1
-- Filter out already invoiced before 2016-06-27
AND NOT EXISTS(
SELECT * FROM onp_crm_calendarentry_invoice_membership cem
JOIN onp_crm_invoice_line il ON cem.invoice_line_id = il.id
JOIN onp_crm_invoice inv ON il.invoice_id = inv.entity_id
WHERE cem.calendar_entry_id = log.id
AND inv.status_key = 'INVOICE_STATUS_INVOICED'
AND inv.sent_date <= '2016-06-27'
AND NOT EXISTS(
SELECT * FROM onp_crm_invoice creditnote
WHERE il.invoice_id = creditnote.credit_against
AND creditnote.status_key = 'INVOICE_STATUS_INVOICED'
AND creditnote.sent_date <= '2016-06-27'
)
)
GROUP BY
log.relation_id
;
Explain output:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=13778.63..13796.39 rows=1421 width=12) (actual time=1561343.861..1561344.042 rows=724 loops=1)
Group Key: log.relation_id
-> Nested Loop Anti Join (cost=741.35..13768.63 rows=2000 width=12) (actual time=471.973..1561221.929 rows=96095 loops=1)
Join Filter: (cem.calendar_entry_id = log.id)
Rows Removed by Join Filter: 11895758618
-> Hash Join (cost=86.56..9729.03 rows=2000 width=20) (actual time=0.170..668.911 rows=181644 loops=1)
Hash Cond: (log.logged_for = logfor.onp_user_id)
-> Seq Scan on onp_crm_activity_log log (cost=0.00..8930.98 rows=184398 width=24) (actual time=0.007..538.893 rows=182378 loops=1)
-> Hash (cost=39.46..39.46 rows=3768 width=8) (actual time=0.126..0.126 rows=36 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 34kB
-> Bitmap Heap Scan on onp_crm_person logfor (cost=3.69..39.46 rows=3768 width=8) (actual time=0.040..0.106 rows=36 loops=1)
Recheck Cond: (onp_user_id IS NOT NULL)
Filter: (NOT is_resource)
Rows Removed by Filter: 5
Heap Blocks: exact=10
-> Bitmap Index Scan on onp_crm_person_onp_id_idx (cost=0.00..2.75 rows=41 width=0) (actual time=0.019..0.019 rows=41 loops=1)
-> Materialize (cost=654.79..4009.60 rows=1 width=4) (actual time=0.000..2.829 rows=65490 loops=181644)
-> Nested Loop (cost=654.79..4009.59 rows=1 width=4) (actual time=9.056..386.835 rows=85668 loops=1)
-> Nested Loop (cost=654.50..4009.27 rows=1 width=8) (actual time=9.046..165.280 rows=88151 loops=1)
-> Hash Anti Join (cost=654.21..4008.72 rows=1 width=8) (actual time=9.016..40.672 rows=76174 loops=1)
Hash Cond: (il.invoice_id = creditnote.credit_against)
-> Seq Scan on onp_crm_invoice_line il (cost=0.00..3062.01 rows=78001 width=8) (actual time=0.005..11.259 rows=78614 loops=1)
-> Hash (cost=510.56..510.56 rows=11492 width=8) (actual time=8.940..8.940 rows=372 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 143kB
-> Seq Scan on onp_crm_invoice creditnote (cost=0.00..510.56 rows=11492 width=8) (actual time=0.014..7.882 rows=11507 loops=1)
Filter: ((sent_date <= '2016-06-27'::date) AND ((status_key)::text = 'INVOICE_STATUS_INVOICED'::text))
Rows Removed by Filter: 149
-> Index Only Scan using onp_crm_calendarentry_invoice_invoice_line_id_calendar_entr_key on onp_crm_calendarentry_invoice_membership cem (cost=0.29..0.45 rows=9 width=8) (actual time=0.001..0.001 rows=1 loops=76174)
Index Cond: (invoice_line_id = il.id)
Heap Fetches: 4371
-> Index Scan using onp_crm_invoice_pkey on onp_crm_invoice inv (cost=0.29..0.31 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=88151)
Index Cond: (entity_id = il.invoice_id)
Filter: ((sent_date <= '2016-06-27'::date) AND ((status_key)::text = 'INVOICE_STATUS_INVOICED'::text))
Rows Removed by Filter: 0
Planning time: 3.307 ms
Execution time: 1561346.597 ms
(36 rows)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=13778.63..13796.39 rows=1421 width=12) (actual time=1561343.861..1561344.042 rows=724 loops=1)
Group Key: log.relation_id
-> Nested Loop Anti Join (cost=741.35..13768.63 rows=2000 width=12) (actual time=471.973..1561221.929 rows=96095 loops=1)
Join Filter: (cem.calendar_entry_id = log.id)
Rows Removed by Join Filter: 11895758618
-> Hash Join (cost=86.56..9729.03 rows=2000 width=20) (actual time=0.170..668.911 rows=181644 loops=1)
Hash Cond: (log.logged_for = logfor.onp_user_id)
-> Seq Scan on onp_crm_activity_log log (cost=0.00..8930.98 rows=184398 width=24) (actual time=0.007..538.893 rows=182378 loops=1)
-> Hash (cost=39.46..39.46 rows=3768 width=8) (actual time=0.126..0.126 rows=36 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 34kB
-> Bitmap Heap Scan on onp_crm_person logfor (cost=3.69..39.46 rows=3768 width=8) (actual time=0.040..0.106 rows=36 loops=1)
Recheck Cond: (onp_user_id IS NOT NULL)
Filter: (NOT is_resource)
Rows Removed by Filter: 5
Heap Blocks: exact=10
-> Bitmap Index Scan on onp_crm_person_onp_id_idx (cost=0.00..2.75 rows=41 width=0) (actual time=0.019..0.019 rows=41 loops=1)
-> Materialize (cost=654.79..4009.60 rows=1 width=4) (actual time=0.000..2.829 rows=65490 loops=181644)
-> Nested Loop (cost=654.79..4009.59 rows=1 width=4) (actual time=9.056..386.835 rows=85668 loops=1)
-> Nested Loop (cost=654.50..4009.27 rows=1 width=8) (actual time=9.046..165.280 rows=88151 loops=1)
-> Hash Anti Join (cost=654.21..4008.72 rows=1 width=8) (actual time=9.016..40.672 rows=76174 loops=1)
Hash Cond: (il.invoice_id = creditnote.credit_against)
-> Seq Scan on onp_crm_invoice_line il (cost=0.00..3062.01 rows=78001 width=8) (actual time=0.005..11.259 rows=78614 loops=1)
-> Hash (cost=510.56..510.56 rows=11492 width=8) (actual time=8.940..8.940 rows=372 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 143kB
-> Seq Scan on onp_crm_invoice creditnote (cost=0.00..510.56 rows=11492 width=8) (actual time=0.014..7.882 rows=11507 loops=1)
Filter: ((sent_date <= '2016-06-27'::date) AND ((status_key)::text = 'INVOICE_STATUS_INVOICED'::text))
Rows Removed by Filter: 149
-> Index Only Scan using onp_crm_calendarentry_invoice_invoice_line_id_calendar_entr_key on onp_crm_calendarentry_invoice_membership cem (cost=0.29..0.45 rows=9 width=8) (actual time=0.001..0.001 rows=1 loops=76174)
Index Cond: (invoice_line_id = il.id)
Heap Fetches: 4371
-> Index Scan using onp_crm_invoice_pkey on onp_crm_invoice inv (cost=0.29..0.31 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=88151)
Index Cond: (entity_id = il.invoice_id)
Filter: ((sent_date <= '2016-06-27'::date) AND ((status_key)::text = 'INVOICE_STATUS_INVOICED'::text))
Rows Removed by Filter: 0
Planning time: 3.307 ms
Execution time: 1561346.597 ms
(36 rows)
Here is with set enable_nestloop to false;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=15936.07..15953.83 rows=1421 width=12) (actual time=337.878..338.045 rows=724 loops=1)
Group Key: log.relation_id
-> Hash Anti Join (cost=6258.35..15926.07 rows=2000 width=12) (actual time=139.976..317.402 rows=96097 loops=1)
Hash Cond: (log.id = cem.calendar_entry_id)
-> Hash Join (cost=86.56..9729.03 rows=2000 width=20) (actual time=0.164..132.935 rows=181646 loops=1)
Hash Cond: (log.logged_for = logfor.onp_user_id)
-> Seq Scan on onp_crm_activity_log log (cost=0.00..8930.98 rows=184398 width=24) (actual time=0.006..89.170 rows=182380 loops=1)
-> Hash (cost=39.46..39.46 rows=3768 width=8) (actual time=0.118..0.118 rows=36 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 34kB
-> Bitmap Heap Scan on onp_crm_person logfor (cost=3.69..39.46 rows=3768 width=8) (actual time=0.037..0.101 rows=36 loops=1)
Recheck Cond: (onp_user_id IS NOT NULL)
Filter: (NOT is_resource)
Rows Removed by Filter: 5
Heap Blocks: exact=10
-> Bitmap Index Scan on onp_crm_person_onp_id_idx (cost=0.00..2.75 rows=41 width=0) (actual time=0.017..0.017 rows=41 loops=1)
-> Hash (cost=6171.78..6171.78 rows=1 width=4) (actual time=139.779..139.779 rows=85668 loops=1)
Buckets: 131072 (originally 1024) Batches: 1 (originally 1) Memory Usage: 4036kB
-> Hash Join (cost=4562.41..6171.78 rows=1 width=4) (actual time=92.471..125.417 rows=85668 loops=1)
Hash Cond: (cem.invoice_line_id = il.id)
-> Seq Scan on onp_crm_calendarentry_invoice_membership cem (cost=0.00..1278.44 rows=88244 width=8) (actual time=0.005..7.941 rows=88734 loops=1)
-> Hash (cost=4562.40..4562.40 rows=1 width=4) (actual time=92.444..92.444 rows=75570 loops=1)
Buckets: 131072 (originally 1024) Batches: 1 (originally 1) Memory Usage: 3681kB
-> Hash Join (cost=4008.74..4562.40 rows=1 width=4) (actual time=61.797..79.981 rows=75570 loops=1)
Hash Cond: (inv.entity_id = il.invoice_id)
-> Seq Scan on onp_crm_invoice inv (cost=0.00..510.56 rows=11492 width=8) (actual time=0.027..4.489 rows=11507 loops=1)
Filter: ((sent_date <= '2016-06-27'::date) AND ((status_key)::text = 'INVOICE_STATUS_INVOICED'::text))
Rows Removed by Filter: 151
-> Hash (cost=4008.72..4008.72 rows=1 width=8) (actual time=61.751..61.751 rows=76182 loops=1)
Buckets: 131072 (originally 1024) Batches: 1 (originally 1) Memory Usage: 4000kB
-> Hash Anti Join (cost=654.21..4008.72 rows=1 width=8) (actual time=12.568..47.911 rows=76182 loops=1)
Hash Cond: (il.invoice_id = creditnote.credit_against)
-> Seq Scan on onp_crm_invoice_line il (cost=0.00..3062.01 rows=78001 width=8) (actual time=0.008..12.476 rows=78622 loops=1)
-> Hash (cost=510.56..510.56 rows=11492 width=8) (actual time=12.477..12.477 rows=372 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 143kB
-> Seq Scan on onp_crm_invoice creditnote (cost=0.00..510.56 rows=11492 width=8) (actual time=0.008..10.963 rows=11507 loops=1)
Filter: ((sent_date <= '2016-06-27'::date) AND ((status_key)::text = 'INVOICE_STATUS_INVOICED'::text))
Rows Removed by Filter: 151
Planning time: 3.510 ms
Execution time: 338.349 ms
(39 rows)
So my question is is there something I can do to make PG favor a Hash Anti Join instead of a Nested Loop Anti Join (which I assume is the problem)?
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=15936.07..15953.83 rows=1421 width=12) (actual time=337.878..338.045 rows=724 loops=1)
Group Key: log.relation_id
-> Hash Anti Join (cost=6258.35..15926.07 rows=2000 width=12) (actual time=139.976..317.402 rows=96097 loops=1)
Hash Cond: (log.id = cem.calendar_entry_id)
-> Hash Join (cost=86.56..9729.03 rows=2000 width=20) (actual time=0.164..132.935 rows=181646 loops=1)
Hash Cond: (log.logged_for = logfor.onp_user_id)
-> Seq Scan on onp_crm_activity_log log (cost=0.00..8930.98 rows=184398 width=24) (actual time=0.006..89.170 rows=182380 loops=1)
-> Hash (cost=39.46..39.46 rows=3768 width=8) (actual time=0.118..0.118 rows=36 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 34kB
-> Bitmap Heap Scan on onp_crm_person logfor (cost=3.69..39.46 rows=3768 width=8) (actual time=0.037..0.101 rows=36 loops=1)
Recheck Cond: (onp_user_id IS NOT NULL)
Filter: (NOT is_resource)
Rows Removed by Filter: 5
Heap Blocks: exact=10
-> Bitmap Index Scan on onp_crm_person_onp_id_idx (cost=0.00..2.75 rows=41 width=0) (actual time=0.017..0.017 rows=41 loops=1)
-> Hash (cost=6171.78..6171.78 rows=1 width=4) (actual time=139.779..139.779 rows=85668 loops=1)
Buckets: 131072 (originally 1024) Batches: 1 (originally 1) Memory Usage: 4036kB
-> Hash Join (cost=4562.41..6171.78 rows=1 width=4) (actual time=92.471..125.417 rows=85668 loops=1)
Hash Cond: (cem.invoice_line_id = il.id)
-> Seq Scan on onp_crm_calendarentry_invoice_membership cem (cost=0.00..1278.44 rows=88244 width=8) (actual time=0.005..7.941 rows=88734 loops=1)
-> Hash (cost=4562.40..4562.40 rows=1 width=4) (actual time=92.444..92.444 rows=75570 loops=1)
Buckets: 131072 (originally 1024) Batches: 1 (originally 1) Memory Usage: 3681kB
-> Hash Join (cost=4008.74..4562.40 rows=1 width=4) (actual time=61.797..79.981 rows=75570 loops=1)
Hash Cond: (inv.entity_id = il.invoice_id)
-> Seq Scan on onp_crm_invoice inv (cost=0.00..510.56 rows=11492 width=8) (actual time=0.027..4.489 rows=11507 loops=1)
Filter: ((sent_date <= '2016-06-27'::date) AND ((status_key)::text = 'INVOICE_STATUS_INVOICED'::text))
Rows Removed by Filter: 151
-> Hash (cost=4008.72..4008.72 rows=1 width=8) (actual time=61.751..61.751 rows=76182 loops=1)
Buckets: 131072 (originally 1024) Batches: 1 (originally 1) Memory Usage: 4000kB
-> Hash Anti Join (cost=654.21..4008.72 rows=1 width=8) (actual time=12.568..47.911 rows=76182 loops=1)
Hash Cond: (il.invoice_id = creditnote.credit_against)
-> Seq Scan on onp_crm_invoice_line il (cost=0.00..3062.01 rows=78001 width=8) (actual time=0.008..12.476 rows=78622 loops=1)
-> Hash (cost=510.56..510.56 rows=11492 width=8) (actual time=12.477..12.477 rows=372 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 143kB
-> Seq Scan on onp_crm_invoice creditnote (cost=0.00..510.56 rows=11492 width=8) (actual time=0.008..10.963 rows=11507 loops=1)
Filter: ((sent_date <= '2016-06-27'::date) AND ((status_key)::text = 'INVOICE_STATUS_INVOICED'::text))
Rows Removed by Filter: 151
Planning time: 3.510 ms
Execution time: 338.349 ms
(39 rows)
So my question is is there something I can do to make PG favor a Hash Anti Join instead of a Nested Loop Anti Join (which I assume is the problem)?
Can the nested NOT EXISTS be re-written to be more performant?
Thanks.
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963