On 18 January 2016 at 10:41, Adam Brusselback <adambrusselback@xxxxxxxxx> wrote:
Hey all, i've run into a performance problem with one of my queries that I am really not sure what is causing it.Setup info:Postgres version 9.4.4 on Debian 7. Server is virtual, with a single core and 512 ram available and ssd storage.Changes to postgresql.conf:maintenance_work_mem = 30MBcheckpoint_completion_target = 0.7effective_cache_size = 352MBwork_mem = 24MBwal_buffers = 4MBcheckpoint_segments = 8shared_buffers = 120MBrandom_page_cost = 1.1The problem:I have a view which sums up detail records to give totals at a header level, and performance is great when I select from it by limiting to a single record, but limiting it to multiple records seems to cause it to choose a bad plan.Example 1:SELECT *
FROM claim_totals
WHERE claim_id IN ('e8a38718-7997-4304-bbfa-138deb84aa82')
(2 ms)Example 2:SELECT *
FROM claim_totals
WHERE claim_id IN ('324d2af8-46b3-45ad-b56a-0a49d0345653', 'e8a38718-7997-4304-bbfa-138deb84aa82')
(5460 ms)The view definition is:SELECT claim.claim_id,
COALESCE(lumpsum.lumpsum_count, 0::bigint)::integer AS lumpsum_count,
COALESCE(lumpsum.requested_amount, 0::numeric) AS lumpsum_requested_total,
COALESCE(lumpsum.allowed_amount, 0::numeric) AS lumpsum_allowed_total,
COALESCE(claim_product.product_count, 0::bigint)::integer + COALESCE(claim_adhoc_product.adhoc_product_count, 0::bigint)::integer AS product_count,
COALESCE(claim_product.requested_amount, 0::numeric) + COALESCE(claim_adhoc_product.requested_amount, 0::numeric) AS product_requested_amount,
COALESCE(claim_product.allowed_amount, 0::numeric) + COALESCE(claim_adhoc_product.allowed_amount, 0::numeric) AS product_allowed_amount,
COALESCE(claim_product.requested_amount, 0::numeric) + COALESCE(claim_adhoc_product.requested_amount, 0::numeric) + COALESCE(lumpsum.requested_amount, 0::numeric) AS requested_total,
COALESCE(claim_product.allowed_amount, 0::numeric) + COALESCE(claim_adhoc_product.allowed_amount, 0::numeric) + COALESCE(lumpsum.allowed_amount, 0::numeric) AS allowed_total
FROM claim
LEFT JOIN ( SELECT claim_lumpsum.claim_id,
count(claim_lumpsum.claim_lumpsum_id) AS lumpsum_count,
sum(claim_lumpsum.requested_amount) AS requested_amount,
sum(claim_lumpsum.allowed_amount) AS allowed_amount
FROM claim_lumpsum
GROUP BY claim_lumpsum.claim_id) lumpsum ON lumpsum.claim_id = claim.claim_id
LEFT JOIN ( SELECT claim_product_1.claim_id,
count(claim_product_1.claim_product_id) AS product_count,
sum(claim_product_1.rebate_requested_quantity * claim_product_1.rebate_requested_rate) AS requested_amount,
sum(claim_product_1.rebate_allowed_quantity * claim_product_1.rebate_allowed_rate) AS allowed_amount
FROM claim_product claim_product_1
GROUP BY claim_product_1.claim_id) claim_product ON claim_product.claim_id = claim.claim_id
LEFT JOIN ( SELECT claim_adhoc_product_1.claim_id,
count(claim_adhoc_product_1.claim_adhoc_product_id) AS adhoc_product_count,
sum(claim_adhoc_product_1.rebate_requested_quantity * claim_adhoc_product_1.rebate_requested_rate) AS requested_amount,
sum(claim_adhoc_product_1.rebate_allowed_quantity * claim_adhoc_product_1.rebate_allowed_rate) AS allowed_amount
FROM claim_adhoc_product claim_adhoc_product_1
GROUP BY claim_adhoc_product_1.claim_id) claim_adhoc_product ON claim_adhoc_product.claim_id = claim.claim_id;Here are the respective explain / analyze for the two queries above:Example 1:Nested Loop Left Join (cost=0.97..149.46 rows=2 width=232) (actual time=0.285..0.289 rows=1 loops=1)
Output: claim.claim_id, (COALESCE((count(claim_lumpsum.claim_lumpsum_id)), 0::bigint))::integer, COALESCE((sum(claim_lumpsum.requested_amount)), 0::numeric), COALESCE((sum(claim_lumpsum.allowed_amount)), 0::numeric), ((COALESCE((count(claim_product_1.claim_product_id)), 0::bigint))::integer + (COALESCE((count(claim_adhoc_product_1.claim_adhoc_product_id)), 0::bigint))::integer), (COALESCE((sum((claim_product_1.rebate_requested_quantity * claim_product_1.rebate_requested_rate))), 0::numeric) + COALESCE((sum((claim_adhoc_product_1.rebate_requested_quantity * claim_adhoc_product_1.rebate_requested_rate))), 0::numeric)), (COALESCE((sum((claim_product_1.rebate_allowed_quantity * claim_product_1.rebate_allowed_rate))), 0::numeric) + COALESCE((sum((claim_adhoc_product_1.rebate_allowed_quantity * claim_adhoc_product_1.rebate_allowed_rate))), 0::numeric)), ((COALESCE((sum((claim_product_1.rebate_requested_quantity * claim_product_1.rebate_requested_rate))), 0::numeric) + COALESCE((sum((claim_adhoc_product_1.rebate_requested_quantity * claim_adhoc_product_1.rebate_requested_rate))), 0::numeric)) + COALESCE((sum(claim_lumpsum.requested_amount)), 0::numeric)), ((COALESCE((sum((claim_product_1.rebate_allowed_quantity * claim_product_1.rebate_allowed_rate))), 0::numeric) + COALESCE((sum((claim_adhoc_product_1.rebate_allowed_quantity * claim_adhoc_product_1.rebate_allowed_rate))), 0::numeric)) + COALESCE((sum(claim_lumpsum.allowed_amount)), 0::numeric))
Join Filter: (claim_lumpsum.claim_id = claim.claim_id)
-> Nested Loop Left Join (cost=0.97..135.31 rows=1 width=160) (actual time=0.260..0.264 rows=1 loops=1)
Output: claim.claim_id, (count(claim_product_1.claim_product_id)), (sum((claim_product_1.rebate_requested_quantity * claim_product_1.rebate_requested_rate))), (sum((claim_product_1.rebate_allowed_quantity * claim_product_1.rebate_allowed_rate))), (count(claim_adhoc_product_1.claim_adhoc_product_id)), (sum((claim_adhoc_product_1.rebate_requested_quantity * claim_adhoc_product_1.rebate_requested_rate))), (sum((claim_adhoc_product_1.rebate_allowed_quantity * claim_adhoc_product_1.rebate_allowed_rate)))
Join Filter: (claim_adhoc_product_1.claim_id = claim.claim_id)
-> Nested Loop Left Join (cost=0.97..122.14 rows=1 width=88) (actual time=0.254..0.256 rows=1 loops=1)
Output: claim.claim_id, (count(claim_product_1.claim_product_id)), (sum((claim_product_1.rebate_requested_quantity * claim_product_1.rebate_requested_rate))), (sum((claim_product_1.rebate_allowed_quantity * claim_product_1.rebate_allowed_rate)))
Join Filter: (claim_product_1.claim_id = claim.claim_id)
-> Index Only Scan using claim_pkey on client_pinnacle.claim (cost=0.42..1.54 rows=1 width=16) (actual time=0.078..0.079 rows=1 loops=1)
Output: claim.claim_id
Index Cond: (claim.claim_id = 'e8a38718-7997-4304-bbfa-138deb84aa82'::uuid)
Heap Fetches: 0
-> GroupAggregate (cost=0.55..120.58 rows=1 width=54) (actual time=0.163..0.163 rows=1 loops=1)
Output: claim_product_1.claim_id, count(claim_product_1.claim_product_id), sum((claim_product_1.rebate_requested_quantity * claim_product_1.rebate_requested_rate)), sum((claim_product_1.rebate_allowed_quantity * claim_product_1.rebate_allowed_rate))
Group Key: claim_product_1.claim_id
-> Index Scan using claim_product_claim_id_product_id_distributor_company_id_lo_key on client_pinnacle.claim_product claim_product_1 (cost=0.55..118.99 rows=105 width=54) (actual time=0.071..0.091 rows=12 loops=1)
Output: claim_product_1.claim_id, claim_product_1.claim_product_id, claim_product_1.rebate_requested_quantity, claim_product_1.rebate_requested_rate, claim_product_1.rebate_allowed_quantity, claim_product_1.rebate_allowed_rate
Index Cond: (claim_product_1.claim_id = 'e8a38718-7997-4304-bbfa-138deb84aa82'::uuid)
-> GroupAggregate (cost=0.00..13.15 rows=1 width=160) (actual time=0.001..0.001 rows=0 loops=1)
Output: claim_adhoc_product_1.claim_id, count(claim_adhoc_product_1.claim_adhoc_product_id), sum((claim_adhoc_product_1.rebate_requested_quantity * claim_adhoc_product_1.rebate_requested_rate)), sum((claim_adhoc_product_1.rebate_allowed_quantity * claim_adhoc_product_1.rebate_allowed_rate))
Group Key: claim_adhoc_product_1.claim_id
-> Seq Scan on client_pinnacle.claim_adhoc_product claim_adhoc_product_1 (cost=0.00..13.12 rows=1 width=160) (actual time=0.001..0.001 rows=0 loops=1)
Output: claim_adhoc_product_1.claim_adhoc_product_id, claim_adhoc_product_1.claim_id, claim_adhoc_product_1.product_name, claim_adhoc_product_1.product_number, claim_adhoc_product_1.uom_type_description, claim_adhoc_product_1.rebate_requested_quantity, claim_adhoc_product_1.rebate_requested_rate, claim_adhoc_product_1.rebate_allowed_quantity, claim_adhoc_product_1.rebate_allowed_rate, claim_adhoc_product_1.claimant_contract_name, claim_adhoc_product_1.resolve_date
Filter: (claim_adhoc_product_1.claim_id = 'e8a38718-7997-4304-bbfa-138deb84aa82'::uuid)
-> GroupAggregate (cost=0.00..14.05 rows=2 width=96) (actual time=0.001..0.001 rows=0 loops=1)
Output: claim_lumpsum.claim_id, count(claim_lumpsum.claim_lumpsum_id), sum(claim_lumpsum.requested_amount), sum(claim_lumpsum.allowed_amount)
Group Key: claim_lumpsum.claim_id
-> Seq Scan on client_pinnacle.claim_lumpsum (cost=0.00..14.00 rows=2 width=96) (actual time=0.000..0.000 rows=0 loops=1)
Output: claim_lumpsum.claim_lumpsum_id, claim_lumpsum.claim_id, claim_lumpsum.lumpsum_id, claim_lumpsum.requested_amount, claim_lumpsum.allowed_amount, claim_lumpsum.event_date_range, claim_lumpsum.contract_lumpsum_id, claim_lumpsum.claimant_contract_name, claim_lumpsum.hint_contract_lumpsum_description
Filter: (claim_lumpsum.claim_id = 'e8a38718-7997-4304-bbfa-138deb84aa82'::uuid)
Planning time: 6.336 ms
Execution time: 0.753 msExample 2:Hash Right Join (cost=81278.79..81674.85 rows=2 width=232) (actual time=5195.972..5458.916 rows=2 loops=1)
Output: claim.claim_id, (COALESCE((count(claim_lumpsum.claim_lumpsum_id)), 0::bigint))::integer, COALESCE((sum(claim_lumpsum.requested_amount)), 0::numeric), COALESCE((sum(claim_lumpsum.allowed_amount)), 0::numeric), ((COALESCE((count(claim_product_1.claim_product_id)), 0::bigint))::integer + (COALESCE((count(claim_adhoc_product_1.claim_adhoc_product_id)), 0::bigint))::integer), (COALESCE((sum((claim_product_1.rebate_requested_quantity * claim_product_1.rebate_requested_rate))), 0::numeric) + COALESCE((sum((claim_adhoc_product_1.rebate_requested_quantity * claim_adhoc_product_1.rebate_requested_rate))), 0::numeric)), (COALESCE((sum((claim_product_1.rebate_allowed_quantity * claim_product_1.rebate_allowed_rate))), 0::numeric) + COALESCE((sum((claim_adhoc_product_1.rebate_allowed_quantity * claim_adhoc_product_1.rebate_allowed_rate))), 0::numeric)), ((COALESCE((sum((claim_product_1.rebate_requested_quantity * claim_product_1.rebate_requested_rate))), 0::numeric) + COALESCE((sum((claim_adhoc_product_1.rebate_requested_quantity * claim_adhoc_product_1.rebate_requested_rate))), 0::numeric)) + COALESCE((sum(claim_lumpsum.requested_amount)), 0::numeric)), ((COALESCE((sum((claim_product_1.rebate_allowed_quantity * claim_product_1.rebate_allowed_rate))), 0::numeric) + COALESCE((sum((claim_adhoc_product_1.rebate_allowed_quantity * claim_adhoc_product_1.rebate_allowed_rate))), 0::numeric)) + COALESCE((sum(claim_lumpsum.allowed_amount)), 0::numeric))
Hash Cond: (claim_product_1.claim_id = claim.claim_id)
-> HashAggregate (cost=81231.48..81438.09 rows=13774 width=54) (actual time=5182.546..5405.990 rows=95763 loops=1)
Output: claim_product_1.claim_id, count(claim_product_1.claim_product_id), sum((claim_product_1.rebate_requested_quantity * claim_product_1.rebate_requested_rate)), sum((claim_product_1.rebate_allowed_quantity * claim_product_1.rebate_allowed_rate))
Group Key: claim_product_1.claim_id
-> Seq Scan on client_pinnacle.claim_product claim_product_1 (cost=0.00..55253.59 rows=1731859 width=54) (actual time=0.020..1684.826 rows=1731733 loops=1)
Output: claim_product_1.claim_id, claim_product_1.claim_product_id, claim_product_1.rebate_requested_quantity, claim_product_1.rebate_requested_rate, claim_product_1.rebate_allowed_quantity, claim_product_1.rebate_allowed_rate
-> Hash (cost=47.29..47.29 rows=2 width=160) (actual time=0.110..0.110 rows=2 loops=1)
Output: claim.claim_id, (count(claim_lumpsum.claim_lumpsum_id)), (sum(claim_lumpsum.requested_amount)), (sum(claim_lumpsum.allowed_amount)), (count(claim_adhoc_product_1.claim_adhoc_product_id)), (sum((claim_adhoc_product_1.rebate_requested_quantity * claim_adhoc_product_1.rebate_requested_rate))), (sum((claim_adhoc_product_1.rebate_allowed_quantity * claim_adhoc_product_1.rebate_allowed_rate)))
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Hash Right Join (cost=41.53..47.29 rows=2 width=160) (actual time=0.105..0.108 rows=2 loops=1)
Output: claim.claim_id, (count(claim_lumpsum.claim_lumpsum_id)), (sum(claim_lumpsum.requested_amount)), (sum(claim_lumpsum.allowed_amount)), (count(claim_adhoc_product_1.claim_adhoc_product_id)), (sum((claim_adhoc_product_1.rebate_requested_quantity * claim_adhoc_product_1.rebate_requested_rate))), (sum((claim_adhoc_product_1.rebate_allowed_quantity * claim_adhoc_product_1.rebate_allowed_rate)))
Hash Cond: (claim_adhoc_product_1.claim_id = claim.claim_id)
-> HashAggregate (cost=16.25..19.25 rows=200 width=160) (actual time=0.001..0.001 rows=0 loops=1)
Output: claim_adhoc_product_1.claim_id, count(claim_adhoc_product_1.claim_adhoc_product_id), sum((claim_adhoc_product_1.rebate_requested_quantity * claim_adhoc_product_1.rebate_requested_rate)), sum((claim_adhoc_product_1.rebate_allowed_quantity * claim_adhoc_product_1.rebate_allowed_rate))
Group Key: claim_adhoc_product_1.claim_id
-> Seq Scan on client_pinnacle.claim_adhoc_product claim_adhoc_product_1 (cost=0.00..12.50 rows=250 width=160) (actual time=0.001..0.001 rows=0 loops=1)
Output: claim_adhoc_product_1.claim_adhoc_product_id, claim_adhoc_product_1.claim_id, claim_adhoc_product_1.product_name, claim_adhoc_product_1.product_number, claim_adhoc_product_1.uom_type_description, claim_adhoc_product_1.rebate_requested_quantity, claim_adhoc_product_1.rebate_requested_rate, claim_adhoc_product_1.rebate_allowed_quantity, claim_adhoc_product_1.rebate_allowed_rate, claim_adhoc_product_1.claimant_contract_name, claim_adhoc_product_1.resolve_date
-> Hash (cost=25.25..25.25 rows=2 width=88) (actual time=0.093..0.093 rows=2 loops=1)
Output: claim.claim_id, (count(claim_lumpsum.claim_lumpsum_id)), (sum(claim_lumpsum.requested_amount)), (sum(claim_lumpsum.allowed_amount))
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Hash Right Join (cost=19.49..25.25 rows=2 width=88) (actual time=0.088..0.092 rows=2 loops=1)
Output: claim.claim_id, (count(claim_lumpsum.claim_lumpsum_id)), (sum(claim_lumpsum.requested_amount)), (sum(claim_lumpsum.allowed_amount))
Hash Cond: (claim_lumpsum.claim_id = claim.claim_id)
-> HashAggregate (cost=16.40..19.40 rows=200 width=96) (actual time=0.003..0.003 rows=0 loops=1)
Output: claim_lumpsum.claim_id, count(claim_lumpsum.claim_lumpsum_id), sum(claim_lumpsum.requested_amount), sum(claim_lumpsum.allowed_amount)
Group Key: claim_lumpsum.claim_id
-> Seq Scan on client_pinnacle.claim_lumpsum (cost=0.00..13.20 rows=320 width=96) (actual time=0.001..0.001 rows=0 loops=1)
Output: claim_lumpsum.claim_lumpsum_id, claim_lumpsum.claim_id, claim_lumpsum.lumpsum_id, claim_lumpsum.requested_amount, claim_lumpsum.allowed_amount, claim_lumpsum.event_date_range, claim_lumpsum.contract_lumpsum_id, claim_lumpsum.claimant_contract_name, claim_lumpsum.hint_contract_lumpsum_description
-> Hash (cost=3.07..3.07 rows=2 width=16) (actual time=0.073..0.073 rows=2 loops=1)
Output: claim.claim_id
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Index Only Scan using claim_pkey on client_pinnacle.claim (cost=0.42..3.07 rows=2 width=16) (actual time=0.048..0.070 rows=2 loops=1)
Output: claim.claim_id
Index Cond: (claim.claim_id = ANY ('{324d2af8-46b3-45ad-b56a-0a49d0345653,e8a38718-7997-4304-bbfa-138deb84aa82}'::uuid[]))
Heap Fetches: 0
Planning time: 1.020 ms
Execution time: 5459.461 msPlease let me know if there is any more info I can provide to help figure out why it's choosing an undesirable plan with just a slight change in the the clause.
Hi Adam,
This is fairly simple to explain. The reason you see better performance with the singe claim_id is that IN() clauses with a single 1 item are converted to a single equality _expression_. For example: (just using system tables so you can try this too, without having to create any special tables)
# explain select * from pg_class where oid in(1);
QUERY PLAN
-------------------------------------------------------------------------------------
Index Scan using pg_class_oid_index on pg_class (cost=0.27..8.29 rows=1 width=219)
Index Cond: (oid = '1'::oid)
We get an index scan with the index condition: oid = 1.
If we have 2 items, then we don't get this.
# explain select * from pg_class where oid in(1,2);
QUERY PLAN
---------------------------------------------------------------------------------
Bitmap Heap Scan on pg_class (cost=8.56..14.03 rows=2 width=219)
Recheck Cond: (oid = ANY ('{1,2}'::oid[]))
-> Bitmap Index Scan on pg_class_oid_index (cost=0.00..8.56 rows=2 width=0)
Index Cond: (oid = ANY ('{1,2}'::oid[]))
(4 rows)
Now I also need to explain that PostgreSQL will currently push ONLY equality expressions into other relations. For example, if we write:
# explain select * from pg_class pc inner join pg_attribute pa on pc.oid = pa.attrelid where pc.oid in(1);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.55..22.63 rows=4 width=422)
-> Index Scan using pg_class_oid_index on pg_class pc (cost=0.27..8.29 rows=1 width=223)
Index Cond: (oid = '1'::oid)
-> Index Scan using pg_attribute_relid_attnum_index on pg_attribute pa (cost=0.28..14.30 rows=4 width=203)
Index Cond: (attrelid = '1'::oid)
(5 rows)
You can see that I only put pg_class.oid = 1 in the query, but internally the query planner also added the pg_attribute.attrelid = 1. It was able to do this due to the join condition dictating that pc.oid = pa.attrelid, therefore this will always be equal, and since pc.oid = 1, then pa.attrelid must also be 1.
If we have 2 items in the IN() clause, then this no longer happens:
# explain select * from pg_class pc inner join pg_attribute pa on pc.oid = pa.attrelid where pc.oid in(1,2);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Nested Loop (cost=8.84..54.84 rows=15 width=422)
-> Bitmap Heap Scan on pg_class pc (cost=8.56..14.03 rows=2 width=223)
Recheck Cond: (oid = ANY ('{1,2}'::oid[]))
-> Bitmap Index Scan on pg_class_oid_index (cost=0.00..8.56 rows=2 width=0)
Index Cond: (oid = ANY ('{1,2}'::oid[]))
-> Index Scan using pg_attribute_relid_attnum_index on pg_attribute pa (cost=0.28..20.33 rows=8 width=203)
Index Cond: (attrelid = pc.oid)
(7 rows)
In your case the claim_id = 'e8a38718-7997-4304-bbfa-138deb84aa82'::uuid was pushed down into the subqueries, thus giving them less work to do, and also the flexibility of using indexes on claim_id in the tables contained within the subqueries. PostgreSQL currently does not push any inequality predicates down at all.
A few months ago I did a little bit of work to try and lift this restriction, although I only made it cover the >=, >, < and <= operators as a first measure.
Details here:
If you didn't have the VIEW, you could manually push these predicates into each subquery. However this is not really possible to do with the VIEW. Perhaps something could be done with a function and using dynamic SQL to craft a query manually, or you could just get rid of the view and have the application build the query. If that's not an option then maybe you could response to the thread above to mention that you've been hit by this problem and would +1 some solution to fix it, and perhaps cross link to this thread. I did have a little bit of a hard time in convincing people that this was in fact a fairly common problem in the above thread, so it would be nice to see people who have hit this problem respond to that.