Andres, Andrew, and Tom:
Thanks for your insightful comments! We conducted additional analysis based on your comments and would like to share the results. We would also like to get your feedback on a few design decisions to increase the utility of our performance regression reports.
####### QUERY 2:
As Andres and Andrew correctly pointed out, this regression is not observed after we increase the work_mem parameter.
* work_mem = 256MB
* shared_buffers = 1024MB
* temp_buffers = 64MB
We wanted to check if these settings look good to you. If so, we will use them for validating the regressions identified by sqlfuzz.
####### QUERY 3:
Surprisingly, the performance impact of this regression is more prominent on larger databases. We concur with Andrew in that this might be related to the lack of a Materialize node and mis-costing of the Nested Loop Anti-Join.
When we increased the scale-factor of TPC-C to 300 (~30 GB), this query ran three times slower on v11 (24 seconds) in comparison to v9.5 (7 seconds). We also found more than 15 regressions related to the same commit and share a couple of them below.
Commit: 77cd477 (Enable parallel query by default.)
Summary: Execution Time (milliseconds)
+----------------------+--------+---------+---------+-----------+
| | scale1 | scale10 | scale50 | scale 300 |
+----------------------+--------+---------+---------+-----------+
| Case-3 (v9.5) | 28 | 248 | 1231 | 7265 |
| Case-3 (v11) | 74 | 677 | 3345 | 24581 |
+----------------------+--------+---------+---------+-----------+
| Case-3A (v9.5) | 88 | 937 | 4721 | 27241 |
| Case-3A (v11) | 288 | 2822 | 13838 | 85081 |
+----------------------+--------+---------+---------+-----------+
| Case-3B (v9.5) | 101 | 934 | 4824 | 29363 |
| Case-3B (v11) | 200 | 2331 | 12327 | 74110 |
+----------------------+--------+---------+---------+-----------+
###### QUERY 3A:
select
ref_0.ol_delivery_d as c1
from
public.order_line as ref_0
where EXISTS (
select
ref_1.i_im_id as c0
from
public.item as ref_1
where ref_0.ol_d_id <= ref_1.i_im_id
)
Execution plan:
[old]
Nested Loop Semi Join (cost=0.00..90020417940.08 rows=30005835 width=8) (actual time=0.034..24981.895 rows=90017507 loops=1)
Join Filter: (ref_0.ol_d_id <= ref_1.i_im_id)
-> Seq Scan on order_line ref_0 (cost=0.00..2011503.04 rows=90017504 width=12) (actual time=0.022..7145.811 rows=90017507 loops=1)
-> Materialize (cost=0.00..2771.00 rows=100000 width=4) (actual time=0.000..0.000 rows=1 loops=90017507)
-> Seq Scan on item ref_1 (cost=0.00..2271.00 rows=100000 width=4) (actual time=0.006..0.006 rows=1 loops=1)
Planning time: 0.290 ms
Execution time: 27241.239 ms
[new]
Gather (cost=1000.00..88047487498.82 rows=30005835 width=8) (actual time=0.265..82355.289 rows=90017507 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Nested Loop Semi Join (cost=0.00..88044485915.32 rows=12502431 width=8) (actual time=0.033..68529.259 rows=30005836 loops=3)
Join Filter: (ref_0.ol_d_id <= ref_1.i_im_id)
-> Parallel Seq Scan on order_line ref_0 (cost=0.00..1486400.93 rows=37507293 width=12) (actual time=0.023..2789.901 rows=30005836 loops=3)
-> Seq Scan on item ref_1 (cost=0.00..2271.00 rows=100000 width=4) (actual time=0.001..0.001 rows=1 loops=90017507)
Planning Time: 0.319 ms
Execution Time: 85081.158 ms
###### QUERY 3B:
select
ref_0.ol_i_id as c0
from
public.order_line as ref_0
where EXISTS (
select
ref_0.ol_delivery_d as c0
from
public.order_line as ref_1
where ref_1.ol_d_id <= cast(nullif(ref_1.ol_o_id, ref_0.ol_i_id) as int4))
Execution plan:
[old]
Nested Loop Semi Join (cost=0.00..115638730740936.53 rows=30005835 width=4) (actual time=0.017..27009.302 rows=90017507 loops=1)
Join Filter: (ref_1.ol_d_id <= NULLIF(ref_1.ol_o_id, ref_0.ol_i_id))
Rows Removed by Join Filter: 11557
-> Seq Scan on order_line ref_0 (cost=0.00..2011503.04 rows=90017504 width=4) (actual time=0.009..7199.540 rows=90017507 loops=1)
-> Materialize (cost=0.00..2813221.56 rows=90017504 width=8) (actual time=0.000..0.000 rows=1 loops=90017507)
-> Seq Scan on order_line ref_1 (cost=0.00..2011503.04 rows=90017504 width=8) (actual time=0.001..0.002 rows=14 loops=1)
Planning time: 0.252 ms
Execution time: 29363.737 ms
[new]
Gather (cost=1000.00..84060490326155.39 rows=30005835 width=4) (actual time=0.272..71712.491 rows=90017507 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Nested Loop Semi Join (cost=0.00..84060487324571.89 rows=12502431 width=4) (actual time=0.046..60153.472 rows=30005836 loops=3)
Join Filter: (ref_1.ol_d_id <= NULLIF(ref_1.ol_o_id, ref_0.ol_i_id))
Rows Removed by Join Filter: 1717
-> Parallel Seq Scan on order_line ref_0 (cost=0.00..1486400.93 rows=37507293 width=4) (actual time=0.023..2819.361 rows=30005836 loops=3)
-> Seq Scan on order_line ref_1 (cost=0.00..2011503.04 rows=90017504 width=8) (actual time=0.001..0.001 rows=1 loops=90017507)
Planning Time: 0.334 ms
Execution Time: 74110.942 ms
####### QUERY 4:
As Tom pointed out, this regression is not present in DEV head. We will not report regressions related to NULLIF in the future.
####### QUERY 1:
This regression is also not present in DEV head. We will validate our regressions on DEV head in the future report.
Jinho Jung
From: Tom Lane <tgl@xxxxxxxxxxxxx>
Sent: Saturday, February 16, 2019 5:37:49 PM To: Andres Freund Cc: Jung, Jinho; Jeff Janes; pgsql-performance@xxxxxxxxxxxxxx Subject: Re: Performance regressions found using sqlfuzz Andres Freund <andres@xxxxxxxxxxx> writes:
> On 2019-02-14 17:27:40 +0000, Jung, Jinho wrote: >> - Our analysis: We believe that this regression has to do with two factors: 1) conditional _expression_ (e.g., LEAST or NULLIF) are not reduced to constants unlike string functions (e.g., CHAR_LENGTH) 2) change in the cost estimation function for bitmap scan. Execution time grows by 3 orders of magnitude. We note that this regression is only observed on large databases (e.g., scale factor of 50). > Hm. The primary problem here is that the estimation both before and > after are really bad. So I don't think the commit you point out here is > really to blame. I'm not that bothered by the query not being great, > given the weird construction with LEAST(), but we probably could fix > that pretty easily. We already did: Author: Tom Lane <tgl@xxxxxxxxxxxxx> Branch: master [6f19a8c41] 2018-12-30 13:42:04 -0500 Teach eval_const_expressions to constant-fold LEAST/GREATEST expressions. Doing this requires an assumption that the invoked btree comparison function is immutable. We could check that explicitly, but in other places such as contain_mutable_functions we just assume that it's true, so we may as well do likewise here. (If the comparison function's behavior isn't immutable, the sort order in indexes built with it would be unstable, so it seems certainly wrong for it not to be so.) Vik Fearing Discussion: https://postgr.es/m/c6e8504c-4c43-35fa-6c8f-3c0b80a912cc@xxxxxxxxxxxxxxx BTW, const-folding NULLIF would not be a similarly tiny fix, because it would need to check for immutability of the underlying operator (since it is possibly a cross-type comparison, we can't get away with just assuming it's immutable). I'm not convinced that case is worth carrying extra code for. regards, tom lane |