Dear PostgreSQL Community,
For query 10 of the TPC-H benchmark:
select
    c_custkey,
    c_name,
    sum(l_extendedprice * (1 - l_discount)) as revenue,
    c_acctbal,
    n_name,
    c_address,
    c_phone,
    c_comment
from
    CUSTOMER,
    ORDERS,
    LINEITEM,
    NATION
where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate >= date '1993-08-01'
    and o_orderdate < date '1993-08-01' + interval '3' month
    and l_returnflag = 'R'
    and c_nationkey = n_nationkey
group by
    c_custkey,
    c_name,
    c_acctbal,
    c_phone,
    n_name,
    c_address,
    c_comment
order by
    revenue desc
limit
    20;
Its query plan, as reported by EXPLAIN (ANALYZE), is:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=200479.71..200479.76 rows=20 width=205) (actual time=506.558..510.594 rows=20 loops=1)
   ->  Sort  (cost=200479.71..200622.37 rows=57064 width=205) (actual time=506.557..510.591 rows=20 loops=1)
         Sort Key: (sum((lineitem.l_extendedprice * ('1'::numeric - lineitem.l_discount)))) DESC
         Sort Method: top-N heapsort  Memory: 34kB
         ->  Finalize GroupAggregate  (cost=191629.63..198961.25 rows=57064 width=205) (actual time=441.132..501.986 rows=37925 loops=1)
               Group Key: customer.c_custkey, nation.n_name
               ->  Gather Merge  (cost=191629.63..197772.41 rows=47554 width=205) (actual time=441.124..474.623 rows=37925 loops=1)
                     Workers Planned: 2
                     Workers Launched: 2
                     ->  Partial GroupAggregate  (cost=190629.61..191283.48 rows=23777 width=205) (actual time=437.497..464.923 rows=12642 loops=3)
                           Group Key: customer.c_custkey, nation.n_name
                           ->  Sort  (cost=190629.61..190689.05 rows=23777 width=185) (actual time=437.485..441.339 rows=38183 loops=3)
                                 Sort Key: customer.c_custkey, nation.n_name
                                 Sort Method: external merge  Disk: 7184kB
                                 Worker 0:  Sort Method: external merge  Disk: 7448kB
                                 Worker 1:  Sort Method: external merge  Disk: 7264kB
                                 ->  Hash Join  (cost=181606.66..186706.85 rows=23777 width=185) (actual time=385.555..418.269 rows=38183 loops=3)
                                       Hash Cond: (customer.c_nationkey = nation.n_nationkey)
                                       ->  Parallel Hash Join  (cost=181605.09..186632.29 rows=23777 width=160) (actual time=385.484..411.936 rows=38183 loops=3)
                                             Hash Cond: (customer.c_custkey = orders.o_custkey)
                                             ->  Parallel Seq Scan on customer  (cost=0.00..4225.00 rows=62500 width=148) (actual time=0.028..9.805 rows=50000 loops=3)
                                             ->  Parallel Hash  (cost=181307.88..181307.88 rows=23777 width=16) (actual time=385.060..385.063 rows=38183 loops=3)
                                                   Buckets: 131072 (originally 65536)  Batches: 1 (originally 1)  Memory Usage: 7648kB
                                                   ->  Parallel Hash Join  (cost=35809.22..181307.88 rows=23777 width=16) (actual time=69.608..371.381 rows=38183 loops=3)
                                                         Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
                                                         ->  Parallel Seq Scan on lineitem  (cost=0.00..143863.66 rows=622855 width=16) (actual time=0.024..255.818 rows=492957 loops=3)
                                                               Filter: (l_returnflag = 'R'::bpchar)
                                                               Rows Removed by Filter: 1507448
                                                         ->  Parallel Hash  (cost=35511.00..35511.00 rows=23858 width=8) (actual time=68.857..68.858 rows=19046 loops=3)
                                                               Buckets: 65536  Batches: 1  Memory Usage: 2816kB
                                                               ->  Parallel Seq Scan on orders  (cost=0.00..35511.00 rows=23858 width=8) (actual time=0.033..62.907 rows=19046 loops=3)
                                                                     Filter: ((o_orderdate >= '1993-08-01'::date) AND (o_orderdate < '1993-11-01 00:00:00'::timestamp without time zone))
                                                                     Rows Removed by Filter: 480954
                                       ->  Hash  (cost=1.25..1.25 rows=25 width=33) (actual time=0.037..0.037 rows=25 loops=3)
                                             Buckets: 1024  Batches: 1  Memory Usage: 10kB
                                             ->  Seq Scan on nation  (cost=0.00..1.25 rows=25 width=33) (actual time=0.020..0.024 rows=25 loops=3)
 Planning Time: 2.295 ms
 Execution Time: 512.015 ms
(38 rows)
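Note the filter on the orders scan: simplify_function() has folded the immutable expression date '1993-08-01' + interval '3' month into the constant '1993-11-01 00:00:00'::timestamp at planning time, which lets the planner estimate the date-range qual from the orders statistics. The folding itself is easy to observe directly (a minimal sketch of the same evaluation in psql):

tpch=# select date '1993-08-01' + interval '3' month as folded;
       folded
---------------------
 1993-11-01 00:00:00
(1 row)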
However, if I apply this patch to disable simplify_function():
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index b4e085e9d4..155bbd9fbc 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -2636,15 +2636,6 @@ eval_const_expressions_mutator(Node *node,
                  * Code for op/func reduction is pretty bulky, so split it out
                  * as a separate function.
                  */
-                simple = simplify_function(expr->opfuncid,
-                                           expr->opresulttype, -1,
-                                           expr->opcollid,
-                                           expr->inputcollid,
-                                           &args,
-                                           false,
-                                           true,
-                                           true,
-                                           context);
                 if (simple)        /* successfully simplified it */
                     return (Node *) simple;
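(Note that, as written, this hunk leaves the local variable simple uninitialized at the subsequent "if (simple)" test, which is part of why the patch is incorrect. A better-defined way to run the same experiment, which I assume behaves equivalently but did not measure, is to replace the deleted assignment with:

+                simple = NULL;        /* act as if simplification never succeeds */
)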
With the patch applied, I get a more efficient query plan:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=120917.71..120917.76 rows=20 width=202) (actual time=336.255..344.250 rows=20 loops=1)
   ->  Sort  (cost=120917.71..120936.18 rows=7387 width=202) (actual time=336.254..344.248 rows=20 loops=1)
         Sort Key: (sum((lineitem.l_extendedprice * ((1)::numeric - lineitem.l_discount)))) DESC
         Sort Method: top-N heapsort  Memory: 34kB
         ->  Finalize GroupAggregate  (cost=119764.35..120721.15 rows=7387 width=202) (actual time=271.425..335.755 rows=37925 loops=1)
               Group Key: customer.c_custkey, nation.n_name
               ->  Gather Merge  (cost=119764.35..120567.25 rows=6156 width=202) (actual time=271.420..309.049 rows=37925 loops=1)
                     Workers Planned: 2
                     Workers Launched: 2
                     ->  Partial GroupAggregate  (cost=118764.33..118856.67 rows=3078 width=202) (actual time=267.897..296.298 rows=12642 loops=3)
                           Group Key: customer.c_custkey, nation.n_name
                           ->  Sort  (cost=118764.33..118772.02 rows=3078 width=182) (actual time=267.881..271.626 rows=38183 loops=3)
                                 Sort Key: customer.c_custkey, nation.n_name
                                 Sort Method: external merge  Disk: 7248kB
                                 Worker 0:  Sort Method: external merge  Disk: 7328kB
                                 Worker 1:  Sort Method: external merge  Disk: 7328kB
                                 ->  Hash Join  (cost=113641.60..118585.99 rows=3078 width=182) (actual time=222.247..249.682 rows=38183 loops=3)
                                       Hash Cond: (customer.c_nationkey = nation.n_nationkey)
                                       ->  Parallel Hash Join  (cost=113640.04..118574.98 rows=3078 width=160) (actual time=222.183..243.657 rows=38183 loops=3)
                                             Hash Cond: (customer.c_custkey = orders.o_custkey)
                                             ->  Parallel Seq Scan on customer  (cost=0.00..4219.00 rows=62500 width=148) (actual time=0.029..5.646 rows=50000 loops=3)
                                             ->  Parallel Hash  (cost=113601.56..113601.56 rows=3078 width=16) (actual time=221.947..221.948 rows=38183 loops=3)
                                                   Buckets: 131072 (originally 8192)  Batches: 1 (originally 1)  Memory Usage: 8064kB
                                                   ->  Nested Loop  (cost=0.43..113601.56 rows=3078 width=16) (actual time=0.096..204.897 rows=38183 loops=3)
                                                         ->  Parallel Seq Scan on orders  (cost=0.00..37062.50 rows=3125 width=8) (actual time=0.029..63.908 rows=19046 loops=3)
                                                               Filter: ((o_orderdate >= '1993-08-01'::date) AND (o_orderdate < ('1993-08-01'::date + '3 mons'::interval month)))
                                                               Rows Removed by Filter: 480954
                                                         ->  Index Scan using lineitem_pkey on lineitem  (cost=0.43..24.45 rows=4 width=16) (actual time=0.006..0.007 rows=2 loops=57138)
                                                               Index Cond: (l_orderkey = orders.o_orderkey)
                                                               Filter: (l_returnflag = 'R'::bpchar)
                                                               Rows Removed by Filter: 2
                                       ->  Hash  (cost=1.25..1.25 rows=25 width=30) (actual time=0.034..0.035 rows=25 loops=3)
                                             Buckets: 1024  Batches: 1  Memory Usage: 10kB
                                             ->  Seq Scan on nation  (cost=0.00..1.25 rows=25 width=30) (actual time=0.018..0.023 rows=25 loops=3)
 Planning Time: 1.207 ms
 Execution Time: 345.427 ms
(36 rows)
The estimated cost is reduced by 39.69%, and the execution time is reduced by 32.54% (execution times are averages over 10 runs).
I am not proposing this as a fix, since the patch is clearly incorrect. I only want to show that disabling simplify_function() brings a performance benefit here, which seems unexpected. The most visible difference between the two plans is the orders scan: with constant folding the planner estimates the date-range qual from the column statistics (rows=23858), while with the unreduced expression it apparently falls back to a default selectivity estimate (rows=3125), and that much lower estimate is what tips it into the nested loop over lineitem_pkey. I am wondering whether we can optimize simplify_function() to make the performance better for this workload.
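Incidentally, a similar plan change can presumably be produced without patching the server at all, by hiding the constant from eval_const_expressions() behind an uncorrelated scalar subquery, whose output parameter cannot be reduced at planning time. This is only an untested sketch of the idea, not something I measured:

    and o_orderdate < (select date '1993-08-01' + interval '3' month)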
Environment:
I used 1 GB of TPC-H data (scale factor 1), and my entire data directory can be downloaded here:
https://drive.google.com/file/d/1ZBLHanIRwxbaMQIhRUSPv4I7y8g_0AWi/view?usp=sharing
The connection string is: postgresql://ubuntu:ubuntu@127.0.0.1:5432/tpch
tpch=# select version();
version
--------------------------------------------------------------------------------------------------
PostgreSQL 17.0 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 13.2.0-23ubuntu4) 13.2.0, 64-bit
(1 row)
Best regards,
Jinsheng Ba