Hello, We have 2 TPC-H queries which fetch the same tuples but have significant query execution time differences (4.3 times). We are sharing a pair of TPC-H queries that exhibit this performance difference: First query: SELECT "ps_comment", "ps_suppkey", "ps_supplycost", "ps_partkey", "ps_availqty" FROM "partsupp" WHERE "ps_partkey" + 16 < 1 OR "ps_partkey" = 2 GROUP BY "ps_partkey", "ps_suppkey", "ps_availqty", "ps_supplycost", "ps_comment" Second query: SELECT "ps_comment", "ps_suppkey", "ps_supplycost", "ps_partkey", "ps_availqty" FROM "partsupp" WHERE "ps_partkey" + 16 < 1 OR "ps_partkey" = 2 GROUP BY "ps_comment", "ps_suppkey", "ps_supplycost", "ps_partkey", "ps_availqty" * Actual Behavior We executed both queries on the TPC-H benchmark of scale factor 5: the first query takes
over 1.7 seconds, while the second query only takes 0.4 seconds. * Query Execution Plan
QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------- Group (cost=0.43..342188.58 rows=399262 width=144) (actual time=0.058..1737.659 rows=4 loops=1) Group Key: ps_partkey, ps_suppkey Buffers: shared hit=123005 read=98055 -> Index Scan using partsupp_pkey on partsupp (cost=0.43..335522.75 rows=1333167 width=144) (actual time=0.055..1737.651 rows=4 loops=1) Filter: (((ps_partkey + 16) < 1) OR (ps_partkey = 2)) Rows Removed by Filter: 3999996 Buffers: shared hit=123005 read=98055 Planning Time: 0.926 ms Execution Time: 1737.754 ms (9 rows)
QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------- Group (cost=250110.68..350438.93 rows=399262 width=144) (actual time=400.353..400.361 rows=4 loops=1) Group Key: ps_suppkey, ps_partkey Buffers: shared hit=5481 read=24093 -> Gather Merge (cost=250110.68..346446.31 rows=798524 width=144) (actual time=400.351..406.741 rows=4 loops=1) Workers Planned: 2 Workers Launched: 2 Buffers: shared hit=15151 read=72144 -> Group (cost=249110.66..253276.80 rows=399262 width=144) (actual time=395.882..395.883 rows=1 loops=3) Group Key: ps_suppkey, ps_partkey Buffers: shared hit=15151 read=72144 -> Sort (cost=249110.66..250499.37 rows=555486 width=144) (actual time=395.880..395.881 rows=1 loops=3) Sort Key: ps_suppkey, ps_partkey Sort Method: quicksort Memory: 25kB Worker 0: Sort Method: quicksort Memory: 25kB Worker 1: Sort Method: quicksort Memory: 25kB Buffers: shared hit=15151 read=72144 -> Parallel Seq Scan on partsupp (cost=0.00..116363.88 rows=555486 width=144) (actual time=395.518..395.615 rows=1 loops=3) Filter: (((ps_partkey + 16) < 1) OR (ps_partkey = 2)) Rows Removed by Filter: 1333332 Buffers: shared hit=15065 read=72136 Planning Time: 0.360 ms Execution Time: 406.880 ms (22 rows) *Expected Behavior Since these two queries are semantically equivalent, we were hoping that PostgreSQL
would evaluate them in roughly the same amount of time. *Test Environment Ubuntu 20.04 machine "Linux panda 5.4.0-40-generic #44-Ubuntu SMP Tue Jun 23 00:01:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux" PostgreSQL v12.3 Database: TPC-H benchmark (with scale factor 5) The description of table partsupp is as follows: tpch5=# \d partsupp; Table "public.partsupp" Column | Type | Collation | Nullable | Default ---------------+------------------------+-----------+----------+--------- ps_partkey | integer | | not null | ps_suppkey | integer | | not null | ps_availqty | integer | | not null | ps_supplycost | numeric(15,2) | | not null | ps_comment | character varying(199) | | not null | Indexes: "partsupp_pkey" PRIMARY KEY, btree (ps_partkey, ps_suppkey) Foreign-key constraints: "partsupp_fk1" FOREIGN KEY (ps_suppkey) REFERENCES supplier(s_suppkey) "partsupp_fk2" FOREIGN KEY (ps_partkey) REFERENCES part(p_partkey) Referenced by: TABLE "lineitem" CONSTRAINT "lineitem_fk2" FOREIGN KEY (l_partkey, l_suppkey) REFERENCES partsupp(ps_partkey, ps_suppkey) *Here are the steps for reproducing our observations:
tar xzvf tpch5_postgresql.tar.gz cd tpch5_postgresql db=tpch5 createdb $db psql -d $db < dss.ddl for i in `ls *.tbl` do echo $i name=`echo $i|cut -d'.' -f1` psql -d $db -c "COPY $name FROM '`pwd`/$i' DELIMITER '|' ENCODING 'LATIN1';" done psql -d $db < dss_postgres.ri
|