Performance issues related to left join and order by

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello,


We have 2 TPC-H queries which fetch the same tuples but have significant query execution time differences (22.0 times).


We are sharing a pair of TPC-H queries that exhibit this performance difference:


First query:

SELECT    "orders3"."o_comment", 

          "orders3"."o_orderstatus", 

          "orders3"."o_orderkey", 

          "t17"."ps_partkey", 

          "t17"."ps_supplycost", 

          "t17"."ps_comment", 

          "orders3"."o_clerk", 

          "orders3"."o_totalprice", 

          "t17"."ps_availqty", 

          "t17"."ps_suppkey" 

FROM      ( 

                 SELECT * 

                 FROM   "partsupp" 

                 WHERE  "ps_comment" LIKE ', even theodolites. regular, final theodolites eat after the carefully pending foxes. furiously regular deposits sleep slyly. carefully bold realms above the ironic dependencies haggle careful') AS "t17"

LEFT JOIN "orders"                                                                                                                                                                                                        AS "orders3"

ON        true 

ORDER BY  "t17"."ps_supplycost"FETCH next 14 rows only 


Second query:

SELECT    "orders3"."o_comment", 

          "orders3"."o_orderstatus", 

          "orders3"."o_orderkey", 

          "t17"."ps_partkey", 

          "t17"."ps_supplycost", 

          "t17"."ps_comment", 

          "orders3"."o_clerk", 

          "orders3"."o_totalprice", 

          "t17"."ps_availqty", 

          "t17"."ps_suppkey" 

FROM      ( 

                   SELECT   * 

                   FROM     "partsupp" 

                   WHERE    "ps_comment" LIKE ', even theodolites. regular, final theodolites eat after the carefully pending foxes. furiously regular deposits sleep slyly. carefully bold realms above the ironic dependencies haggle careful'

                   ORDER BY "ps_supplycost"FETCH next 14 rows only) AS "t17" 

LEFT JOIN "orders"                          AS "orders3" 

ON        true 

ORDER BY  "t17"."ps_supplycost"FETCH next 14 rows only



* Actual Behavior

We executed both queries on the TPC-H benchmark of scale factor 5: the first query takes over 8 seconds, while the second query only takes 0.3 seconds.
We think the time difference results from different plans selected. Specifically, in the first (slow) query, the DBMS performs a left join using entire table partsupp, while the second (fast) query performs a left join using only 14 rows from partsupp).


* Query Execution Plan

  • First query:

                                                       QUERY PLAN                                                                                                                             

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 Limit  (cost=464628.69..464630.32 rows=14 width=223) (actual time=8082.764..8082.767 rows=14 loops=1)

   ->  Gather Merge  (cost=464628.69..1193917.91 rows=6250614 width=223) (actual time=8082.762..8087.639 rows=14 loops=1)

         Workers Planned: 2

         Workers Launched: 2

         ->  Sort  (cost=463628.66..471441.93 rows=3125307 width=223) (actual time=2933.506..2933.506 rows=5 loops=3)

               Sort Key: partsupp.ps_supplycost

               Sort Method: quicksort  Memory: 25kB

               Worker 0:  Sort Method: top-N heapsort  Memory: 32kB

               Worker 1:  Sort Method: quicksort  Memory: 25kB

               ->  Nested Loop Left Join  (cost=0.00..388506.36 rows=3125307 width=223) (actual time=360.602..1643.471 rows=2500000 loops=3)

                     ->  Parallel Seq Scan on partsupp  (cost=0.00..108031.62 rows=1 width=144) (actual time=360.577..360.599 rows=0 loops=3)

                           Filter: ((ps_comment)::text ~~ ', even theodolites. regular, final theodolites eat after the carefully pending foxes. furiously regular deposits sleep slyly. carefully bold realms above the ironic dependencies haggle careful'::text)

                           Rows Removed by Filter: 1333333

                     ->  Seq Scan on orders orders3  (cost=0.00..205467.37 rows=7500737 width=79) (actual time=0.064..1544.990 rows=7500000 loops=1)

 Planning Time: 0.278 ms

 Execution Time: 8087.714 ms

(16 rows)


  • Second query:

                                                                                                                                        QUERY PLAN                                                                                                                           

     

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-----

 Limit  (cost=109031.74..109032.26 rows=14 width=223) (actual time=363.883..363.890 rows=14 loops=1)

   ->  Nested Loop Left Join  (cost=109031.74..389506.49 rows=7500737 width=223) (actual time=363.882..363.887 rows=14 loops=1)

         ->  Limit  (cost=109031.74..109031.74 rows=1 width=144) (actual time=363.859..363.859 rows=1 loops=1)

               ->  Sort  (cost=109031.74..109031.74 rows=1 width=144) (actual time=363.858..363.858 rows=1 loops=1)

                     Sort Key: partsupp.ps_supplycost

                     Sort Method: quicksort  Memory: 25kB

                     ->  Gather  (cost=1000.00..109031.73 rows=1 width=144) (actual time=363.447..370.107 rows=1 loops=1)

                           Workers Planned: 2

                           Workers Launched: 2

                           ->  Parallel Seq Scan on partsupp  (cost=0.00..108031.62 rows=1 width=144) (actual time=360.033..360.101 rows=0 loops=3)

                                 Filter: ((ps_comment)::text ~~ ', even theodolites. regular, final theodolites eat after the carefully pending foxes. furiously regular deposits sleep slyly. carefully bold realms above the ironic dependencies haggle careful'::t

ext)

                                 Rows Removed by Filter: 1333333

         ->  Seq Scan on orders orders3  (cost=0.00..205467.37 rows=7500737 width=79) (actual time=0.016..0.017 rows=14 loops=1)

 Planning Time: 0.228 ms

 Execution Time: 370.200 ms

(15 rows)








*Expected Behavior

Since these two queries are semantically equivalent, we were hoping that PostgreSQL would evaluate them in roughly the same amount of time.
It looks to me that there is a missing optimization rule related to pushing the sort operator (i.e., order and limit) through the left join.
Given the significant query execution time difference, I was wondering if it is worth adding such a rule to make the system evaluate the first query more efficiently.
It would also be helpful if you could comment on if there is a standard practice to evaluate the tradeoff associated with adding such a rule in Postgresql.


*Test Environment

Ubuntu 20.04 machine "Linux panda 5.4.0-40-generic #44-Ubuntu SMP Tue Jun 23 00:01:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux"

PostgreSQL v12.3

Database: TPC-H benchmark (with scale factor 5)

The description of table partsupp and orders is as follows:

tpch5=# \d partsupp;

                         Table "public.partsupp"

    Column     |          Type          | Collation | Nullable | Default 

---------------+------------------------+-----------+----------+---------

 ps_partkey    | integer                |           | not null | 

 ps_suppkey    | integer                |           | not null | 

 ps_availqty   | integer                |           | not null | 

 ps_supplycost | numeric(15,2)          |           | not null | 

 ps_comment    | character varying(199) |           | not null | 

Indexes:

    "partsupp_pkey" PRIMARY KEY, btree (ps_partkey, ps_suppkey)

Foreign-key constraints:

    "partsupp_fk1" FOREIGN KEY (ps_suppkey) REFERENCES supplier(s_suppkey)

    "partsupp_fk2" FOREIGN KEY (ps_partkey) REFERENCES part(p_partkey)

Referenced by:

    TABLE "lineitem" CONSTRAINT "lineitem_fk2" FOREIGN KEY (l_partkey, l_suppkey) REFERENCES partsupp(ps_partkey, ps_suppkey)

tpch5=# \d orders;

                          Table "public.orders"

     Column      |         Type          | Collation | Nullable | Default 

-----------------+-----------------------+-----------+----------+---------

 o_orderkey      | integer               |           | not null | 

 o_custkey       | integer               |           | not null | 

 o_orderstatus   | character(1)          |           | not null | 

 o_totalprice    | numeric(15,2)         |           | not null | 

 o_orderdate     | date                  |           | not null | 

 o_orderpriority | character(15)         |           | not null | 

 o_clerk         | character(15)         |           | not null | 

 o_shippriority  | integer               |           | not null | 

 o_comment       | character varying(79) |           | not null | 

Indexes:

    "orders_pkey" PRIMARY KEY, btree (o_orderkey)

Foreign-key constraints:

    "orders_fk1" FOREIGN KEY (o_custkey) REFERENCES customer(c_custkey)

Referenced by:

    TABLE "lineitem" CONSTRAINT "lineitem_fk1" FOREIGN KEY (l_orderkey) REFERENCES orders(o_orderkey)





*Here are the steps for reproducing our observations:

  1. Download the dataset from the link: https://drive.google.com/file/d/13rFa1BNDi4e2RmXBn-yEQkcqt6lsBu1c/view?usp=sharing

  2. Set up TPC-H benchmark

tar xzvf tpch5_postgresql.tar.gz

cd tpch5_postgresql

db=tpch5

createdb $db

psql -d $db < dss.ddl

for i in `ls *.tbl`

do

    echo $i

    name=`echo $i|cut -d'.' -f1`

    psql -d $db -c "COPY $name FROM '`pwd`/$i' DELIMITER '|' ENCODING 'LATIN1';"

done

psql -d $db < dss_postgres.ri

  1. Execute the queries


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux