Re: Slow SELECT

Frank Millman <frank@xxxxxxxxxxxx> · Tue, 26 May 2020 11:32:47 +0200

On 2020-05-26 11:10 AM, Charles Clavadetscher wrote:
Hello

On 2020-05-26 10:38, Frank Millman wrote:
On 2020-05-26 9:32 AM, Olivier Gautherot wrote:
Hi Frank,

On Tue, May 26, 2020 at 9:23 AM Frank Millman <frank@xxxxxxxxxxxx 
<mailto:frank@xxxxxxxxxxxx>> wrote:

    Hi all

    I have a SELECT that runs over 5 times slower on PostgreSQL compared
    with Sql Server and sqlite3. I am trying to understand why.

    I have a table that looks like this (simplified) -

    CREATE TABLE my_table (
          row_id SERIAL PRIMARY KEY,
          deleted_id INT DEFAULT 0,
          fld_1 INT REFERENCES table_1(row_id),
          fld_2 INT REFERENCES table_2(row_id),
          fld_3 INT REFERENCES table_3(row_id),
          fld_4 INT REFERENCES table_4(row_id),
          tran_date DATE,
          tran_total DEC(21,2)
          );

    CREATE UNIQUE INDEX my_table_ndx ON my_table (fld_1, fld_2, fld_3,
    fld_4, tran_date) WHERE deleted_id = 0;

    The table sizes are -
          my_table : 167 rows
          table_1 : 21 rows
          table_2 : 11 rows
          table_3 : 3 rows
          table_4 : 16 rows

    Therefore for each tran_date in my_table there are potentially
    21x11x3x16 = 11088 rows. Most will be null.

    I want to select the row_id for the last tran_date for each of those
    potential groups. This is my select -

          SELECT (
              SELECT a.row_id FROM my_table a
              WHERE a.fld_1 = b.row_id
              AND a.fld_2 = c.row_id
              AND a.fld_3 = d.row_id
              AND a.fld_4 = e.row_id
              AND a.deleted_id = 0
              ORDER BY a.tran_date DESC LIMIT 1
          )
          FROM table_1 b, table_2 c, table_3 d, table_4 e

    Out of 11088 rows selected, 103 are not null.

    On identical data, this takes 0.06 sec on SQL Server, 0.04 sec on
    sqlite3, and 0.31 sec on PostgreSQL.

SQL Server does a good job at caching data in memory. PostgreSQL does 
too on consecutive calls to the same table. What execution time do 
you get if you issue the query a second time?

My first guess would be to add an index on my_table.tran_date and 
check in EXPLAIN that you don't have a SEQUENTIAL SCAN on that table.

    I have looked at the EXPLAIN, but I don't really know what to 
look for.
    I can supply it if that would help.

    Thanks for any advice.

Thanks Olivier. Unfortunately that did not help.

I was already running the query twice and only timing the second one.

I added the index on tran_date. The timing is the same, and EXPLAIN
shows that it is using a SEQUENTIAL SCAN.

Here is the EXPLAIN -

 Nested Loop  (cost=0.00..64155.70 rows=11088 width=4)
   ->  Nested Loop  (cost=0.00..10.36 rows=528 width=12)
         ->  Nested Loop  (cost=0.00..2.56 rows=33 width=8)
               ->  Seq Scan on table_2 c  (cost=0.00..1.11 rows=11 
width=4)
               ->  Materialize  (cost=0.00..1.04 rows=3 width=4)
                     ->  Seq Scan on table_3 d  (cost=0.00..1.03 
rows=3 width=4)
         ->  Materialize  (cost=0.00..1.24 rows=16 width=4)
               ->  Seq Scan on table_4 e  (cost=0.00..1.16 rows=16 
width=4)
   ->  Materialize  (cost=0.00..1.31 rows=21 width=4)
         ->  Seq Scan on table_1 b  (cost=0.00..1.21 rows=21 width=4)
   SubPlan 1
     ->  Limit  (cost=5.77..5.77 rows=1 width=8)
           ->  Sort  (cost=5.77..5.77 rows=1 width=8)
                 Sort Key: a.tran_date DESC
                 ->  Seq Scan on my_table a  (cost=0.00..5.76 rows=1 
width=8)
                       Filter: ((fld_1 = b.row_id) AND (fld_2 =
c.row_id) AND (fld_3 = d.row_id) AND (fld_4 = e.row_id) AND
(deleted_id = 0))

Frank

If I see it correct, the query runs sequential scans on all tables, i.e. 
table_1 to table_4.
Do you have an index on the referenced keys (row_id) in table_1 to table_4?

It happens often that referenced keys are not indexed, leading to poor 
execution plans.

The referenced keys are all defined as SERIAL PRIMARY KEY in their own 
tables, so I presume that that are all indexed automatically.

On the other hand, there are not many rows in those tables, so the 
planner may decide not to use the index in that case.

Frank