Re: Slow SELECT

Frank Millman <frank@xxxxxxxxxxxx> · Tue, 26 May 2020 12:25:26 +0200

On 2020-05-26 11:27 AM, Charles Clavadetscher wrote:
On 2020-05-26 11:10, Charles Clavadetscher wrote:
Hello

On 2020-05-26 10:38, Frank Millman wrote:
On 2020-05-26 9:32 AM, Olivier Gautherot wrote:
Hi Frank,

On Tue, May 26, 2020 at 9:23 AM Frank Millman <frank@xxxxxxxxxxxx 
<mailto:frank@xxxxxxxxxxxx>> wrote:
    Hi all

    I have a SELECT that runs over 5 times slower on PostgreSQL 
compared
    with Sql Server and sqlite3. I am trying to understand why.

    I have a table that looks like this (simplified) -

    CREATE TABLE my_table (
          row_id SERIAL PRIMARY KEY,
          deleted_id INT DEFAULT 0,
          fld_1 INT REFERENCES table_1(row_id),
          fld_2 INT REFERENCES table_2(row_id),
          fld_3 INT REFERENCES table_3(row_id),
          fld_4 INT REFERENCES table_4(row_id),
          tran_date DATE,
          tran_total DEC(21,2)
          );

    CREATE UNIQUE INDEX my_table_ndx ON my_table (fld_1, fld_2, fld_3,
    fld_4, tran_date) WHERE deleted_id = 0;

    The table sizes are -
          my_table : 167 rows
          table_1 : 21 rows
          table_2 : 11 rows
          table_3 : 3 rows
          table_4 : 16 rows

    Therefore for each tran_date in my_table there are potentially
    21x11x3x16 = 11088 rows. Most will be null.

    I want to select the row_id for the last tran_date for each of 
those
    potential groups. This is my select -

          SELECT (
              SELECT a.row_id FROM my_table a
              WHERE a.fld_1 = b.row_id
              AND a.fld_2 = c.row_id
              AND a.fld_3 = d.row_id
              AND a.fld_4 = e.row_id
              AND a.deleted_id = 0
              ORDER BY a.tran_date DESC LIMIT 1
          )
          FROM table_1 b, table_2 c, table_3 d, table_4 e

    Out of 11088 rows selected, 103 are not null.

    On identical data, this takes 0.06 sec on SQL Server, 0.04 sec on
    sqlite3, and 0.31 sec on PostgreSQL.

SQL Server does a good job at caching data in memory. PostgreSQL 
does too on consecutive calls to the same table. What execution time 
do you get if you issue the query a second time?
My first guess would be to add an index on my_table.tran_date and 
check in EXPLAIN that you don't have a SEQUENTIAL SCAN on that table.
    I have looked at the EXPLAIN, but I don't really know what to 
look for.
    I can supply it if that would help.

    Thanks for any advice.

Thanks Olivier. Unfortunately that did not help.

I was already running the query twice and only timing the second one.

I added the index on tran_date. The timing is the same, and EXPLAIN
shows that it is using a SEQUENTIAL SCAN.

Here is the EXPLAIN -

 Nested Loop  (cost=0.00..64155.70 rows=11088 width=4)
   ->  Nested Loop  (cost=0.00..10.36 rows=528 width=12)
         ->  Nested Loop  (cost=0.00..2.56 rows=33 width=8)
               ->  Seq Scan on table_2 c  (cost=0.00..1.11 rows=11 
width=4)
               ->  Materialize  (cost=0.00..1.04 rows=3 width=4)
                     ->  Seq Scan on table_3 d  (cost=0.00..1.03 
rows=3 width=4)
         ->  Materialize  (cost=0.00..1.24 rows=16 width=4)
               ->  Seq Scan on table_4 e  (cost=0.00..1.16 rows=16 
width=4)
   ->  Materialize  (cost=0.00..1.31 rows=21 width=4)
         ->  Seq Scan on table_1 b  (cost=0.00..1.21 rows=21 width=4)
   SubPlan 1
     ->  Limit  (cost=5.77..5.77 rows=1 width=8)
           ->  Sort  (cost=5.77..5.77 rows=1 width=8)
                 Sort Key: a.tran_date DESC
                 ->  Seq Scan on my_table a  (cost=0.00..5.76 rows=1 
width=8)
                       Filter: ((fld_1 = b.row_id) AND (fld_2 =
c.row_id) AND (fld_3 = d.row_id) AND (fld_4 = e.row_id) AND
(deleted_id = 0))

Frank
If I see it correct, the query runs sequential scans on all tables,
i.e. table_1 to table_4.
Do you have an index on the referenced keys (row_id) in table_1 to 
table_4?
It happens often that referenced keys are not indexed, leading to poor
execution plans.

Bye
Charles
I noticed later that you have very small tables. This will probably lead 
to a sequential scan althought there is an index in place.
I am not sure if it makes a difference, but what about using explicit 
joins?
SELECT a.row_id FROM my_table a
JOIN b table_1 ON (b.row_id = a.fld_1)
JOIN c table_2 ON (c.row_id = a.fld_2)
JOIN d table_3 ON (d.row_id = a.fld_3)
JOIN e table_4 ON (e.row_id = a.fld_4)
WHERE a.deleted_id = 0
ORDER BY a.tran_date DESC LIMIT 1;

Thanks, Charles. I tried that, but unfortunately it produces a different 
result. I need to test for every possible combination of fld1-4, and get 
the highest date for each one. Using joins only tests existing 
combinations, and gets the highest date for all of them combined.
Seel my reply to David Rowley. I do not fully understand his solution 
yet, but it seems to be what I am looking for.
Thanks again

Frank