On 2020-05-26 9:32 AM, Olivier Gautherot wrote:
Hi Frank,
On Tue, May 26, 2020 at 9:23 AM Frank Millman <frank@xxxxxxxxxxxx
<mailto:frank@xxxxxxxxxxxx>> wrote:
Hi all
I have a SELECT that runs over 5 times slower on PostgreSQL compared
with Sql Server and sqlite3. I am trying to understand why.
I have a table that looks like this (simplified) -
CREATE TABLE my_table (
row_id SERIAL PRIMARY KEY,
deleted_id INT DEFAULT 0,
fld_1 INT REFERENCES table_1(row_id),
fld_2 INT REFERENCES table_2(row_id),
fld_3 INT REFERENCES table_3(row_id),
fld_4 INT REFERENCES table_4(row_id),
tran_date DATE,
tran_total DEC(21,2)
);
CREATE UNIQUE INDEX my_table_ndx ON my_table (fld_1, fld_2, fld_3,
fld_4, tran_date) WHERE deleted_id = 0;
The table sizes are -
my_table : 167 rows
table_1 : 21 rows
table_2 : 11 rows
table_3 : 3 rows
table_4 : 16 rows
Therefore for each tran_date in my_table there are potentially
21x11x3x16 = 11088 rows. Most will be null.
I want to select the row_id for the last tran_date for each of those
potential groups. This is my select -
SELECT (
SELECT a.row_id FROM my_table a
WHERE a.fld_1 = b.row_id
AND a.fld_2 = c.row_id
AND a.fld_3 = d.row_id
AND a.fld_4 = e.row_id
AND a.deleted_id = 0
ORDER BY a.tran_date DESC LIMIT 1
)
FROM table_1 b, table_2 c, table_3 d, table_4 e
Out of 11088 rows selected, 103 are not null.
On identical data, this takes 0.06 sec on SQL Server, 0.04 sec on
sqlite3, and 0.31 sec on PostgreSQL.
SQL Server does a good job at caching data in memory. PostgreSQL does
too on consecutive calls to the same table. What execution time do
you get if you issue the query a second time?
My first guess would be to add an index on my_table.tran_date and
check in EXPLAIN that you don't have a SEQUENTIAL SCAN on that table.
I have looked at the EXPLAIN, but I don't really know what to
look for.
I can supply it if that would help.
Thanks for any advice.
Thanks Olivier. Unfortunately that did not help.
I was already running the query twice and only timing the second one.
I added the index on tran_date. The timing is the same, and EXPLAIN
shows that it is using a SEQUENTIAL SCAN.
Here is the EXPLAIN -
Nested Loop (cost=0.00..64155.70 rows=11088 width=4)
-> Nested Loop (cost=0.00..10.36 rows=528 width=12)
-> Nested Loop (cost=0.00..2.56 rows=33 width=8)
-> Seq Scan on table_2 c (cost=0.00..1.11 rows=11
width=4)
-> Materialize (cost=0.00..1.04 rows=3 width=4)
-> Seq Scan on table_3 d (cost=0.00..1.03
rows=3 width=4)
-> Materialize (cost=0.00..1.24 rows=16 width=4)
-> Seq Scan on table_4 e (cost=0.00..1.16 rows=16
width=4)
-> Materialize (cost=0.00..1.31 rows=21 width=4)
-> Seq Scan on table_1 b (cost=0.00..1.21 rows=21 width=4)
SubPlan 1
-> Limit (cost=5.77..5.77 rows=1 width=8)
-> Sort (cost=5.77..5.77 rows=1 width=8)
Sort Key: a.tran_date DESC
-> Seq Scan on my_table a (cost=0.00..5.76 rows=1
width=8)
Filter: ((fld_1 = b.row_id) AND (fld_2 =
c.row_id) AND (fld_3 = d.row_id) AND (fld_4 = e.row_id) AND
(deleted_id = 0))
Frank