On 2017-09-18 Frank Millman wrote:
>
> Here are the timings for running the query on
identical data sets using Postgresql, Sql Server, and Sqlite3 - >
> PostgreSQL -
> Method 1 - 0.28 sec
> Method 2 – 1607 sec, or 26 minutes
>
> Sql Server -
> Method 1 – 0.33 sec
> Method 2 – 1.8 sec
>
> Sqlite3 -
> Method 1 – 0.15 sec
> Method 2 – 1.0 sec
>
> It seems that Sql Server and Sqlite3 are able to analyse the ‘join’,
and execute an indexed read against the underlying physical tables.
>
I did not get any response to this, but I am still persevering, and feel
that I am getting closer. Instead of waiting 26 minutes for a result, I realise
that I can learn a lot by using EXPLAIN. This is what I have found out.
To recap, I have the following tables -
1. ‘ar_tran_inv’, to store invoices
2. ‘ar_tran_crn’, to store credit notes
3. ‘ar_tran_rec’ to store receipts
This is a subset of their common columns -
row_id INT SERIAL PRIMARY KEY,
tran_number VARCHAR,
posted BOOL,
I have created a VIEW called ‘ar_trans’ to combine them -
CREATE VIEW ar_trans AS
SELECT ‘ar_inv’ AS tran_type,
row_id AS tran_row_id, tran_number ... FROM ar_tran_inv WHERE posted =
‘1’
UNION ALL
SELECT ‘ar_crn’ AS tran_type,
row_id AS tran_row_id, tran_number ... FROM ar_tran_crn WHERE posted =
‘1’
UNION ALL
SELECT ‘ar_rec’ AS tran_type,
row_id AS tran_row_id, tran_number ... FROM ar_tran_rec WHERE posted =
‘1’
I have another table called ‘ar_trans_due’, where a row is inserted
whenever a row is inserted into any of the three transaction tables. To identify
the source transaction, I have the following columns -
tran_type VARCHAR – can be ‘ar_inv’, ‘ar_crn’, or
‘ar_rec’
tran_row_id INT – the primary key of the originating
transaction
Now here are my tests -
1. =======================================================
SELECT tran_type, tran_row_id FROM ar_trans_due WHERE row_id = 1;
tran_type | tran_row_id
-----------+-------------
ar_inv
| 1
(1 row)
Just to give me some sample data to work with.
2. =======================================================
EXPLAIN SELECT * FROM ar_trans WHERE tran_type = ‘ar_inv’ AND tran_row_id =
1;
QUERY
PLAN
-------------------------------------------------------------------------------------------
Append (cost=0.29..8.32 rows=1 width=117)
-> Index Scan using ar_tran_inv_pkey on
ar_tran_inv (cost=0.29..8.31 rows=1 width=46)
Index Cond: (row_id =
1)
Filter: posted
(4 rows)
This is a select against the view. It has worked out that the underlying
table to use is ‘ar_tran_inv’, and performed an indexed read.
3. =======================================================
EXPLAIN SELECT * FROM ar_trans_due a LEFT JOIN ar_tran_inv b ON b.row_id =
a.tran_row_id where a.row_id = 1;
QUERY
PLAN
-----------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.58..16.62 rows=1 width=142)
-> Index Scan using ar_trans_due_pkey on ar_trans_due
a (cost=0.29..8.31 rows=1 width=52)
Index Cond: (row_id =
1)
-> Index Scan using ar_tran_inv_pkey on ar_tran_inv
b (cost=0.29..8.30 rows=1 width=90)
Index Cond: (row_id =
a.tran_row_id)
(5 rows) Here I have selected a row from ar_trans_due, and joined
the underlying physical table directly. It uses an indexed read to perform the
join.
4. =======================================================
EXPLAIN SELECT * FROM ar_trans_due a LEFT JOIN ar_trans b ON b.tran_type =
a.tran_type AND b.tran_row_id = a.tran_row_id where a.row_id = 1;
QUERY
PLAN
-----------------------------------------------------------------------------------------------------
Hash Right Join (cost=8.32..2072.99 rows=1 width=169)
Hash Cond: ((('ar_inv'::text) = (a.tran_type)::text) AND
(ar_tran_inv.row_id = a.tran_row_id))
-> Append (cost=0.00..1739.47 rows=43357
width=117)
-> Seq Scan on
ar_tran_inv (cost=0.00..676.01 rows=21601 width=46)
Filter: posted
-> Seq Scan on
ar_tran_crn (cost=0.00..13.88 rows=155 width=124)
Filter: posted
-> Seq Scan on
ar_tran_rec (cost=0.00..616.01 rows=21601 width=40)
Filter: posted
-> Hash (cost=8.31..8.31 rows=1 width=52)
-> Index Scan
using ar_trans_due_pkey on ar_trans_due a (cost=0.29..8.31 rows=1
width=52)
Index Cond: (row_id = 1)
(12 rows)
Here I have selected the same row, and joined the view ‘ar_trans’. It seems
to have all the information necessary to perform an indexed read, but instead it
performs a sequential scan of all three of the underlying tables. I don’t know
why it shows 155 rows for ar_tran_crn – the table is actually empty.
I am using version 9.4.4 on Fedora 22. I don’t want to upgrade just for the
sake of it, but if any work has been done in this area for 9.5 or 9.6, that
would be the obvious first thing to try.
Any suggestions welcomed.
Frank Millman
|