Re: a JOIN to a VIEW seems slow

"Frank Millman" <frank@xxxxxxxxxxxx> · Thu, 21 Sep 2017 08:51:37 +0200

On 2017-09-18 Frank Millman wrote:

> 

> Here are the timings for running the query on 
identical data sets using Postgresql, Sql Server, and Sqlite3 -
> 
> PostgreSQL -
>     Method 1 - 0.28 sec
>     Method 2 – 1607 sec, or 26 minutes
> 
> Sql Server -
>     Method 1 – 0.33 sec
>     Method 2 – 1.8 sec
> 
> Sqlite3 -
>     Method 1 – 0.15 sec
>     Method 2 – 1.0 sec
> 
> It seems that Sql Server and Sqlite3 are able to analyse the ‘join’, 
and execute an indexed read against the underlying physical tables.
> 

I did not get any response to this, but I am still persevering, and feel 
that I am getting closer. Instead of waiting 26 minutes for a result, I realise 
that I can learn a lot by using EXPLAIN. This is what I have found out.

To recap, I have the following tables -

1. ‘ar_tran_inv’, to store invoices
2. ‘ar_tran_crn’, to store credit notes
3. ‘ar_tran_rec’ to store receipts

This is a subset of their common columns -
    row_id INT SERIAL PRIMARY KEY,
    tran_number VARCHAR,
    posted BOOL,

I have created a VIEW called ‘ar_trans’ to combine them -

CREATE VIEW ar_trans AS
  SELECT ‘ar_inv’ AS tran_type, 
row_id AS tran_row_id, tran_number ... FROM ar_tran_inv WHERE posted = 
‘1’
  UNION ALL
  SELECT ‘ar_crn’ AS tran_type, 
row_id AS tran_row_id, tran_number ... FROM ar_tran_crn WHERE posted = 
‘1’

  UNION ALL
  SELECT ‘ar_rec’ AS tran_type, 
row_id AS tran_row_id, tran_number ... FROM ar_tran_rec WHERE posted = 
‘1’

I have another table called ‘ar_trans_due’, where a row is inserted 
whenever a row is inserted into any of the three transaction tables. To identify 
the source transaction, I have the following columns -
    tran_type VARCHAR – can be ‘ar_inv’, ‘ar_crn’, or 
‘ar_rec’
    tran_row_id INT – the primary key of the originating 
transaction

Now here are my tests -

1. =======================================================

SELECT tran_type, tran_row_id FROM ar_trans_due WHERE row_id = 1;

tran_type | tran_row_id 
-----------+-------------
ar_inv    
|           1
(1 row)

Just to give me some sample data to work with.

2. =======================================================

EXPLAIN SELECT * FROM ar_trans WHERE tran_type = ‘ar_inv’ AND tran_row_id = 
1;

QUERY 
PLAN                                         

-------------------------------------------------------------------------------------------
Append  (cost=0.29..8.32 rows=1 width=117)
   ->  Index Scan using ar_tran_inv_pkey on 
ar_tran_inv  (cost=0.29..8.31 rows=1 width=46)
         Index Cond: (row_id = 
1)
         Filter: posted
(4 rows)

This is a select against the view. It has worked out that the underlying 
table to use is ‘ar_tran_inv’, and performed an indexed read.

3. =======================================================

EXPLAIN SELECT * FROM ar_trans_due a LEFT JOIN ar_tran_inv b ON b.row_id = 
a.tran_row_id where a.row_id = 1;

QUERY 
PLAN                                           

-----------------------------------------------------------------------------------------------
Nested Loop Left Join  (cost=0.58..16.62 rows=1 width=142)
   ->  Index Scan using ar_trans_due_pkey on ar_trans_due 
a  (cost=0.29..8.31 rows=1 width=52)
         Index Cond: (row_id = 
1)
   ->  Index Scan using ar_tran_inv_pkey on ar_tran_inv 
b  (cost=0.29..8.30 rows=1 width=90)
         Index Cond: (row_id = 
a.tran_row_id)
(5 rows)

Here I have selected a row from ar_trans_due, and joined 
the underlying physical table directly. It uses an indexed read to perform the 
join.

4. =======================================================

EXPLAIN SELECT * FROM ar_trans_due a LEFT JOIN ar_trans b ON b.tran_type = 
a.tran_type AND b.tran_row_id = a.tran_row_id where a.row_id = 1;

QUERY 
PLAN                                              

-----------------------------------------------------------------------------------------------------
Hash Right Join  (cost=8.32..2072.99 rows=1 width=169)
   Hash Cond: ((('ar_inv'::text) = (a.tran_type)::text) AND 
(ar_tran_inv.row_id = a.tran_row_id))
   ->  Append  (cost=0.00..1739.47 rows=43357 
width=117)
         ->  Seq Scan on 
ar_tran_inv  (cost=0.00..676.01 rows=21601 width=46)

Filter: posted
         ->  Seq Scan on 
ar_tran_crn  (cost=0.00..13.88 rows=155 width=124)

Filter: posted
         ->  Seq Scan on 
ar_tran_rec  (cost=0.00..616.01 rows=21601 width=40)

Filter: posted
   ->  Hash  (cost=8.31..8.31 rows=1 width=52)
         ->  Index Scan 
using ar_trans_due_pkey on ar_trans_due a  (cost=0.29..8.31 rows=1 
width=52)

Index Cond: (row_id = 1)
(12 rows)

Here I have selected the same row, and joined the view ‘ar_trans’. It seems 
to have all the information necessary to perform an indexed read, but instead it 
performs a sequential scan of all three of the underlying tables. I don’t know 
why it shows 155 rows for ar_tran_crn – the table is actually empty.

I am using version 9.4.4 on Fedora 22. I don’t want to upgrade just for the 
sake of it, but if any work has been done in this area for 9.5 or 9.6, that 
would be the obvious first thing to try.

Any suggestions welcomed.

Frank Millman