Re: a JOIN to a VIEW seems slow

Pavel Stehule <pavel.stehule@xxxxxxxxx> · Thu, 14 Sep 2017 15:17:55 +0200

2017-09-14 15:09 GMT+02:00 Pavel Stehule <pavel.stehule@xxxxxxxxx>:

2017-09-14 14:59 GMT+02:00 Frank Millman <frank@xxxxxxxxxxxx>:

Pavel 
Stehule wrote:

2017-09-14 10:14 GMT+02:00 Frank Millman <frank@xxxxxxxxxxxx>:

  Hi all

  This is a follow-up to a recent question I posted regarding a slow query. 
  I thought that the slowness was caused by the number of JOINs in the query, 
  but with your assistance I have found the true reason. I said in the previous 
  thread that the question had become academic, but now that I understand things 
  better, it is no longer academic as it casts doubt on my whole approach.

  I have split my AR transaction table into three physical tables – 
  ar_tran_inv, ar_tran_crn, ar_tran_rec. I will probably add others at some 
  point, such as ar_tran_jnl.

  I then create a VIEW to view all transactions combined. The view is created like this -

  CREATE VIEW ar_trans AS
    SELECT ‘ar_inv’ AS tran_type, row_id 
  AS tran_row_id, tran_number ... FROM ar_tran_inv WHERE posted = 
  ‘1’
    UNION ALL
    SELECT ‘ar_crn’ AS tran_type, row_id 
  AS tran_row_id, tran_number ... FROM ar_tran_crn WHERE posted = 
  ‘1’

    UNION ALL
    SELECT ‘ar_rec’ AS tran_type, row_id 
  AS tran_row_id, tran_number ... FROM ar_tran_rec WHERE posted = 
  ‘1’

  I have another table called ‘ar_trans_due’, 
  to keep track of outstanding transactions. All of the three transaction types 
  generate entries into this table. To identify the source of the transaction, I 
  have created columns in ar_trans_due called ‘tran_type’ and ‘tran_row_id’. 
  After inserting a row into ‘ar_tran_inv’, 
  I invoke this -

    INSERT INTO ar_trans_due (tran_type, 
  tran_row_id, ...) VALUES (‘ar_inv’, ar_tran_inv.row_id, ...), and similar for 
  the other transaction types. It is handled by a Python program, and it all 
  happens within a transaction.

  When I view a row in ar_trans_due, I want to 
  retrieve data from the source transaction, so I have this -

    SELECT * FROM ar_trans_due 
  a
    LEFT JOIN ar_trans b ON b.tran_type = 
  a.tran_type AND b.tran_row_id = a.tran_row_id

  I understand that PostgreSQL must somehow 
  follow a path from the view ‘ar_trans’ to the physical table ‘ar_tran_inv’, 
  but I assumed it would execute the equivalent of SELECT * FROM ar_tran_inv 
  WHERE row_id = a.tran_row_id AND posted = ‘1’.

  If this was the case, it would be an indexed 
  read, and very fast. Instead, according to EXPLAIN, it performs a sequential 
  scan of the ‘ar_tran_inv’ table.

  It also scans ‘ar_tran_crn’ and 
  ‘ar_tran_rec’, but EXPLAIN shows that it uses a Bitmap Heap Scan on those. I 
  assume that is because the tables are currently empty.

  Is this analysis 
  correct?

please, send EXPLAIN ANALYZE result :) 

I tried to reduce this to its simplest form.

Here is a SQL statement -

    SELECT *
    FROM ccc.ar_trans_due a
    LEFT JOIN ccc.ar_trans b ON

        b.tran_type = a.tran_type AND 
b.tran_row_id = a.tran_row_id
    WHERE a.row_id = 1

ar_trans_due is a physical table, ar_trans is a view.

It takes about 28ms. Here is the explain - https://explain.depesz.com/s/8YY

The PostgreSQL cannot to push join - in slow case, the UNIONS should be done first - and it requires full scan ar_tran_inv - used filter (posted AND (deleted_id = 0) is not too effective - maybe some composite or partial index helps.

The fast query doesn't contains unions - so there are bigger space for optimizer - ar_tran_inv is filtered effective - by primary key. 

So main problem is impossible to push information a.row_id = 1 to deep to query.

Then I changed it to join each of the physical tables, instead of the view 
-

    SELECT *
    FROM ccc.ar_trans_due a
    LEFT JOIN ccc.ar_tran_inv b ON
        b.row_id = a.tran_row_id
    LEFT JOIN ccc.ar_tran_crn c ON
        c.row_id = a.tran_row_id
    LEFT JOIN ccc.ar_tran_rec d ON
        d.row_id = a.tran_row_id
    WHERE a.row_id = 1

This takes just over 1ms. Here is the explain - https://explain.depesz.com/s/U29h

I tried setting enable_seq_scan to off – it ran even slower!

Frank