On Thu, Sep 14, 2017 at 8:17 AM, Pavel Stehule <pavel.stehule@xxxxxxxxx> wrote: > 2017-09-14 15:09 GMT+02:00 Pavel Stehule <pavel.stehule@xxxxxxxxx>: >> >> >> >> 2017-09-14 14:59 GMT+02:00 Frank Millman <frank@xxxxxxxxxxxx>: >>> >>> Pavel Stehule wrote: >>> >>> 2017-09-14 10:14 GMT+02:00 Frank Millman <frank@xxxxxxxxxxxx>: >>>> >>>> Hi all >>>> >>>> This is a follow-up to a recent question I posted regarding a slow >>>> query. I thought that the slowness was caused by the number of JOINs in the >>>> query, but with your assistance I have found the true reason. I said in the >>>> previous thread that the question had become academic, but now that I >>>> understand things better, it is no longer academic as it casts doubt on my >>>> whole approach. >>>> >>>> I have split my AR transaction table into three physical tables – >>>> ar_tran_inv, ar_tran_crn, ar_tran_rec. I will probably add others at some >>>> point, such as ar_tran_jnl. >>>> >>>> I then create a VIEW to view all transactions combined. The view is >>>> created like this - >>>> >>>> CREATE VIEW ar_trans AS >>>> SELECT ‘ar_inv’ AS tran_type, row_id AS tran_row_id, tran_number ... >>>> FROM ar_tran_inv WHERE posted = ‘1’ >>>> UNION ALL >>>> SELECT ‘ar_crn’ AS tran_type, row_id AS tran_row_id, tran_number ... >>>> FROM ar_tran_crn WHERE posted = ‘1’ >>>> UNION ALL >>>> SELECT ‘ar_rec’ AS tran_type, row_id AS tran_row_id, tran_number ... >>>> FROM ar_tran_rec WHERE posted = ‘1’ >>>> >>>> I have another table called ‘ar_trans_due’, to keep track of outstanding >>>> transactions. All of the three transaction types generate entries into this >>>> table. To identify the source of the transaction, I have created columns in >>>> ar_trans_due called ‘tran_type’ and ‘tran_row_id’. After inserting a row >>>> into ‘ar_tran_inv’, I invoke this - >>>> >>>> INSERT INTO ar_trans_due (tran_type, tran_row_id, ...) VALUES >>>> (‘ar_inv’, ar_tran_inv.row_id, ...), and similar for the other transaction >>>> types. It is handled by a Python program, and it all happens within a >>>> transaction. >>>> >>>> When I view a row in ar_trans_due, I want to retrieve data from the >>>> source transaction, so I have this - >>>> >>>> SELECT * FROM ar_trans_due a >>>> LEFT JOIN ar_trans b ON b.tran_type = a.tran_type AND b.tran_row_id = >>>> a.tran_row_id >>>> >>>> I understand that PostgreSQL must somehow follow a path from the view >>>> ‘ar_trans’ to the physical table ‘ar_tran_inv’, but I assumed it would >>>> execute the equivalent of SELECT * FROM ar_tran_inv WHERE row_id = >>>> a.tran_row_id AND posted = ‘1’. >>>> >>>> If this was the case, it would be an indexed read, and very fast. >>>> Instead, according to EXPLAIN, it performs a sequential scan of the >>>> ‘ar_tran_inv’ table. >>>> >>>> It also scans ‘ar_tran_crn’ and ‘ar_tran_rec’, but EXPLAIN shows that it >>>> uses a Bitmap Heap Scan on those. I assume that is because the tables are >>>> currently empty. >>>> >>>> Is this analysis correct? >>> >>> >>> please, send EXPLAIN ANALYZE result :) >>>> >>>> >>> >>> >>> I tried to reduce this to its simplest form. >>> >>> Here is a SQL statement - >>> >>> SELECT * >>> FROM ccc.ar_trans_due a >>> LEFT JOIN ccc.ar_trans b ON >>> b.tran_type = a.tran_type AND b.tran_row_id = a.tran_row_id >>> WHERE a.row_id = 1 >>> >>> ar_trans_due is a physical table, ar_trans is a view. >>> >>> It takes about 28ms. Here is the explain - >>> https://explain.depesz.com/s/8YY >>> >>> > > > The PostgreSQL cannot to push join - in slow case, the UNIONS should be done > first - and it requires full scan ar_tran_inv - used filter (posted AND > (deleted_id = 0) is not too effective - maybe some composite or partial > index helps. In my testing JOINS can push through UNION ALL. Why do we need to materialize union first? What version is this? merlin -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general