Re: When to store data that could be derived

Frank <frank@xxxxxxxxxxxx> · Tue, 26 Mar 2019 16:37:38 +0200

On 2019-03-25 5:44 PM, Frank wrote:

On reflection, I have not been consistent with my use of indexes, and I 
think that will affect the query plan.

There are at least two issues -

1. The first table defined in the VIEW is ccc.ar_tran_inv. It has the 
following index -
     "arinv_cust_date" btree (cust_row_id NULLS FIRST, tran_date NULLS 
FIRST) WHERE deleted_id = 0

I have not used "WHERE deleted_id = 0" when constructing the VIEW, but I 
have used "WHERE posted = '1'". I don't think the index can be used with 
this setup.

2. The complicated table in the VIEW is ccc.ar_rec_subtran. Various 
columns such as tran_date and posted are retrieved via CASE expressions 
from two underlying tables. Those tables have certain indexes defined, 
but I cannot see how they can be utilised from my current setup.

I think I should spend some time tidying this up before you try to make 
sense of the query plan. Any tips on how to improve it will be appreciated.

I have spent a lot of time testing various permutations and trying to 
understand them using EXPLAIN.

My original concern was the overhead of calculating derived data. I now 
realise that it is more important to get the indexes right, as that has 
a much bigger impact on performance.

The VIEW that I have been using for testing is actually masking the 
problem. The view combines four tables, three of which are 
straightforward, easy to index, and fast. The fourth one is complex, 
difficult to index, and comparatively slow. So I forgot about the VIEW 
and concentrated on the complex table.

I now understand the caveats I received earlier in this thread. It seems 
impossible to make use of the indexes on the JOINed tables in the 
following query. I did create an index on cust_row_id in the main table, 
and it made a difference with a simple query, but by the time I added 
the JOINs, the improvement was hardly noticeable.

This is the query that I used -

EXPLAIN SELECT COUNT(*)
    FROM ccc.ar_rec_subtran a
    LEFT JOIN ccc.ar_tran_rec_det z ON z.row_id = a.tran_det_row_id
    LEFT JOIN ccc.ar_tran_rec y ON y.row_id = z.tran_row_id
    LEFT JOIN ccc.cb_tran_rec_det x ON x.row_id = a.tran_det_row_id
    LEFT JOIN ccc.cb_tran_rec w ON w.row_id = x.tran_row_id
WHERE a.cust_row_id = 4 AND
    CASE
        WHEN a.tran_type = 'ar_rec' THEN y.tran_date
        WHEN a.tran_type = 'cb_rec' THEN w.tran_date
    END BETWEEN '2015-05-01' AND '2015-05-31' AND
    CASE
        WHEN a.tran_type = 'ar_rec' THEN y.posted
        WHEN a.tran_type = 'cb_rec' THEN w.posted
    END = true AND
    a.deleted_id = 0;

Attached is the EXPLAIN for this one.

I also ran the same query with the following WHERE clause -

WHERE a.cust_row_id = 4 AND
    ((a.tran_type = 'ar_rec' AND
        y.tran_date BETWEEN '2015-05-01' AND '2015-05-31') OR
    (a.tran_type = 'cb_rec' AND
        w.tran_date BETWEEN '2015-05-01' AND '2015-05-31')) AND
    ((a.tran_type = 'ar_rec' AND y.posted = true) OR
        (a.tran_type = 'cb_rec' AND w.posted = true)) AND
    a.deleted_id = 0;

The timings were virtually identical, so I have not attached that EXPLAIN.

Frank
                                                                                                                                                                                                                                                           QUERY PLAN                                                                                                                                                                                                                                                            
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1514.75..1514.76 rows=1 width=8)
   ->  Hash Left Join  (cost=895.41..1514.49 rows=102 width=0)
         Hash Cond: (a.tran_det_row_id = x.row_id)
         Filter: (CASE WHEN ((a.tran_type)::text = 'ar_rec'::text) THEN y.posted WHEN ((a.tran_type)::text = 'cb_rec'::text) THEN w.posted ELSE NULL::boolean END AND (CASE WHEN ((a.tran_type)::text = 'ar_rec'::text) THEN y.tran_date WHEN ((a.tran_type)::text = 'cb_rec'::text) THEN w.tran_date ELSE NULL::date END >= '2015-05-01'::date) AND (CASE WHEN ((a.tran_type)::text = 'ar_rec'::text) THEN y.tran_date WHEN ((a.tran_type)::text = 'cb_rec'::text) THEN w.tran_date ELSE NULL::date END <= '2015-05-31'::date))
         ->  Hash Right Join  (cost=893.05..1505.23 rows=1835 width=16)
               Hash Cond: (y.row_id = z.tran_row_id)
               ->  Seq Scan on ar_tran_rec y  (cost=0.00..511.25 rows=22025 width=9)
               ->  Hash  (cost=870.12..870.12 rows=1835 width=15)
                     ->  Hash Right Join  (cost=343.97..870.12 rows=1835 width=15)
                           Hash Cond: (z.row_id = a.tran_det_row_id)
                           ->  Seq Scan on ar_tran_rec_det z  (cost=0.00..397.53 rows=22053 width=8)
                           ->  Hash  (cost=321.03..321.03 rows=1835 width=11)
                                 ->  Bitmap Heap Scan on ar_rec_subtran a  (cost=38.51..321.03 rows=1835 width=11)
                                       Recheck Cond: (cust_row_id = 4)
                                       Filter: (deleted_id = 0)
                                       ->  Bitmap Index Scan on ar_sub_cust  (cost=0.00..38.05 rows=1835 width=0)
                                             Index Cond: (cust_row_id = 4)
         ->  Hash  (cost=2.23..2.23 rows=10 width=9)
               ->  Hash Left Join  (cost=1.04..2.23 rows=10 width=9)
                     Hash Cond: (x.tran_row_id = w.row_id)
                     ->  Seq Scan on cb_tran_rec_det x  (cost=0.00..1.10 rows=10 width=8)
                     ->  Hash  (cost=1.02..1.02 rows=2 width=9)
                           ->  Seq Scan on cb_tran_rec w  (cost=0.00..1.02 rows=2 width=9)