Search Postgresql Archives

Re: When to store data that could be derived

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 3/24/19 1:42 AM, Frank wrote:
Hi all

As I understand it, a  general rule of thumb is that you should never create a physical column if the data could be derived from existing columns. A possible reason for breaking this rule is for performance reasons.

I have a situation where I am considering breaking the rule, but I am not experienced enough in SQL to know if my reason is valid. I would appreciate it if someone could glance at my 'before' and 'after' scenarios and see if, from a 'gut-feel' point of view, I should proceed.

I have a VIEW constructed as follows -

CREATE VIEW view_name AS
[select statement 1]
UNION ALL
[select statement 2]
etc.

This is one of the select statements. I will give the 'after' scenario first -

SELECT
    'arec' AS tran_type, a.row_id AS tran_row_id,
    a.tran_number AS tran_number, a.cust_row_id AS cust_row_id,
    CASE
        WHEN a.tran_type = 'ar_rec' THEN y.tran_date
        WHEN a.tran_type = 'cb_rec' THEN w.tran_date
    END AS tran_date,
    CASE
        WHEN a.tran_type = 'ar_rec' THEN y.text
        WHEN a.tran_type = 'cb_rec' THEN w.text
    END AS text,
    0 - a.arec_cust AS amount_cust,
    0 - a.arec_local AS amount_local
FROM ar_rec_subtran a
LEFT JOIN ar_tran_rec_det z ON z.row_id = a.tran_det_row_id
LEFT JOIN ar_tran_rec y ON y.row_id = z.tran_row_id
LEFT JOIN cb_tran_rec_det x ON x.row_id = a.tran_det_row_id
LEFT JOIN cb_tran_rec w ON w.row_id = x.tran_row_id
WHERE
    CASE
        WHEN a.tran_type = 'ar_rec' THEN y.posted
        WHEN a.tran_type = 'cb_rec' THEN w.posted
    END = '1'

The two columns a.arec_cust and a.arec_local *can* be derived from other columns, and in fact that is how it is working at the moment, so here is the 'before' scenario -

SELECT
    'arec' AS tran_type, a.row_id AS tran_row_id,
    a.tran_number AS tran_number, a.cust_row_id AS cust_row_id,
    CASE
        WHEN a.tran_type = 'ar_rec' THEN y.tran_date
        WHEN a.tran_type = 'cb_rec' THEN w.tran_date
    END AS tran_date,
    CASE
        WHEN a.tran_type = 'ar_rec' THEN y.text
        WHEN a.tran_type = 'cb_rec' THEN w.text
    END AS text,
    ROUND(0 - (ROUND(a.arec_amount / CASE
        WHEN a.tran_type = 'ar_rec' THEN y.tran_exch_rate
        WHEN a.tran_type = 'cb_rec' THEN w.tran_exch_rate
    END * a.cust_exch_rate, u.scale)), 2) AS amount_cust,
    ROUND(0 - (ROUND(a.arec_amount / CASE
        WHEN a.tran_type = 'ar_rec' THEN y.tran_exch_rate
        WHEN a.tran_type = 'cb_rec' THEN w.tran_exch_rate
    END, s.scale)), 2) AS amount_local
FROM ar_rec_subtran a
LEFT JOIN ar_tran_rec_det z ON z.row_id = a.tran_det_row_id
LEFT JOIN ar_tran_rec y ON y.row_id = z.tran_row_id
LEFT JOIN cb_tran_rec_det x ON x.row_id = a.tran_det_row_id
LEFT JOIN cb_tran_rec w ON w.row_id = x.tran_row_id
LEFT JOIN ar_customers v ON v.row_id = a.cust_row_id
LEFT JOIN adm_currencies u ON u.row_id = v.currency_id
LEFT JOIN adm_params t ON t.row_id = 1
LEFT JOIN adm_currencies s ON s.row_id = t.local_curr_id
WHERE
    CASE
        WHEN a.tran_type = 'ar_rec' THEN y.posted
        WHEN a.tran_type = 'cb_rec' THEN w.posted
    END = '1'

As you can see, complexity has increased and there are four additional JOINs.

I am expecting the VIEW to be used extensively for query purposes, and my gut-feel says that the second one is likely to lead to performance problems in a system with a lot of data and a lot of users.

Generate an artificial load and test it?

I am not looking for an answer - I know that I should create dummy data and run some timing tests. I was just wondering if someone more experienced would wince when they look at the second SELECT, or if they would shrug and think that it looks fine.

Sure the second query joins a lot of tables, but is pretty straightforward.

What REALLY worries me is whether or not the query optimiser would look at the WHERE CASE, run away screaming and then make it use sequential scans.  Thus, even query #1 would be slow.


Any input will be appreciated.


Is this a historical data set that's never updated, or current data that's constantly added to?


Frank Millman




--
Angular momentum makes the world go 'round.





[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux