On 2019-03-24 10:05:02 +0200, Frank wrote: > > > On 2019-03-24 9:25 AM, Ron wrote: > > On 3/24/19 1:42 AM, Frank wrote: > > > As I understand it, a general rule of thumb is that you should > > > never create a physical column if the data could be derived from > > > existing columns. The main reason for this rule (which leads to the 3rd normal form) is to avoid inconsistencies when data is changed. > > Is this a historical data set that's never updated, or current data > > that's constantly added to? > > > > It is the latter - current data constantly added to. So the important part here is not whether data is added, but whether data is changed. Sure, new transactions are added all the time. But is it expected that the data used to derive amount_cust and amount_local (e.g. the exchange rate) is changed retroactively, and if it is should the computed amount change? (I'm a bit worried about the join with the customers table here - what happens when a customer moves their headquarters to a country with a different currency?) > > Sure the second query joins a lot of tables, but is pretty straightforward. > > > > What REALLY worries me is whether or not the query optimiser would look > > at the WHERE CASE, run away screaming and then make it use sequential > > scans. Thus, even query #1 would be slow. > > > > I had not realised that. I hope someone else chimes in on this. Your condition is: CASE WHEN a.tran_type = 'ar_rec' THEN y.posted WHEN a.tran_type = 'cb_rec' THEN w.posted END = '1' Think about how you would check that. The straightforward way is to compute the value of the case/end clause and compare that to 1. But to compute that value you first need the value of a.tran_type. There are two possible values here, so maybe an index scan on a.tran_type might be possible, but I'm not sure whether the optimizer is even smart enought to figure that out and if it is, whether those to values are selective enough (Maybe all or most records are either ar_rec or cb_rec). After that you can retrieve the posted value from the correct table. As a human I see that the condition can only ever be true for records from y and w with posted = 1. So it might be better to use index scans on those columns. But this is not that easy to see, and I don't know whether the optimizer can do it. Rewriting the condition as (a.tran_type = 'ar_rec' and y.posted = 1) or (a.tran_type = 'cb_rec' and w.posted = 1) might make it easier for the optimizer to find a good plan. (Please note that I haven't tested any of this. Use EXPLAIN to check what the optimizer really does. hp -- _ | Peter J. Holzer | we build much bigger, better disasters now |_|_) | | because we have much more sophisticated | | | hjp@xxxxxx | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
Attachment:
signature.asc
Description: PGP signature