Re: Equivalence Classes when using IN

Kim Rose Carlsen <krc@xxxxxxxx> · Tue, 10 Oct 2017 06:48:09 +0000

> You would benefit from adding the age column to view_customer, or at

> least consider having some view which contains all the columns you'll

> ever need from those tables and if you need special views with only a

> subset of columns due to some software doing "select * from

> viewname;", then you could just create some. Joining to the same table

> again seems like a bit of a waste of effort for the planner and

> executor.  

I would argue that the anti pattern would be the software that

insist on using "select * from viewname;" from a view that has

calculated columns that you do not care for. I recommend

introducing both lightweight views and heavyweight views, so you

can join up probably for what you need.

My example is fabricated trying to simplify things, but I seem to

create more confusion than clarity in my example. My point was

only to see if anything could be added to the fabricated

execution path. I agree that the listed example does not make

sense. So I will try and give some more context to real use

cases.

Imagine an invoice entity where you have one relation for invoice

base data and a relation for invoice_line. The invoice has some

invoice_id, customer_id, due_date, paid_date and invoice_line

contains each line with a invoice_id, display_name, amount. A

view (view_invoice_with_amount) where you calculate the total.

so a query could be 

SELECT c.customer_id, 

       i.invoice_amount_total

  FROM view_customer c

  JOIN view_invoice_with_amount i

    ON c.customer_id = i.customer_id

 WHERE c.first_name = 'John';

If you ever need to filter by invoice_amount_total, it might be

necesary denormalize the relations and cache the amount in the

invoice table.

> I'd assume customer_id is the PRIMARY KEY of customer and

> is unique.

This is a continuation of the previous example, maybe I should have 

included it all to make it more clear. But customer_id is declared

as a primary key.

> It's not all that clear what your view is doing here. Confusingly

> there's a Sort in the plan, yet nothing in the query asked for that,

> so I guess that the view must have an ORDER BY. If you get rid of that

> the planner would likely use an index on product (customer_id) to

> parameterise the nested loop, at least, it likely would, if you have

> one.

The view is defined in the original post. What I was trying to illustrate

was a DISTINCT ON clause to prioritize multiple products pr customer

to a somewhat "main" product for the customer. The ORDER BY on product_id

would in this case then map the first product a customer gets to its 

"main" product. It could also be the most valuable product or newest ordered

active product etc. It is just some way of mapping one to many relation to a

one to one. Again the example is simplified and fabricated and maybe looses

its power to explain its intents.

> It's pretty bad practice to have ORDER BY in views. I kinda wish we

> didn't even allow it, but that ship sailed many years ago...

It is required by DISTINCT ON and as soon as you go into

reporting, datawarehouse then it gets difficult to avoid these

along with group by. Instead of writing each query from the

ground up you get a huge benefit by factorizing each query into

meaningful entities that can stand alone and make sense by

themself, and from these build up the query to answer your

questions. That way you gain lots of re-use of code and

definition doesn't change between queries. The down side is it

leaves alot of work to the planner. It's a trade off between

optimization, readability and simplicity.

I hope I make more sense now.

- Kim