Search Postgresql Archives

Re: Equivalence Classes when using IN

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> You would benefit from adding the age column to view_customer, or at
> least consider having some view which contains all the columns you'll
> ever need from those tables and if you need special views with only a
> subset of columns due to some software doing "select * from
> viewname;", then you could just create some. Joining to the same table
> again seems like a bit of a waste of effort for the planner and
> executor.  

I would argue that the anti pattern would be the software that
insist on using "select * from viewname;" from a view that has
calculated columns that you do not care for. I recommend
introducing both lightweight views and heavyweight views, so you
can join up probably for what you need.

My example is fabricated trying to simplify things, but I seem to
create more confusion than clarity in my example. My point was
only to see if anything could be added to the fabricated
execution path. I agree that the listed example does not make
sense. So I will try and give some more context to real use
cases.

Imagine an invoice entity where you have one relation for invoice
base data and a relation for invoice_line. The invoice has some
invoice_id, customer_id, due_date, paid_date and invoice_line
contains each line with a invoice_id, display_name, amount. A
view (view_invoice_with_amount) where you calculate the total.

so a query could be
SELECT c.customer_id,
       i.invoice_amount_total
  FROM view_customer c
  JOIN view_invoice_with_amount i
    ON c.customer_id = i.customer_id
 WHERE c.first_name = 'John';

If you ever need to filter by invoice_amount_total, it might be
necesary denormalize the relations and cache the amount in the
invoice table.

> I'd assume customer_id is the PRIMARY KEY of customer and
> is unique.

This is a continuation of the previous example, maybe I should have
included it all to make it more clear. But customer_id is declared
as a primary key.

> It's not all that clear what your view is doing here. Confusingly
> there's a Sort in the plan, yet nothing in the query asked for that,
> so I guess that the view must have an ORDER BY. If you get rid of that
> the planner would likely use an index on product (customer_id) to
> parameterise the nested loop, at least, it likely would, if you have
> one.

The view is defined in the original post. What I was trying to illustrate
was a DISTINCT ON clause to prioritize multiple products pr customer
to a somewhat "main" product for the customer. The ORDER BY on product_id
would in this case then map the first product a customer gets to its
"main" product. It could also be the most valuable product or newest ordered
active product etc. It is just some way of mapping one to many relation to a
one to one. Again the example is simplified and fabricated and maybe looses
its power to explain its intents.

> It's pretty bad practice to have ORDER BY in views. I kinda wish we
> didn't even allow it, but that ship sailed many years ago...

It is required by DISTINCT ON and as soon as you go into
reporting, datawarehouse then it gets difficult to avoid these
along with group by. Instead of writing each query from the
ground up you get a huge benefit by factorizing each query into
meaningful entities that can stand alone and make sense by
themself, and from these build up the query to answer your
questions. That way you gain lots of re-use of code and
definition doesn't change between queries. The down side is it
leaves alot of work to the planner. It's a trade off between
optimization, readability and simplicity.

I hope I make more sense now.

- Kim

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux