Re: How to hint 2 coulms IS NOT DISTINCT FROM each other

Kim Rose Carlsen <krc@xxxxxxxx> · Fri, 28 Oct 2016 19:54:16 +0000

> > On Fri,
 Oct 28, 2016 at 10:29 AM, Kim Rose Carlsen <krc@xxxxxxxx> wrote:

> > Hi

> >

> > I was wondering if there is a way to hint that two columns
 in two different

> > tables IS NOT DISTINCT FROM each other. So that the optimizer
 may assume if

> > table_a.key = 'test' THEN table_b.key = 'test' .

> >

> > The equals operator already does this but it does not handle
 NULLS very well

> > (or not at all). And it seems IS NOT DISTINCT FROM is not
 indexable and

> > doesn't establish the same inference rules as equals.

> 

> The whole idea behing Postgres' query planner is that you don't
 have

> to use any hints. Late model versions of postgres handle nulls
 fine,

> but nulls are never "equal" to anything else. I.e. where xxx
 is null

> works with indexes. Where x=y does not, since null <> null.

> 

> Suggestion for getting help, put a large-ish aka production
 sized

> amount of data into your db, run your queries with explain analyze
 and

> feed them to
https://explain.depesz.com/
 and post the links here along

> with the slow queries. A lot of times the fix is non-obvious
 if you're

> coming from another db with a different set of troubleshooting
 skills

> for slow queries.

The problem is how to reduce the problem into its core, without introducing 
all the unnecessary.

Maybe simplifying the problem, also makes it impossible to say where I go 

wrong. It might be that I try to push too much logic into the SQL layer 

and Im adding too many layers of abstraction to accomplish what I want. 

So let me try and elaborate a little more.

I have couple a tables describing resources (circuits) and allocation 

of resources to customers and products.

First layer is a view called view_circuit. This view (left) join any
 table 

the circuit table reference through a foreign key (it gives exactly the same

rows and columns as circuit table + some extra information like customer_id).

Second layer is 2 views
1) a view describing if the circuit is active or inactive, lets call it 
   view_circuit_product_main

2) a view describing line_speed about the circuit, lets call it 
   view_circuit_product

These views use aggregations (both GROUP BY and SELECT DISTINCT ON (...)) 
if this has any relevance.

Third layer
Next step is to add a view that tells both (joins the two views together
on circuit_id). lets call the new view
view_circuit_with_status

This view is defined as

CREATE VIEW
view_circuit_with_status AS (
   SELECT r.*,
          s.circuit_status,
          s.customer_id AS s_customer_id,
          p.line_speed,
          p.customer_id AS p_customer_id
     FROM view_circuit r
     JOIN view_circuit_product_main s
       ON r.circuit_id = s.circuit_id
      AND r.customer_id IS NOT DISTINCT FROM s.customer_id
     JOIN view_circuit_product p
       ON r.circuit_id = p.circuit_id
      AND
 r.customer_id IS NOT DISTINCT FROM s.customer_id
);

SELECT * FROM view_circuit_with_status WHERE customer_id =
 1;

Since customer_id is exposed through view_circuit the planner assumes view_circuit.customer_id = 1 and from there attempts to join 
view_circuit_product_main
 and view_circuit_product using circuit_id. 
This is not running optimal.

However if we change our query to allow the inference rule to take place, the query is executed very fast.

SELECT * FROM view_circuit_with_status
 WHERE customer_id = 1 AND s_customer_id = 1 AND p_customer_id = 1;

If a circuit is not assigned to any customers
 customer_id is set to NULL. This is the reason I can't use = operator. If I do use = then I can't find circuit which are unassigned, but the query do run effective.

I can see this still ends up being quite abstract, but the point is it would be quite beneficial if IS NOT DISTINCT used the same rules as = operator.

I have attached the 2 query plans

Bad plan: https://explain.depesz.com/s/SZN
Good plan: https://explain.depesz.com/s/61Ro

-

Kim Carlsen
Do you use potatoes for long posts here?