Re: How to hint 2 coulms IS NOT DISTINCT FROM each other

Merlin Moncure <mmoncure@xxxxxxxxx> · Fri, 4 Nov 2016 08:41:07 -0500

On Fri, Nov 4, 2016 at 8:08 AM, Kim Rose Carlsen <krc@xxxxxxxx> wrote:
>>> It might raise another problem, that the nulls are generated through LEFT
>
>>> JOINS where no rows are defined. Then the 0 or -1 value need to be
>>> a computed value. Won't this throw off index lookups? (I might be
>>> more confused in this area).
>>
>>Not following this.
>
> The nulls are generated by something like this
>     SELECT c.circuit_id,
>                    cc.customer_id
>        FROM circuit AS c
> LEFT JOIN circuit_customer AS cc
>              ON c.circuit_id = cc.circuit_id
>
> To make a magic '0' customer we would be required to use
>   COALESCE(cc.customer_id, '0')
> I dont think the optimizer will do anything clever with the '0' we have
> computed from null.

It would if you explicitly indexed it as such;
CREATE INDEX ON circuit_customer((COALESCE(customer_id, '0'));

> I could ofc. by default assign all unassigned circuits to '0' in
> circuit_customer. I'm not a fan though.

hm, why not?  null generally means 'unknown' and that's why it fails
any equality test.

>>BTW, if you want a fast plan over the current
>>data without consideration of aesthetics, try this:
>>
>>CREATE VIEW view_circuit_with_status AS (
>>    SELECT r.*,
>>          s.circuit_status,
>>          s.customer_id AS s_customer_id,
>>          p.line_speed,
>>          p.customer_id AS p_customer_id
>>     FROM view_circuit r
>>     JOIN view_circuit_product_main s
>>      ON r.circuit_id = s.circuit_id
>>      AND r.customer_id, s.customer_id
>>     JOIN view_circuit_product p
>>       ON r.circuit_id = p.circuit_id
>>      AND r.customer_id, s.customer_id
>>  UNION ALL SELECT r.*,
>>          s.circuit_status,
>>          s.customer_id AS s_customer_id,
>>          p.line_speed,
>>          p.customer_id AS p_customer_id
>>     FROM view_circuit r
>>     JOIN view_circuit_product_main s
>>       ON r.circuit_id = s.circuit_id
>>      AND r.customer_id IS NULL
>>      AND  s.customer_id IS NULL
>>     JOIN view_circuit_product p
>>       ON r.circuit_id = p.circuit_id>
>
> I will have to figure something out, but this specific case is still
> problematic
> since we would like to filter this view using different criteria's, like
> circuit_no,
> products or customers.

the above is logically equivalent to IS NOT DISTINCT FROM; you should
be able to query it as you would have done the original view.

> But with all these detours, I assume that a change to IS NOT DISTINCT FROM,
> is difficult or not wanted?

Well, not exactly.  In your case you are trying to treat null as a
specific value and pass it through join operations.

TBH, this is a pretty dubious approach: null is not supposed to be
equal to anything and any join vs null should come up empty --
logically at least.  INDF works around this of course but it's not a
recommended approach (my usage is generally restricted to, "has this
value changed since yesterday? etc").

I'm not an expert backend structures for indexing and optimization but
I know enough to suspect that optimizing INDF might cause
implementation headaches in various places, as do other irregular
syntactical approaches in SQL.  I think minimally optimizing INDF
would require converting it to an operator on par with '=' which is a
pretty large infrastructure change for an edge optimization case.
The fact that there are solid optimization strategies already on the
table (UNION ALL, expr index COALESCE()) does not help.

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general