Re: (FAQ?) JOIN condition - 'WHERE NULL = NULL'

"Ian Sillitoe" <ian.sillitoe@xxxxxxxxxxxxxx> · Wed, 2 Apr 2008 19:43:27 +0100

I completely take your points - so maybe I should be asking for advice on database design instead.

We are annotating nodes on a hierarchical structure where NULL implied an absence of a value in this depth of the hierarchy. As a method of enforcing this view, we use constraints to make sure we only get
one combination of rows for a given node of the hierarchy (whether the
columns are integers or NULLs). I'm not suggesting this was 'correct', I'm just saying this was what we were thinking.

We considered using a flag of some description instead, however this also seemed a bit messy (although in hindsight probably far less messy than the job I now face trying to recode this table).

psql> select * from tree;
 id | depth1 | depth2 | depth3 |           name

----+--------+--------+--------+--------------------------
  1 |      1 |        |        | Name for node_id "1"

  1 |      1 |      1 |        | Name for node_id "1.1"
  2 |      1 |      2 |        | Name for node_id "1.2"

  3 |      1 |      2 |      3 | Name for node_id "1.2.3"
(3 rows)

psql> select * from leaf_node;
   id   | depth1 | depth2 | depth3 | depth4 | depth5

--------+--------+--------+--------+--------+--------
 leaf_1 |      1 |      2 |      3 |      1 |      1

 leaf_2 |      1 |      2 |      3 |      1 |      2
 leaf_3 |      1 |      2 |      3 |      1 |      3

 leaf_4 |      1 |      2 |      3 |      2 |      1
 leaf_5 |      1 |      2 |      3 |      2 |      2

(5 rows)

So, is there a generally accept method of modelling nodes in hierarchical structures (i.e. the 'tree' table in the example above) that doesn't use NULLs? We started off using ids such as '1.1.2' as the PRIMARY KEY but concatenating strings to go between columns and ids ended up being too painful. 

Would it be sensible to use an ARRAY of depths as the primary key? I guess this would mean it would be possible to join 

psql> select * from tree;

  id  | depth1 | depth2 | depth3 |           name
------+--------+--------+--------+--------------------------

[1]   |      1 |        |        | Name for node_id "1"
[1,2] |      1 |      1 |        | Name for node_id "1.1"

Advice most welcome.

Cheers,

Ian

On Wed, Apr 2, 2008 at 6:48 PM, Ted Byers <r.ted.byers@xxxxxxxxxx> wrote:

--- Martijn van Oosterhout <kleptog@xxxxxxxxx> wrote:

> 'transform_null_equals' won't help you at all here

> since it only help

> in the very specific case of comparing with a

> constant. The easiest is

> to think of NULL as meaning 'unknown'. Clearly you

> can't compare that

> usefully with anything.

>

Not even a null in another record ... (hence my

question below).  If the value is unknown, then it

could be anything, and (thinking as a mathematician

considering real numbers) the probability of two

records having null having their true, but unknown

values be the same is indistinguishable from 0. (with

integers or decimal numbers or floating point numbers,

that would be qualified with the clause, for practical

purposes :)

> Perhaps you can use a marker like -1 to achieve the

> effect you want?

>

Is that really valid, though, especially in a join? I

mean, if the column in question has multiple nulls, in

each of the tables, then how do you, of the DB, figure

out which of the rows containing nulls in the one

table match up with rows in the other table containing

nulls?  Or is the resultset supposed to be the product

of the two sets (match each row with a null in the one

table with each row with a null in the other)?  That,

for me, creates a nightmare situation where some of my

tables have tens of millions of rows, and if even 1%

of the rows contains null in the relevant column, I

don't even want to think about processing the

resultset that would be produced from such an idea

using these tables.

My joins always only involve primary keys, or indeces

on columns that prohibit nulls, so this problem

doesn't crop up in my code, but would I be not too far

from right in expecting that the rational thing to do

when creating a join on columns that allow nulls is to

exclude ALL rows, in either table, where the columns

involved are null?

Cheers,

Ted