Re: Nodes and trees...

Merlin Moncure <mmoncure@xxxxxxxxx> · Tue, 3 Aug 2010 11:10:49 -0400

On Tue, Aug 3, 2010 at 8:01 AM, Jason Schauberger
<crossroads0000@xxxxxxxxxxxxxx> wrote:
> Dear fellow Postgres users, :-)
>
> please consider this table:
>
> CREATE TABLE nodes (
>
> id      int     PRIMARY KEY,
>
> parent     int     REFERENCES nodes(id)
>
> );
>
> In this table, each node *can* have a parent node. You can picture the
> whole set of rows of this table as one or more trees with nodes and
> the root of the tree is the node which has no parent node (that is,
> parent is NULL).
>
> Now here's my objective: I want to *quickly* find all nodes that have
> the same root node.
>
> I could iterate through the complete tree in question starting from
> the root node and going all the way through to the terminal/leaf nodes
> (those without a child), but that could take quite long depending on
> how large the tree is.
>
> So, what I could do is this:
>
> CREATE TABLE nodes (
>
> id      int     PRIMARY KEY,
>
> parent     int     REFERENCES nodes(id),
>
> root    int     NOT NULL        REFERENCES nodes(id)
>
> );
>
> and fill out the root column every time I insert a node. But then
> there is the problem of anomalies, where the root column could have an
> incorrect value (that is, the id of a node which is actually NOT the
> root of the same tree). I guess I could check for correctness using
> triggers.

yes, you should absolutely do that -- the role of the database is to
(as much as possible) make anomalies in your data impossible.  If you
go this route, set the root_id node in the trigger.  Postgres triggers
aren't strictly portable, but should be able to be migrated with
minimal effort. Looking it up in the application is possible, but
undesirable IMO.

If you are only specifically interested in the root node, you can
materialize it in the application and index it.  This isn't a general
solution for investigating familial relationships of nodes, but it
doesn't sound like you need something general.

The general solution that is going to be most portable is going
probably going to be a materialized path approach.  You encode the
parents of a node in a string (in postgres you can use an array) in
such away that you can pull out specific parents_ids.  Materialized
approaches have at least one big downside: updates to the tree are
extremely expensive.

Here is a pretty neat way of doing nested data with farey fractions
with SQL examples designed for Oracle -- they should be able to be
ported to postgres without too much effort:
http://arxiv.org/html/cs/0401014.  I've been looking for an excuse to
play with it!

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general