Re: Recursive Arrays 101

David Blomstrom <david.blomstrom@xxxxxxxxx> · Sun, 25 Oct 2015 18:10:33 -0700

@ Adrian Klaver: Oh, so you're suggesting I make separate tables for kingdoms, classes and on down to species. I'll research foreign keys and see what I can come up with. I hope I can make separate tables for mammal species, bird species, fish species, etc. There are just so many species - especially fish - the spreadsheets I use to organize them are just about maxed out as it is.
I've been using the Catalogue of Life as a guide, but I'm limited because I can never get their downloads to work. So all I can do is go to their website and copy a bunch of genera and species at a time.

However, I did open up some of the tables I downloaded and was amazed at how apparently amateurish they are. Yet their site works just fine and is fast enough.

@ Alban Hertroys: What does EOL mean? It reminds me of Encyclopedia of Life, which is doing what I was attempting to do years ago.

On Sun, Oct 25, 2015 at 2:07 PM, Alban Hertroys <haramrae@xxxxxxxxx> wrote:

> On 25 Oct 2015, at 19:38, Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:

>

> On 10/25/2015 11:12 AM, David Blomstrom wrote:

>> I'm sorry, I don't know exactly what you mean by "definitions." The

>> fields Taxon and Parent are both varchar, with a 50-character limit.

>> ParentID is int(1).

>

> By definition I meant the schema, so from the below:

>

> CREATE TABLE t (

> N INT(6) default None auto_increment,

> Taxon varchar(50) default NULL,

> Parent varchar(25) default NULL,

> NameCommon varchar(50) default NULL,

> Rank smallint(2) default 0

> PRIMARY KEY (N)

> ) ENGINE=MyISAM

That can indeed be solved using a hierarchical query (provided you have a suitable table in PG); something akin to:

WITH RECURSIVE taxons AS (

--  Hierarchical root nodes

        SELECT N AS id, Taxon, Rank, 1 AS level, '' || N AS Path -- A useful addition explained further down

        FROM t

        WHERE ParentID IS NULL

-- Child nodes

        UNION ALL

        SELECT  N AS id, Taxon, Rank, taxons.level +1 AS level, taxons.Path || ':' || N AS Path

        FROM taxons

        JOIN t ON taxons.id = t.ParentID

)

SELECT id, Taxon, Rank, level

FROM taxons

ORDER BY Path

;

The Path-bit looks complicated, but basically that just appends ID's within the same hierarchy such that, when sorted on that field, you get the hierarchy in their hierarchical order. What the hierarchy would look like if it were shown as a file hierarchy with sub-directories expanded, for example. That's pretty much the only viable alternative (alternatives vary on the column used to create the hierarchy), which is why I added it to the example.

The fun thing with hierarchical queries is that you can add all kinds of extra information and make it trickle down to the child nodes, such as the items that make up the root of the hierarchy (pretty useful for grouping), for example or a field that calculates a string to prepend for indentation, etc. Or a computation that depends on values in parent items (I used this successfully in a bill of materials to calculate absolute quantities by volume, quantities by weight and cost of components in the end product where they were given relative to 1 kg of their parent, for example).

It's highly flexible and powerful (and standard SQL), but it takes a bit of time to get in the right mindset.

PS. I usually write my hierarchical queries in Oracle, which isn't quite as good at them as Postgres is, but it's what we have @work. Hence, I'm not sure I got the syntax 100% correct. We're working on getting PG in for a project upgrade (replacing RDB on OpenVMS, which will go EOL in <10 years!) - fingers crossed.

Cheers!

Alban Hertroys

--

If you can't see the forest for the trees,

cut the trees and you'll find there is no forest.

-- 
David Blomstrom
Writer & Web Designer (Mac, M$ & Linux)
www.geobop.org