Re: Proper relational database?

<david@xxxxxxxx> · Sun, 24 Apr 2016 22:10:20 +1000

> owner@xxxxxxxxxxxxxx] On Behalf Of Thomas Munro

> So what incremental improvements could we steal from "properly relational"
> query languages?

Here is my list of deficiencies in some or all dialects of SQL.
-Relation and tuple as data types
-Globally exclude NULLs and duplicate columns
-Relation with no columns, null key
-Natural antijoin
-Tuple join operations
-Tuple tests for equality, superset, subset
-Tuple test for set membership of relation
-Relation tests for equality, superset, subset
-Relation set operations include symmetric difference
-User-defined functions of arbitrary complexity
-User-defined aggregation functions of arbitrary complexity
-User-defined ordered queries of arbitrary complexity
-Iterative/recursive queries (when)

> Here's one I've thought about, trivial as it may be.  I noticed that Tutorial
> D (and apparently Andl too) includes a [NOT] MATCHING operator (alternatively
> spelled SEMIJOIN and SEMIMINUS) corresponding to the ⋉ (semi-join) and ▷
> (anti-join) operators from relational algebra.  In SQL you write [NOT] EXISTS
> or [NOT] IN in the WHERE clause, rather than something explicit in a <joined
> table> clause, though experienced users often talk explicitly about semi- and
> anti-joins, both because of the theory and because the terms show up in query
> plans.

Yes, there are two joins (join and antijoin). Semijoin is just one of a number of projections following a join, but antijoin is a quite different algorithm.

Antijoin is quite hard to write in SQL in such a way that the query planner will do the right thing. There is a lot of variation between dialects.

> A recent blog post[1] argues that SQL should have a SQL92-style JOIN syntax
> for this and points at a couple of products that have added one[2][3].  I
> guess it might be hard to convince the Postgres community to add support for
> a non-standard syntax that doesn't give you anything you can't already do,
> but as an idea I find it interesting and it seems to be in the spirit of the
> part of the Third Manifesto that says: "support[ing] the usual operators of
> the relational algebra [..].  All such operators shall be expressible without
> excessive circumlocution."

The purpose is simply that explicit syntax allows for explicit query optimisation.
> 
> For example, say we want all students who have one or more exam today:
> 
>   SELECT s.student_id, s.name
>     FROM student s
>    WHERE EXISTS (SELECT 1
>                    FROM exam e
>                   WHERE e.student_id = s.student_id
>                     AND e.exam_date = CURRENT_DATE)
> 
> I don't know Tutorial D, but I think it might express that with something
> like:
> 
>   ( student MATCHING exam
>     WHERE exam_date = CURRENT_DATE )
>   { student_id, name }
> 
> With 10 minutes of prototype hacking I convinced Postgres to accept SEMI and
> ANTI like this:
> 
>   SELECT s.student_id, s.name
>     FROM student s SEMI JOIN exam e USING (student_id)
>    WHERE e.exam_date = CURRENT_DATE
> 
> I guess a real version should accept (or require?) LEFT or RIGHT before
> SEMIANTI.  

Andl has semijoin, rsemijoin, ajoin and rajoin. They are trivial to add once you have the basic algorithm, but Tutorial D does not.

When using this hypothetical syntax I think you should be allowed
> to refer to e.exam_date in the WHERE clause but not in the SELECT list
> (Impala apparently does allow you to see data from exam, and returns values
> from an arbitrary matching row, but that doesn't seem right to me).  But the
> EXISTS syntax is correspondingly strange in that it requires you to provide a
> SELECT list which is entirely discarded, so people often write "*" or "1"
> (erm, OK, I guess you can use an empty select list in recent Postgres).

SQL has an implicit ordering of query evaluation -- you will often need to write a nested subquery or correlated query for what should be very straightforward situations. That's another thing that's easy to fix, if allowed.

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general