Re: SQL queries as sets: was The tragedy of SQL

Raymond Brinzer <ray.brinzer@xxxxxxxxx> · Wed, 15 Sep 2021 11:29:07 -0400

On Wed, Sep 15, 2021 at 12:55 AM Steve Litt <slitt@xxxxxxxxxxxxxxxxxxx> wrote:
> Rich, could you please elaborate on SQL queries being based on sets? I
> never thought of it that way, and would like to hear your related
> thoughts.

I'll take a crack at this.  Going through the setup will require a
little patience, but I think the underlying idea is neat enough to be
worth it.

A set is an unordered collection of unique elements.  This means:

{ 1, 2, 3 } = { 3, 2, 1 }
{ 2, 3, 3 } is not a set.

(The elements don't have to be numbers, but I'm using them for convenience.)

Take two sets, A and B:

A = { 1, 2, 3 }
B = { 4, 5, 6 }

The Cartesian product A x B is the complete set of ordered pairs (a,
b) you can make from them:

(1, 4)
(1, 5)
(1, 6)
(2, 4)
(2, 5)
(2, 6)
(3, 4)
(3, 5)
(3, 6)

These pairs are called tuples.  Tuples don't have to be pairs; if you
were using A x B x C, for instance, each one would have three
elements.

A relation is a subset of the Cartesian product of the sets.  For instance:

(1, 4)
(2, 5)
(3, 6)

(In math, some relations are functions; that one happens to be a
function which follows the rule f(a) = a + 3.  But that's not really a
database matter.)

Anyway, now: for "relation", say "table", and for "tuple", say "row".
This is (in theory) is what a relational database is about: storing
relations.

In math, tuples are ordered, but with databases we give the elements
names, instead:

(a: 1, b: 4)

It doesn't really matter, because the names uniquely identify the
elements, just as ordering does.  You can go back and forth between
orders and names, so the different representations have the same
structure.

So, let's say you do this:

CREATE TABLE whatever (a INTEGER, b STRING, UNIQUE(a,b));

What you are saying, in a sense, is:  "Consider the set of tuples
which pair every possible integer with every possible string.  I'm
going to be storing a subset of that."

What's interesting about this is:  because you're working with sets,
all the operations you can perform on sets work here.  Tables X and Y
are both sets of tuples, so you can find their cartesian product,
intersection, union, and so forth.  Proofs about sets apply, and so
on.

That is, if the tables are actually relations... which means they have
to be sets.  This is where things get controversial.

SQL allows tables to have duplicate records.  It also has NULL, which
allows duplicate records even in a table like the one above.  Although
we declared UNIQUE(a,b), we can still say:

INSERT INTO whatever VALUES (NULL,NULL), (NULL,NULL);

So, arguably SQL isn't actually relational.  To some, that's a matter
of it being practical, so that people can actually get work done.  To
others, it's a travesty which robs databases of power and clarity.

Anyway, that's my summary.

-- 
Ray Brinzer