Re: Relational Algebra and Aggregate Functions

Sam Mason <sam@xxxxxxxxxxxxx> · Tue, 28 Jul 2009 17:11:29 +0100

On Tue, Jul 28, 2009 at 11:26:01AM -0400, Robert James wrote:
> On Tue, Jul 28, 2009 at 9:47 AM, Sam Mason <sam@xxxxxxxxxxxxx> wrote:
> > On Tue, Jul 28, 2009 at 09:14:38AM -0400, Robert James wrote:
> > > Many wrote that the functional programming 'fold' is a good model for
> > > relational aggregate functions.  I have a few difficulties with this:
> > > 1. fold doesn't offer any type of GROUP BY, which is an essential
> > component
> > > of aggregation.
> >
> > Not sure if I'd agree, a GROUP BY without any aggregate functions looks
> > pretty indistinguishable from just a DISTINCT on the same columns to me.
> 
> DISTINCT will collapse duplicates, which is not what we want when computing
> COUNT, SUM, or AVG - please see below.

That's why I said "without any aggregate functions".  E.g:

  SELECT a,b FROM foo GROUP BY a,b;

vs.

   SELECT DISTINCT a,b FROM foo;

I guess we're talking about different things.  Folds are an example of
something called a Catamorphism (as wikipedia seems to know these days).
If you're really interested in this then I liked the "bananas in space"
paper:

  http://citeseer.ist.psu.edu/293490.html

wikipedia seems to like:

  http://citeseer.ist.psu.edu/meijer91functional.html

I don't think these are the same, but citeseer seems to be down at the
moment and I can't check.

> > > 3. fold is defined on sequences, not sets.  This doesn't seem to be a
> > > problem until you think about cases where there a duplicates of the
> > > aggregated field.  (For instance, there are 10 bags each weighing 5 lbs,
> > and
> > > you want SUM(weight) - you need to project weight onto a collection which
> > > allows for 10 occurences, or define the aggregate function to work on the
> > > whole tuple somehow... I know a man named Krug worked out a formal theory
> > > for this...)
> >
> > I don't see why this is a problem at all; could you give a concrete
> > example?
> 
> Relation LUGGAGE = { (name:'ball', weight:3), (name:'bat', weight:3)}
> How do we formalize SELECT SUM(weight) FROM LUGGAGE? We could
> project_weight(LUGGAGE) and then apply SUM, except that would give us
> {(weight:3), (weight:3)}, which is not a set (it has duplicates).  We could
> define a new operation: project_to_list (allowing duplicates), or we could
> define SUM(weight) over the LUGGAGE relation as a whole - either way, we
> need to extend the theory a bit.

Which "theory" are we talking about here?  I assume you're talking
about relational algebra, at which point I believe that aggregates have
to be handled explicitly and this whole conversation becomes somewhat
meaningless.

I mentioned the analogy with folds because their semantics are somewhat
more accessible to software people, if you're really interested
in relational algebra then I'd stay away from folds and deal with
aggregates directly.

-- 
  Sam  http://samason.me.uk/

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general