Re: Google Summer of Code: Full Disjunctions

"Tzahi Fadida" <tzahi.ml@xxxxxxxxx> · Sat, 13 May 2006 00:00:17 +0200

> -----Original Message-----
> From: pgsql-general-owner@xxxxxxxxxxxxxx 
> [mailto:pgsql-general-owner@xxxxxxxxxxxxxx] On Behalf Of 
> ana_cata_hylo@xxxxxxxxx
> Sent: Tuesday, May 09, 2006 1:51 AM
> To: pgsql-general@xxxxxxxxxxxxxx
> Subject: Re: [GENERAL] Google Summer of Code: Full Disjunctions
> 
> > First, i have no knowledge of anyone that have implemented 
> full disjunctions(ever) aside
> > from the theoretical works of my colleagues.
> > With the exception of a corner case of it, that I believe 
> was a simulation in 96.
> > (A. Rajaman and J.D. Ullman Integrating information by 
> outerjoins and full-disjunctions).
> > I'd love to hear about any implementation out there (aside 
> from my colleagues work, which
> > is mine also: cohen,sagiv, kimelfeld,kanza)
> 
> I didn't mean to imply there was. It was the Rajaraman & Ullman paper
> that got me interested in FD's and then I've looked at the "Computing
> Full Disjunctions" paper by Kanza & Sagiv which gives a general
> solution.
> Obviously from the second paper it's clear that implementing full
> disjunction (efficiently) is a non-trivial exercise.

If only all the FD problems would have the same difficulty, Rajaraman & Ullman paper
tries to solve. Their paper is actually trying to pinpoint where you can no longer use
your standard SQL tools to compute the FD. Commonly, as a rule of thumb, if you
have a cycle in your scheme graph, you can no longer use Rajaraman & Ullman
method (with some delicate exceptions) 
of using a series of natural outer joins to compute the FD which is very fast and
simple. However, in the general case where you have cycles you can forget about it.
The problem becomes very hard to compute and I wouldn't suggest it for huge relations.
My work is to see how we can alleviate the common troubles but the problem is still hard.

> 
> > It can never be a binary operation since at the heart of 
> the matter is that you need to take
> > each subset of the relations and join them. i.e.:
> ...
> > Usually binary operations allow for a bottom up computation 
> approach, but FD is a TOP down approach
> > (Galindo-Legaria, C. outerjoins as disjunctions).
> 
> Right, thanks for clarifying.
> 
> >From a data analysis perspective I would like to be able to look at
> various subsets, eg. FD(A,B,C), FD(B,C,D), FD(A,B,C,D) etc and so this
> just means that each subset has too be computed independantly. I can
> live with that but wasn't sure if I had missed something.

Basically you only have to look at FD(A,B,C,D).
It includes all the information of FD(A,B,C) and FD(B,C,D).

> 
> In any case, the difficulty of implementing FD precludes me from
> experimenting with it just yet.

Well, as I said, I already implemented it. What's left is to clean up the code from the statistics
of my experiments and improve the coding style and add some improvements I've been meaning to add.
The code is pretty big considering. It's about 200kb C code for the one algorithm I want to work on
for the gsoc.
And bigger if I will have time to add some of the others.

> 
> Regards
> Lee
> 
> 
> ---------------------------(end of 
> broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
> 
>