Re: A Better Way? (Multi-Left Join Lookup)

"David Johnston" <polobo@xxxxxxxxx> · Fri, 20 Jul 2012 17:49:16 -0400

> -----Original Message-----
> From: Alban Hertroys [mailto:haramrae@xxxxxxxxx]
> Sent: Friday, July 20, 2012 5:03 PM
> To: David Johnston
> Cc: pgsql-general@xxxxxxxxxxxxxx
> Subject: Re:  A Better Way? (Multi-Left Join Lookup)
> 
> On 20 Jul 2012, at 22:30, David Johnston wrote:
> 
> > Hi!
> >
> > Can someone please point me to a resource (or suggest a solution) that
will
> improve the performance of this query?  I have some thoughts but figure I
> should avoid reinventing the wheel since this seems like something that
has
> to have been solved already.
> >
> > I am working on a query where I have a list of identifiers (sample set
has
> about 8,500 records) and I have three other queries that return a subset
of
> these 8,500 identifiers
> >
> > Basic query is designed as such:
> >
> > WITH
> >   full_set AS ( ) -- 8,500 records
> > , sub_1 AS () -- also about 8,500
> > , sub_2 AS () -- maybe 5,000
> > , sub_3 AS () - - maybe 3,000
> > SELECT full_set.*
> > , COALESCE(sub_1.field, FALSE)
> > , COALESCE(sub_2.field, FALSE)
> > , COALESCE(sub_2.field, FALSE)
> >
> > FROM full_set
> > LEFT JOIN sub_1
> > LEFT JOIN sub_2
> > LEFT JOIN sub_3
> >
> > The goal is to output a boolean for each record in "full_set" specifying
> whether a corresponding records exists in the sub-set.  If the record
exists
> "sub_x.field" is defined to be TRUE and thus is output otherwise
sub_x.field
> is NULL and coalesce returns FALSE.
> 
> You are creating a product of the result sets for sub_1 to _3 there, while
you
> only seem to need the union of the three.
> 
> Perhaps something like this is what you're after?
> 
> WITH
>   full_set AS ( )
> , subs AS (
>   SELECT 1 AS sub, TRUE AS field, ... FROM sub_1
>   UNION ALL
>   SELECT 2 AS sub, TRUE AS field, ... FROM sub_2
>   UNION ALL
>   SELECT 3 AS sub, TRUE AS field, ... FROM sub_3
> )
> SELECT ...
> FROM full_set
> LEFT JOIN subs
> 
> If you need those rows to be distinct, use UNION instead of UNION ALL, but
> the database needs to do more work for that.
> 
> 
> Alban Hertroys
>

Using "UNION" I increase the number of output rows such that an identifier
that has a matching record in all three subsets will appear 3-times in the
result.  Now, I can run this through a GROUP BY and use CASE statements to
get it back into the multi-column format required but that seems messy.
Also, there should not be a "product" between the sub-queries but only
between an individual sub-query and the main query.  The fact there are 3
sub-queries should result in additive resource consumption (al. la. UNION):
[ M x (A + B + C) == MA + MB + MC ].  The left side is the UNION suggestion
while the right-side is the current multi-left-join suggestion.  Data wise
they are equivalent but the left-side uses additional rows while the
right-side uses additional columns.

That said I will play with it just to see if the pre-UNION and a post-GROUP
performs better than the multi-left-join that seems to be the most direct
solution.

Thank You!

David J.

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general