> -----Original Message----- > From: Alban Hertroys [mailto:haramrae@xxxxxxxxx] > Sent: Friday, July 20, 2012 5:03 PM > To: David Johnston > Cc: pgsql-general@xxxxxxxxxxxxxx > Subject: Re: A Better Way? (Multi-Left Join Lookup) > > On 20 Jul 2012, at 22:30, David Johnston wrote: > > > Hi! > > > > Can someone please point me to a resource (or suggest a solution) that will > improve the performance of this query? I have some thoughts but figure I > should avoid reinventing the wheel since this seems like something that has > to have been solved already. > > > > I am working on a query where I have a list of identifiers (sample set has > about 8,500 records) and I have three other queries that return a subset of > these 8,500 identifiers > > > > Basic query is designed as such: > > > > WITH > > full_set AS ( ) -- 8,500 records > > , sub_1 AS () -- also about 8,500 > > , sub_2 AS () -- maybe 5,000 > > , sub_3 AS () - - maybe 3,000 > > SELECT full_set.* > > , COALESCE(sub_1.field, FALSE) > > , COALESCE(sub_2.field, FALSE) > > , COALESCE(sub_2.field, FALSE) > > > > FROM full_set > > LEFT JOIN sub_1 > > LEFT JOIN sub_2 > > LEFT JOIN sub_3 > > > > The goal is to output a boolean for each record in "full_set" specifying > whether a corresponding records exists in the sub-set. If the record exists > "sub_x.field" is defined to be TRUE and thus is output otherwise sub_x.field > is NULL and coalesce returns FALSE. > > You are creating a product of the result sets for sub_1 to _3 there, while you > only seem to need the union of the three. > > Perhaps something like this is what you're after? > > WITH > full_set AS ( ) > , subs AS ( > SELECT 1 AS sub, TRUE AS field, ... FROM sub_1 > UNION ALL > SELECT 2 AS sub, TRUE AS field, ... FROM sub_2 > UNION ALL > SELECT 3 AS sub, TRUE AS field, ... FROM sub_3 > ) > SELECT ... > FROM full_set > LEFT JOIN subs > > If you need those rows to be distinct, use UNION instead of UNION ALL, but > the database needs to do more work for that. > > > Alban Hertroys > Using "UNION" I increase the number of output rows such that an identifier that has a matching record in all three subsets will appear 3-times in the result. Now, I can run this through a GROUP BY and use CASE statements to get it back into the multi-column format required but that seems messy. Also, there should not be a "product" between the sub-queries but only between an individual sub-query and the main query. The fact there are 3 sub-queries should result in additive resource consumption (al. la. UNION): [ M x (A + B + C) == MA + MB + MC ]. The left side is the UNION suggestion while the right-side is the current multi-left-join suggestion. Data wise they are equivalent but the left-side uses additional rows while the right-side uses additional columns. That said I will play with it just to see if the pre-UNION and a post-GROUP performs better than the multi-left-join that seems to be the most direct solution. Thank You! David J. -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general