Re: Help with strange join

Mike Rylander <mrylander@xxxxxxxxx> · Mon, 7 Feb 2005 12:30:49 +0000

On Mon, 7 Feb 2005 12:34:39 +0100, Victor SpÃng Arthursson
<victor@xxxxxxxx> wrote:
> 
> 2005-02-04 kl. 20.36 skrev Mike Rylander:
> 
> > How about:
> >
> > SELECT r.* FROM opskrifter r JOIN opskrifter_content c ON (r.nummer =
> > c.opskrift) JOIN opskrifter_ingredienser i ON (c.ingrediens = i.id)
> > WHERE EXISTS ( SELECT l.relid FROM languages l WHERE  l.relid =
> > i.betegnelse GROUP BY l.relid HAVING COUNT(DIStINCT l.lang) = 1);
> 
> Hi and thanks for your answer!

No problem.  Did it do what you expect?

> 
> Could you please try to break this SQL down and try to explain what it
> does, and also show how to change languageâ
> 

Sure.  I guess that the EXISTS version was faster?  In any case:

SELECT r.*    -- you can add more columns here
                     -- from any table but languages

          -- I assume the joins are correct and transparent.  I just
          -- attempted to follow your example.  I wasn't sure about
          -- the join from ingredients to languages since the example
          -- didn't have any exact matches, but the format of the relid
          -- column looked right.
          FROM opskrifter r
                JOIN opskrifter_content c ON (r.nummer =c.opskrift)
                JOIN opskrifter_ingredienser i ON (c.ingrediens = i.id)

         WHERE
                EXISTS (  -- EXISTS runs a subselect and if ANY rows
are returned
                                -- it becomes true
http://www.postgresql.org/docs/8.0/interactive/functions-subquery.html#AEN12496

                                 -- This query, if run by itself, will
give you all the "relid"s from
                                 -- "languages" where ALL the rows in
the group have the same
                                 -- "lang" attribute.  The HAVING
clause is the key.  It looks at
                                 -- each group of rows with the same
relid as a virtual subtable
                                 -- and counts the number of distinct
"lang" columns.  If I
                                 -- understood your question
correctly, any partially translated
                                 -- ingredient lists will have more
than one distinct "lang".  In
                                 -- order to select a specific
language you would need to
                                 -- wrap this in another because of
GROUP BY restrictions, and
                                 -- use an aggregate to return the
"lang" attribute.

           SELECT lang FROM (
                                 SELECT l.relid, MAX(lang) as lang
FROM languages l
                                         WHERE  l.relid = i.betegnelse
                                         GROUP BY l.relid
                                         HAVING COUNT(DIStINCT l.lang) = 1
                               ) s2
                                WHERE l.lang = 'DE'
                 );

To generalize this, just replace
                                WHERE l.lang = 'DE'
with
                                WHERE l.lang = ?
and pass the language in as a parameter to the query.  The JOIN
version would be much easier to limit.  BTW, there was a typo in the
first post in the JOIN version.  It's fixed below.  Just change it
from

SELECT r.* FROM opskrifter r JOIN opskrifter_content c ON (r.nummer =
c.opskrift) JOIN opskrifter_ingredienser i ON (c.ingrediens = i.id)
JOIN ( SELECT l.relid FROM languages l GROUP BY l.relid HAVING
COUNT(DISINCT l.lang) = 1) s ON (l.relid = i.betegnelse);

to

SELECT r.* FROM opskrifter r JOIN opskrifter_content c ON (r.nummer =
c.opskrift) JOIN opskrifter_ingredienser i ON (c.ingrediens = i.id)
JOIN ( SELECT l.relid FROM languages l GROUP BY l.relid HAVING
COUNT(DISINCT l.lang) = 1) s ON (l.relid = i.betegnelse AND l.lang = 'DE');

I am curious as to which is faster for you.  If "languages" is small I
would expect the EXISTS version to be faster, but for a large
"languages" the JOIN should be faster since it only generates the
subselect result set once.

-- 
Mike Rylander
mrylander@xxxxxxxxx
GPLS -- PINES Development
Database Developer
http://open-ils.org

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to majordomo@xxxxxxxxxxxxxx)