Re: distinct not working in a multiple join

David Rio Deiros <driodeiros@xxxxxxxxx> · Thu, 9 Feb 2006 18:21:51 -0800

On Thu, Feb 09, 2006 at 07:20:19PM -0500, Tom Lane wrote:
> David Rio Deiros <driodeiros@xxxxxxxxx> writes:
> > I have some issues with the query attached at the end of this email.
> > If I run that query I got this output ( I have removed some of the 
> > fields) despite the distinct clause:
> 
> > QC Q&A  | www.xxx.com | 44281
> > QC Q&A  | www.xxx.com | 44281
> > WhyMAX? |             | 44285
> 
> Since you removed some fields, no one can tell if this output
> is wrong or not.

Tom, 

Thanks for the answer and apologizes for the lack of information. 
I think you found the problem already (see bellow). In anycase, just 
to clarify:

This is the information about the machine/OS/Psql version:

1. Psql 8.0.4 - Linux 2.6.13

Now, This is the first query I tried:

            SELECT
                distinct ar.title,
                ar.raw_data,
                ar.upload_dt,
                ar.artifact_id,
                g.group_id,
                acl.read
            FROM
                artifact_acl acl,
                artifacts a,
                artifact_revisions ar,
                revisions_to_types rt,
                artifact_types at,
                groups g
            WHERE
                a.expire_dt > NOW() and
                acl.artifact_id = a.artifact_id and
                a.published_revision = ar.revision_id and
                ar.revision_id = rt.revision_id and
                rt.type_id = at.type_id and
                acl.group_id = g.group_id and
                a.suppress = false and
                at.is_resource = true and
                (
                acl.group_id = (
                                select
                                    group_id
                                from
                                    groups
                                where
                                    group_nm = 'Marketing'
                               )
                               OR
                acl.group_id = (
                                select
                                    group_id
                                from
                                    groups
                                where
                                    group_nm = 'Communicators'
                               )
                ) and
                acl.group_id IN (4,17,54,2,1,123) and
                acl.read = true and
                ((g.back_or_front = 'front') or
                (g.group_nm = 'PR Admin'))
                ORDER BY
                    ar.upload_dt DESC
                LIMIT 3;

and here you have the output:

-[ RECORD 1 ]---------------------------
title       | QC Q&A
raw_data    | www.com
upload_dt   | 2006-02-09 11:15:04.724525
artifact_id | 44281
group_id    | 2
read        | t
-[ RECORD 2 ]---------------------------
title       | QC Q&A
raw_data    | www.com
upload_dt   | 2006-02-09 11:15:04.724525
artifact_id | 44281
group_id    | 54
read        | t
-[ RECORD 3 ]---------------------------
title       | WhyMAX?
raw_data    | 
upload_dt   | 2006-02-09 09:25:27.717663
artifact_id | 44061
group_id    | 2
read        | t

Now, I tried this query, which is the first one but removing the g.group_id
from the select clause. This is the output:

-[ RECORD 1 ]-----------------------------
title       | QC Q&A
raw_data    | www.com
upload_dt   | 2006-02-09 11:15:04.724525
artifact_id | 44281
read        | t
-[ RECORD 2 ]----------------------------
title       | WhyMAX?
raw_data    | 
upload_dt   | 2006-02-09 09:25:27.717663
artifact_id | 44061
read        | t
-[ RECORD 3 ]---------------------------
title       | Business Assets-test
raw_data    | corpcomm.com/
upload_dt   | 2006-02-08 15:58:06.81578
artifact_id | 44280
read        | t

Which is the desired output.

> >             SELECT          
> >                 distinct ar.title,
> >                 ar.raw_data,
> >                 ar.upload_dt,
> >                 ar.artifact_id,
> >                 g.group_id,
> >                 acl.read
> >             FROM
> 
> The way you formatted that makes me wonder if you think that the
> DISTINCT applies only to the first column.  It does not, it applies
> to all the columns together --- that is, it only removes rows that
> are identical in all columns to some other row.  So if there were
> some rows that were identical except for group_id, you'd get the
> behavior you described.

Yes, that was the reason, I thought that distinct was only applying
to the first column.

> BTW, I'm not sure I believe this is actually the same query you
> ran.  The presence of the "GROUP BY ar.title" clause should have
> provoked errors about ungrouped columns.  If this is an exact
> copy of what you did, what Postgres version is this?

It wasn't the same query. Forget about the first email, in this email
you have the queries I launched (sorry again about that).

Now I have to redefine my query because I want to get the second 
output but keeping the group_id. Ideas and suggestions are welcome.

Thanks again your help,

David