On 07/06/2012 03:34 PM, Perry Smith wrote:
Hi Guys,
This isn't a PostgreSQL specific question but just a SQL question. If this is not an appropriate question for this list, please let me know.
It is also, perhaps, a really silly question.
This query (without the 'explain' keyword) , when executed takes forever and a day:
condor_development=> explain select id from filesets where id not in ( select fileset_id from service_pack_fileset_maps );
QUERY PLAN
----------------------------------------------------------------------------------------------
Seq Scan on filesets (cost=0.00..71937742.00 rows=26088 width=4)
Filter: (NOT (SubPlan 1))
SubPlan 1
-> Materialize (cost=0.00..2517.78 rows=95852 width=4)
-> Seq Scan on service_pack_fileset_maps (cost=0.00..1663.52 rows=95852 width=4)
(5 rows)
This query returns within a second:
condor_development=> explain select id from filesets where id not in ( select distinct fileset_id from service_pack_fileset_maps );
QUERY PLAN
----------------------------------------------------------------------------------------------
Seq Scan on filesets (cost=2102.31..3153.53 rows=26088 width=4)
Filter: (NOT (hashed SubPlan 1))
SubPlan 1
-> HashAggregate (cost=1903.15..2062.48 rows=15933 width=4)
-> Seq Scan on service_pack_fileset_maps (cost=0.00..1663.52 rows=95852 width=4)
(5 rows)
The difference is the "distinct" keyword in the inner select.
What I'm confused about is why isn't the "distinct" implicit? I thought the construct "blah in ( select ... )" was using "sets" and an item (I thought) can not be in a set more than once.
Perhaps my question is the opposite really? Why would you not always use "distinct" in the inner select when the operator is "in" or "not in" ?
And if I can throw in another question on top: is there a different method other than "not in" that would work better?
Thank you guys for the help and a really awesome database.
pedz
(this is *suppose* to be sent as plain text... I hope my mailer does what it is told)
Well they are distinct records, they just may have the same values. And
I'm not trying to be flippant. We don't see the structure of those
table: are all the id fields involved primary keys or with unique index
coverage? Does "not exists ( select fileset.id = fileset_id from
service_pack_fileset_map) change the behaviour?
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general