> > > *Query1*
> > > SELECT person_id FROM person WHERE (column1=1 AND column2='62')
> > > INTERSECT
> > > SELECT person_id FROM person WHERE (column1=1 AND column2='189')> I get the same plan(see below) with 'sort' for 'intersect all' operationSet operations are rather inefficient. To find the intersection of two
> too. Why is intersect not an effecient way? Is there any other way this
> query/index can be written/created so that I can get the intersect results
> in an efficient way?
arbitrary sets you need to sort them and compare. A query like you
write would be better expressed as a join, something like:
SELECT a.person_id
FROM (SELECT person_id FROM person WHERE (column1=1 AND column2='62') a,
(SELECT person_id FROM person WHERE (column1=1 AND column2='189') b
WHERE a.person_id = b.person_id;
or perhaps:
SELECT a.person_id
FROM person a, person b
WHERE a.column1=1 AND a.column2='62 '
AND b.column1=1 AND b.column2='189'
AND a.person_id = b.person_id;
Which will probably generate a merge join...
Thanks. But this query seems to be more expensive than using intersect operator.
This is the explain analyse plan for this query. It took 5 1/2 minutes to generate this. I also tried to disable the mergejoin and in that case it uses hash join and still takes more than 3 minutes (intersect took only 40 sec)
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=1610648.92..10280119.99 rows=577856095 width=4) (actual time=30562.630..264534.677 rows=225145385 loops=1)
Merge Cond: (a.patient_id = b.patient_id )
-> Sort (cost=527974.81..528929.79 rows=381993 width=4) (actual time=3755.361..3845.134 rows=213435 loops=1)
Sort Key: a.patient_id
Sort Method: quicksort Memory: 15868kB
-> Bitmap Heap Scan on clinical_variable2 a (cost= 17886.42..492557.97 rows=381993 width=4) (actual time=315.753..3410.366 rows=327498 loops=1)
Recheck Cond: ((top_parent_service_sys_id = 1) AND ((top_parent_service_code)::text = '62'::text))
-> Bitmap Index Scan on clinical_variable_idx_topserv (cost=0.00..17790.92 rows=381993 width=0) (actual time=277.185..277.185 rows=327498 loops=1)
Index Cond: ((top_parent_service_sys_id = 1) AND ((top_parent_service_code)::text = '62'::text))
-> Materialize (cost=1082674.11..1124546.38 rows=3349781 width=4) (actual time=26807.248..99885.620 rows=225148250 loops=1)
-> Sort (cost=1082674.11..1091048.57 rows=3349781 width=4) (actual time=26807.238..30343.870 rows=3429228 loops=1)
Sort Key: b.patient_id
Sort Method: external merge Disk: 53552kB
-> Bitmap Heap Scan on clinical_variable2 b (cost= 156754.24..679555.96 rows=3349781 width=4) (actual time=2744.126..20106.160 rows=3429228 loops=1)
Recheck Cond: ((top_parent_service_sys_id = 1) AND ((top_parent_service_code)::text = '189'::text))
-> Bitmap Index Scan on clinical_variable_idx_topserv (cost=0.00..155916.80 rows=3349781 width=0) (actual time=2686.456..2686.456 rows=3429228 loops=1)
Index Cond: ((top_parent_service_sys_id = 1) AND ((top_parent_service_code)::text = '189'::text))
Total runtime: 324646.035 ms
18 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 0/ms] a: 0/ms]
Is there any other way you can think of to solve this problem. May be creating the indexes in a different way or something?
Thanks
jo