Re: Recheck condition

"Josh Harrison" <joshques@xxxxxxxxx> · Fri, 30 Nov 2007 09:37:32 -0500

> > > *Query1*
> > > SELECT person_id  FROM person   WHERE (column1=1 AND column2='62')

> > > INTERSECT
> > > SELECT person_id  FROM person  WHERE (column1=1 AND column2='189')

> I get the same plan(see below)  with 'sort'  for 'intersect all' operation

> too. Why is intersect not an effecient way? Is there any other way this
> query/index can be written/created so that I can get the intersect results
> in an efficient way?

Set operations are rather inefficient. To find the intersection of two

arbitrary sets you need to sort them and compare. A query like you
write would be better expressed as a join, something like:

SELECT a.person_id
FROM (SELECT person_id  FROM person   WHERE (column1=1 AND column2='62') a,

     (SELECT person_id  FROM person  WHERE (column1=1 AND column2='189') b
WHERE a.person_id = b.person_id;

or perhaps:

SELECT a.person_id
FROM person a, person b
WHERE a.column1=1 AND a.column2='62
'
AND b.column1=1 AND b.column2='189'
AND a.person_id = b.person_id;

Which will probably generate a merge join...

Thanks. But this query seems to be more expensive than using intersect operator.

This is the explain analyse plan for this query. It took 5 1/2 minutes to generate this. I also tried to disable the mergejoin and in that case it uses hash join and still takes more than 3 minutes (intersect took only 40 sec)

QUERY PLAN                                                                                                                                                                    
 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 

 Merge Join  (cost=1610648.92..10280119.99 rows=577856095 width=4) (actual time=30562.630..264534.677 rows=225145385 loops=1)                                                  
   Merge Cond: (a.patient_id = b.patient_id
)                                                                                                                                   
   ->  Sort  (cost=527974.81..528929.79 rows=381993 width=4) (actual time=3755.361..3845.134
 rows=213435 loops=1)                                                              
         Sort Key: a.patient_id                                                                                                                                                

         Sort Method:  quicksort  Memory: 15868kB                                                                                                                              
         ->  Bitmap Heap Scan on clinical_variable2 a  (cost=
17886.42..492557.97 rows=381993 width=4) (actual time=315.753..3410.366 rows=327498 loops=1)                      
               Recheck Cond: ((top_parent_service_sys_id = 1) AND ((top_parent_service_code)::text = '62'::text))                                                              

               ->  Bitmap Index Scan on clinical_variable_idx_topserv  (cost=0.00..17790.92 rows=381993 width=0) (actual time=277.185..277.185 rows=327498 loops=1)            
                     Index Cond: ((top_parent_service_sys_id = 1) AND ((top_parent_service_code)::text = '62'::text))                                                          

   ->  Materialize  (cost=1082674.11..1124546.38 rows=3349781 width=4) (actual time=26807.248..99885.620 rows=225148250 loops=1)                                               
         ->  Sort  (cost=1082674.11..1091048.57
 rows=3349781 width=4) (actual time=26807.238..30343.870 rows=3429228 loops=1)                                                  
               Sort Key: b.patient_id                                                                                                                                          

               Sort Method:  external merge  Disk: 53552kB                                                                                                                     
               ->  Bitmap Heap Scan on clinical_variable2 b  (cost=
156754.24..679555.96 rows=3349781 width=4) (actual time=2744.126..20106.160 rows=3429228 loops=1)           
                     Recheck Cond: ((top_parent_service_sys_id = 1) AND ((top_parent_service_code)::text = '189'::text))                                                       

                     ->  Bitmap Index Scan on clinical_variable_idx_topserv  (cost=0.00..155916.80 rows=3349781 width=0) (actual time=2686.456..2686.456 rows=3429228 loops=1) 
                           Index Cond: ((top_parent_service_sys_id = 1) AND ((top_parent_service_code)::text = '189'::text))                                                   

 Total runtime: 324646.035 ms                                                                                                                                                  

 18 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 0/ms] a: 0/ms]

Is there any other way you can think of to solve this problem. May be creating the indexes in a  different way or something?

Thanks
jo