Re: best way to query

Steve Clark <sclark@xxxxxxxxxxxxx> · Fri, 25 Jan 2008 12:15:43 -0500

Daniel Verite wrote:
	Steve Clark wrote:

any way i have 2 table - A and B.
each table has a key field and if a row is in B it should have a 
corresponding row in A - but theres
the problem it doesn't for all the rows in B.

So I want to do something like
delete from B where key not in (select key from A order by key);

The problem is there are about 1,000,000 rows in A and 300,000 rows 

in 

B. I let the above run
all night and it was still running the next morning. Does anyone have 

an idea of a better way.

An outer join is sometimes spectacularly more efficient for this 
particular kind of query.

I'd suggest you try:

delete from B where key in 
 (select B.key from B left outer join A on A.key=B.key
   where A.key is null)

WOW!

this runs in about 10 seconds - thanks Daniel.

 explain select count(*) from t_event_ack_log  where event_log_no in 
(select t_event_ack_log.event_log_no from t_event_ack_log left outer 
join t_unit_event_log a on a.event_log_no=t_event_ack_log.event_log_no 
 where a.event_log_no is null);
                                                   QUERY PLAN 

----------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=128349.56..128349.57 rows=1 width=0)
   ->  Hash Join  (cost=94512.91..126935.36 rows=565681 width=0)
         Hash Cond: (public.t_event_ack_log.event_log_no = 
public.t_event_ack_log.event_log_no)
         ->  Seq Scan on t_event_ack_log  (cost=0.00..14759.85 
rows=565685 width=4)
         ->  Hash  (cost=92609.85..92609.85 rows=152245 width=4)
               ->  HashAggregate  (cost=91087.40..92609.85 
rows=152245 width=4)
                     ->  Hash Left Join  (cost=57337.95..90380.29 
rows=282842 width=4)
                           Hash Cond: 
(public.t_event_ack_log.event_log_no = a.event_log_no)
                           Filter: (a.event_log_no IS NULL)
                           ->  Seq Scan on t_event_ack_log 
(cost=0.00..14759.85 rows=565685 width=4)
                           ->  Hash  (cost=40696.09..40696.09 
rows=1014309 width=4)
                                 ->  Seq Scan on t_unit_event_log a 
(cost=0.00..40696.09 rows=1014309 width=4)

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

              http://www.postgresql.org/docs/faq