Re: Re-Reason of Slowness of Query

Adarsh Sharma <adarsh.sharma@xxxxxxxxxx> · Wed, 23 Mar 2011 16:51:17 +0530

Thank U all, for U'r Nice Support.

Let me Conclude the results, below results are obtained after finding
the needed queries :

First Option :

pdc_uima=# explain analyze select distinct(p.crawled_page_id)

pdc_uima-# from page_content p left join clause2 c on
(p.crawled_page_id =

pdc_uima(# c.source_id) where (c.source_id is null);

QUERY
PLAN                                                                     

-----------------------------------------------------------------------------------------------------------------------------------------------------

 HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual
time=87927.000..87930.084 rows=72 loops=1)

   ->  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079
width=8) (actual time=0.191..87926.546 rows=74 loops=1)

         ->  Seq Scan on page_content p  (cost=0.00..87132.17
rows=428817 width=8) (actual time=0.027..528.978 rows=428467 loops=1)

         ->  Index Scan using idx_clause2_source_id on clause2 c 
(cost=0.00..18.18 rows=781 width=4) (actual time=0.202..0.202 rows=1
loops=428467)

               Index Cond: (p.crawled_page_id = c.source_id)

 Total runtime: 87933.882 ms :-( 

(6 rows)

Second Option :

pdc_uima=# explain analyze select distinct(p.crawled_page_id) from
page_content p

pdc_uima-#  where NOT EXISTS (select 1 from  clause2 c where
c.source_id = p.crawled_page_id);

QUERY
PLAN                                                                     

-----------------------------------------------------------------------------------------------------------------------------------------------------

 HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual
time=7047.259..7050.261 rows=72 loops=1)

   ->  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079
width=8) (actual time=0.039..7046.826 rows=74 loops=1)

         ->  Seq Scan on page_content p  (cost=0.00..87132.17
rows=428817 width=8) (actual time=0.008..388.976 rows=428467 loops=1)

         ->  Index Scan using idx_clause2_source_id on clause2 c 
(cost=0.00..18.18 rows=781 width=4) (actual time=0.013..0.013 rows=1
loops=428467)

               Index Cond: (c.source_id = p.crawled_page_id)

 Total runtime: 7054.074 ms  :-) 

(6 rows)

Thanks & best Regards,

Adarsh Sharma

Chetan Suttraway wrote:

  On Wed, Mar 23, 2011 at 4:08 PM, <tv@xxxxxxxx>
wrote:

    > I just want to retrieve that id 's from
page_content which do not have

> any entry in clause2 table.

In that case the query probably does not work (at least the query you've

sent in the first post) as it will return even those IDs that have at

least one other row in 'clause2' (not matching the != condition). At
least

that's how I understand it.

  true.

  So
instead of this

select distinct(p.crawled_page_id)

from page_content p, clause2 c where p.crawled_page_id != c.source_id ;

you should probably do this

select distinct(p.crawled_page_id)

from page_content p left join clause2 c on (p.crawled_page_id =

c.source_id) where (c.source_id is null);

I guess this will be much more efficient too.

This looks like to give expected results. Also note that the where
clause "is null" is really required and is not an

optional predicate.

  regards

Tomas

--

Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)

To make changes to your subscription:

    http://www.postgresql.org/mailpref/pgsql-performance

-- 

Regards,

Chetan Suttraway

  EnterpriseDB, The Enterprise
PostgreSQL company.