Re: Re-Reason of Slowness of Query

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thank U all, for U'r Nice Support.

Let me Conclude the results, below results are obtained after finding the needed queries :

First Option :

pdc_uima=# explain analyze select distinct(p.crawled_page_id)
pdc_uima-# from page_content p left join clause2 c on (p.crawled_page_id =
pdc_uima(# c.source_id) where (c.source_id is null);
                                                                     QUERY PLAN                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual time=87927.000..87930.084 rows=72 loops=1)
   ->  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8) (actual time=0.191..87926.546 rows=74 loops=1)
         ->  Seq Scan on page_content p  (cost=0.00..87132.17 rows=428817 width=8) (actual time=0.027..528.978 rows=428467 loops=1)
         ->  Index Scan using idx_clause2_source_id on clause2 c  (cost=0.00..18.18 rows=781 width=4) (actual time=0.202..0.202 rows=1 loops=428467)
               Index Cond: (p.crawled_page_id = c.source_id)
 Total runtime: 87933.882 ms :-(
(6 rows)

Second Option :

pdc_uima=# explain analyze select distinct(p.crawled_page_id) from page_content p
pdc_uima-#  where NOT EXISTS (select 1 from  clause2 c where c.source_id = p.crawled_page_id);
                                                                     QUERY PLAN                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual time=7047.259..7050.261 rows=72 loops=1)
   ->  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8) (actual time=0.039..7046.826 rows=74 loops=1)
         ->  Seq Scan on page_content p  (cost=0.00..87132.17 rows=428817 width=8) (actual time=0.008..388.976 rows=428467 loops=1)
         ->  Index Scan using idx_clause2_source_id on clause2 c  (cost=0.00..18.18 rows=781 width=4) (actual time=0.013..0.013 rows=1 loops=428467)
               Index Cond: (c.source_id = p.crawled_page_id)
 Total runtime: 7054.074 ms :-)
(6 rows)


Thanks & best Regards,
Adarsh Sharma




Chetan Suttraway wrote:


On Wed, Mar 23, 2011 at 4:08 PM, <tv@xxxxxxxx> wrote:
> I just want to retrieve that id 's from page_content which do not have
> any entry in clause2 table.

In that case the query probably does not work (at least the query you've
sent in the first post) as it will return even those IDs that have at
least one other row in 'clause2' (not matching the != condition). At least
that's how I understand it.

true.
 
So instead of this

select distinct(p.crawled_page_id)
from page_content p, clause2 c where p.crawled_page_id != c.source_id ;

you should probably do this

select distinct(p.crawled_page_id)
from page_content p left join clause2 c on (p.crawled_page_id =
c.source_id) where (c.source_id is null);

I guess this will be much more efficient too.


This looks like to give expected results. Also note that the where clause "is null" is really required and is not an
optional predicate.

 
regards
Tomas


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



--
Regards,
Chetan Suttraway
EnterpriseDB, The Enterprise PostgreSQL company.





[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux