Re: Re-Reason of Slowness of Query

Chetan Suttraway <chetan.suttraway@xxxxxxxxxxxxxxxx> · Wed, 23 Mar 2011 16:54:10 +0530

On Wed, Mar 23, 2011 at 4:51 PM, Vitalii Tymchyshyn <tivv00@xxxxxxxxx> wrote:

    23.03.11 13:21, Adarsh Sharma wrote:

      Thank you all for your support.

      Let me summarize the results. These were obtained after working out
      the needed queries:

      First Option :

      pdc_uima=# explain analyze select distinct(p.crawled_page_id)
      pdc_uima-# from page_content p left join clause2 c on (p.crawled_page_id =
      pdc_uima(# c.source_id) where (c.source_id is null);
                                                                    QUERY PLAN
      -----------------------------------------------------------------------------------------------------------------------------------------------------
       HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual time=87927.000..87930.084 rows=72 loops=1)
         ->  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8) (actual time=0.191..87926.546 rows=74 loops=1)
               ->  Seq Scan on page_content p  (cost=0.00..87132.17 rows=428817 width=8) (actual time=0.027..528.978 rows=428467 loops=1)
               ->  Index Scan using idx_clause2_source_id on clause2 c  (cost=0.00..18.18 rows=781 width=4) (actual time=0.202..0.202 rows=1 loops=428467)
                     Index Cond: (p.crawled_page_id = c.source_id)
       Total runtime: 87933.882 ms :-(
      (6 rows)

      Second Option :

      pdc_uima=# explain analyze select distinct(p.crawled_page_id) from page_content p
      pdc_uima-# where NOT EXISTS (select 1 from clause2 c where
      c.source_id = p.crawled_page_id);
                                                                    QUERY PLAN
      -----------------------------------------------------------------------------------------------------------------------------------------------------
       HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual time=7047.259..7050.261 rows=72 loops=1)
         ->  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8) (actual time=0.039..7046.826 rows=74 loops=1)
               ->  Seq Scan on page_content p  (cost=0.00..87132.17 rows=428817 width=8) (actual time=0.008..388.976 rows=428467 loops=1)
               ->  Index Scan using idx_clause2_source_id on clause2 c  (cost=0.00..18.18 rows=781 width=4) (actual time=0.013..0.013 rows=1 loops=428467)
                     Index Cond: (c.source_id = p.crawled_page_id)
       Total runtime: 7054.074 ms :-)
      (6 rows)

    Actually, the plans are identical, so I suppose it depends on which was
    run first :). The slow query worked with data mostly on disk, while the
    fast one worked with data already in memory.
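
    A rough check of that reading, using only the numbers in the two plans:
    the inner index scan runs 428467 times, and EXPLAIN ANALYZE reports its
    actual time as an average per loop, so the per-loop time multiplied by
    the loop count approximates the time spent in that node. This is a
    back-of-the-envelope sketch, not a new measurement:

    ```sql
    -- Per-loop actual time (ms) of the index scan on clause2, times loops=428467.
    SELECT 428467 * 0.202 AS first_option_ms,   -- 86550.334
           428467 * 0.013 AS second_option_ms;  -- 5570.071
    ```

    That is roughly 86.6 s of the 87.9 s total for the first run and roughly
    5.6 s of the 7.1 s total for the second, so almost all of the gap comes
    from how long each index probe took, not from a different plan.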

Yeah. Maybe the easiest way to check is to start a fresh session and run both queries.
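
A minimal sketch of such a comparison, assuming PostgreSQL 9.0 or later (the BUFFERS option does not exist in earlier releases, and the server version is not stated in this thread). The table and column names are the ones from the queries quoted above. Because cached pages survive across sessions, running each statement twice and comparing the second runs keeps caching from favoring whichever query goes second:

```sql
-- Assumption: PostgreSQL 9.0+ so that EXPLAIN accepts the BUFFERS option.
-- Run each statement twice in a row and compare the second (warm) runs.

-- First option: LEFT JOIN ... IS NULL
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT p.crawled_page_id
FROM page_content p
LEFT JOIN clause2 c ON p.crawled_page_id = c.source_id
WHERE c.source_id IS NULL;

-- Second option: NOT EXISTS
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT p.crawled_page_id
FROM page_content p
WHERE NOT EXISTS (
    SELECT 1 FROM clause2 c WHERE c.source_id = p.crawled_page_id
);
```

In the output, each plan node gets a "Buffers:" line. A high "read" count relative to "shared hit" means the pages were not in PostgreSQL's buffer cache, which would point to the same caching effect Vitalii describes.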

    Best regards, Vitalii Tymchyshyn

-- 
Regards,
Chetan Suttraway
EnterpriseDB, The Enterprise PostgreSQL company.