Re: Inner join vs where-clause subquery

Here's the query and the EXPLAIN ANALYZE output with the result of the
sub-query substituted in as a literal:

QUERY
explain analyze select min(nlogid) as start_nlogid,
       max(nlogid) as end_nlogid,
       min(dtCreateDate) as start_transaction_timestamp,
       max(dtCreateDate) as end_transaction_timestamp
from activity_log_facts
where nlogid > 478287801
and dtCreateDate < '2006-12-18 9:10'

EXPLAIN ANALYZE
Aggregate  (cost=657.37..657.38 rows=1 width=12) (actual
time=0.018..0.019 rows=1 loops=1)
  ->  Index Scan using activity_log_facts_nlogid_idx on
  activity_log_facts  (cost=0.00..652.64 rows=472 width=12) (actual
  time=0.014..0.014 rows=0 loops=1)
        Index Cond: (nlogid > 478287801)
        Filter: (dtcreatedate < '2006-12-18 09:10:00'::timestamp without
        time zone)
Total runtime: 0.076 ms


Sorry if the reason should be obvious, but I'm not the best at
interpreting EXPLAIN output.  Why is this plan so much simpler than the
plan for the original query (with the sub-query)?
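
For reference, the prepared-statement half of Richard's test (quoted
below) might look roughly like this -- the statement name is just
illustrative, and the literal is the value I got from running the
sub-query by hand:

-- Prepared form: planned before the parameter value is known
PREPARE nlogid_test(integer) AS
select min(nlogid) as start_nlogid,
       max(nlogid) as end_nlogid,
       min(dtCreateDate) as start_transaction_timestamp,
       max(dtCreateDate) as end_transaction_timestamp
from activity_log_facts
where nlogid > $1
and dtCreateDate < '2006-12-18 9:10';

EXPLAIN ANALYZE EXECUTE nlogid_test(478287801);

Since the prepared form is planned without knowing the actual parameter
value, comparing it against the literal version should show whether the
slowness comes from the planner's estimate for the nlogid condition.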



On Tue, 19 Dec 2006 18:23:06 +0000, "Richard Huxton" <dev@xxxxxxxxxxxx>
said:
> Jeremy Haile wrote:
> > Here is the explain analyze output:
> 
> Well, the row estimates are about as far out as you can get:
> 
> >           ->  Index Scan using activity_log_facts_pkey on
> >           activity_log_facts  (cost=0.00..1831613.82 rows=1539298
> >           width=12) (actual time=0.050..0.050 rows=0 loops=1)
> 
> >           ->  Index Scan Backward using activity_log_facts_pkey on
> >           activity_log_facts  (cost=0.00..1831613.82 rows=1539298
> >           width=12) (actual time=0.004..0.004 rows=0 loops=1)
> 
> >           ->  Index Scan using activity_log_facts_dtcreatedate_idx on
> >           activity_log_facts  (cost=0.00..5406927.50 rows=1539298
> >           width=12) (actual time=100221.953..100221.953 rows=0 loops=1)
> 
> >           ->  Index Scan Backward using
> >           activity_log_facts_dtcreatedate_idx on activity_log_facts 
> >           (cost=0.00..5406927.50 rows=1539298 width=12) (actual
> >           time=56367.364..56367.364 rows=0 loops=1)
> 
> Hmm - it's using the indexes on dtCreateDate and nlogid, which seems 
> broadly sensible, and then plans to limit the results for min()/max(). 
> However, it's clearly wrong about how many rows will satisfy
>   nlogid > (select max(a.end_nlogid) from activity_log_import_history a)
> 
> >>> select min(nlogid) as start_nlogid,
> >>>        max(nlogid) as end_nlogid,
> >>>        min(dtCreateDate) as start_transaction_timestamp,
> >>>        max(dtCreateDate) as end_transaction_timestamp
> >>> from activity_log_facts 
> >>> where nlogid > ( select max(a.end_nlogid) from
> >>> activity_log_import_history a)
> >>> and dtCreateDate < '2006-12-18 9:10'
> 
> If you run explain on the other forms of your query, I'd guess it's much 
> more accurate. There's a simple way to see if that is the issue. Run the 
> sub-query and substitute the actual value returned into the query above. 
> Then, try the same but with a prepared query. If it's down to nlogid 
> estimates then the first should be fast and the second slow.
> 
> -- 
>    Richard Huxton
>    Archonet Ltd

