On 8/19/09 9:28 AM, "Kevin Kempter" <kevink@xxxxxxxxxxxxxxxxxxx> wrote:

> Hi all;
>
> we've been fighting this query for a few days now. We bumped up the
> statistics target for the a.id, c.url_hits_id and b.id columns below to 250
> and ran an analyze on the relevant tables. We killed it after 8hrs.
>
> Note the url_hits table has > 1.4 billion rows
>
> Any suggestions?

Have you tried setting work_mem higher for just this query?

The big estimated cost is the sequential scan on url_hits, but if the
estimates are off, the sort and index scan at the end might be your real
bottleneck. Larger work_mem might make the planner choose another plan there
(a minimal sketch of both suggestions follows at the end of this message).

But if the true cost really is the sequential scan on url_hits, then only an
index there will help.

> $ psql -ef expl.sql pwreport
> explain
> select
>     a.id,
>     ident_id,
>     time,
>     customer_name,
>     extract('day' from timezone(e.name, to_timestamp(a.time))) as day,
>     category_id
> from
>     pwreport.url_hits a left outer join
>     pwreport.url_hits_category_jt c on (a.id = c.url_hits_id),
>     pwreport.ident b,
>     pwreport.timezone e
> where
>     a.ident_id = b.id
>     and b.timezone_id = e.id
>     and time >= extract('epoch' from timestamp '2009-08-12')
>     and time < extract('epoch' from timestamp '2009-08-13')
>     and direction = 'REQUEST'
> ;
>
>                                   QUERY PLAN
> ------------------------------------------------------------------------------
>  Merge Right Join  (cost=47528508.61..180424544.59 rows=10409251 width=53)
>    Merge Cond: (c.url_hits_id = a.id)
>    ->  Index Scan using mt_url_hits_category_jt_url_hits_id_index on url_hits_category_jt c  (cost=0.00..122162596.63 rows=4189283233 width=8)
>    ->  Sort  (cost=47528508.61..47536931.63 rows=3369210 width=49)
>          Sort Key: a.id
>          ->  Hash Join  (cost=2565.00..47163219.21 rows=3369210 width=49)
>                Hash Cond: (b.timezone_id = e.id)
>                ->  Hash Join  (cost=2553.49..47116881.07 rows=3369210 width=37)
>                      Hash Cond: (a.ident_id = b.id)
>                      ->  Seq Scan on url_hits a  (cost=0.00..47051154.89 rows=3369210 width=12)
>                            Filter: ((direction = 'REQUEST'::proxy_direction_enum) AND (("time")::double precision >= 1250035200::double precision) AND (("time")::double precision < 1250121600::double precision))
>                      ->  Hash  (cost=2020.44..2020.44 rows=42644 width=29)
>                            ->  Seq Scan on ident b  (cost=0.00..2020.44 rows=42644 width=29)
>                ->  Hash  (cost=6.78..6.78 rows=378 width=20)
>                      ->  Seq Scan on timezone e  (cost=0.00..6.78 rows=378 width=20)
> (15 rows)
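
For what it's worth, a minimal sketch of both suggestions. The work_mem value,
the index name, and the partial-index predicate are illustrative assumptions,
not settings tuned to your hardware or schema:

    -- Session-local bump; assumption: 512MB is affordable for this one report.
    SET work_mem = '512MB';
    -- run the report query here, then put the setting back:
    RESET work_mem;

    -- Assumption: "time" is an integer epoch column. Because extract('epoch' ...)
    -- returns double precision, the planner casts the column in the filter, so a
    -- plain index on ("time") would not match the query as written. Either index
    -- the casted expression, as below, or rewrite the query to compare against
    -- integer literals so a plain index on ("time") can be used.
    CREATE INDEX CONCURRENTLY url_hits_request_time_idx
        ON pwreport.url_hits ((("time")::double precision))
        WHERE direction = 'REQUEST';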