On 8/19/09 9:28 AM, "Kevin Kempter" <kevink@xxxxxxxxxxxxxxxxxxx> wrote:

> Hi all;
>
> we've been fighting this query for a few days now. We bumped up the
> statistics target for the a.id, c.url_hits_id and b.id columns below to 250
> and ran an analyze on the relevant tables. We killed it after 8hrs.
>
> Note the url_hits table has > 1.4 billion rows
>
> Any suggestions?

Have you tried setting work_mem higher for just this query?

The big estimated cost is the sequential scan on url_hits, but if the
estimates are off, the sort and index scan at the end might be your real
bottleneck. Larger work_mem might make the planner choose another plan there
(a minimal sketch of both suggestions follows at the end of this message).

But if the true cost really is the sequential scan on url_hits, then only an
index there will help.

> $ psql -ef expl.sql pwreport
> explain
> select
>     a.id,
>     ident_id,
>     time,
>     customer_name,
>     extract('day' from timezone(e.name, to_timestamp(a.time))) as day,
>     category_id
> from
>     pwreport.url_hits a left outer join
>     pwreport.url_hits_category_jt c on (a.id = c.url_hits_id),
>     pwreport.ident b,
>     pwreport.timezone e
> where
>     a.ident_id = b.id
>     and b.timezone_id = e.id
>     and time >= extract('epoch' from timestamp '2009-08-12')
>     and time < extract('epoch' from timestamp '2009-08-13')
>     and direction = 'REQUEST'
> ;
>
>                                   QUERY PLAN
> ------------------------------------------------------------------------------
>  Merge Right Join  (cost=47528508.61..180424544.59 rows=10409251 width=53)
>    Merge Cond: (c.url_hits_id = a.id)
>    ->  Index Scan using mt_url_hits_category_jt_url_hits_id_index on url_hits_category_jt c  (cost=0.00..122162596.63 rows=4189283233 width=8)
>    ->  Sort  (cost=47528508.61..47536931.63 rows=3369210 width=49)
>          Sort Key: a.id
>          ->  Hash Join  (cost=2565.00..47163219.21 rows=3369210 width=49)
>                Hash Cond: (b.timezone_id = e.id)
>                ->  Hash Join  (cost=2553.49..47116881.07 rows=3369210 width=37)
>                      Hash Cond: (a.ident_id = b.id)
>                      ->  Seq Scan on url_hits a  (cost=0.00..47051154.89 rows=3369210 width=12)
>                            Filter: ((direction = 'REQUEST'::proxy_direction_enum) AND (("time")::double precision >= 1250035200::double precision) AND (("time")::double precision < 1250121600::double precision))
>                      ->  Hash  (cost=2020.44..2020.44 rows=42644 width=29)
>                            ->  Seq Scan on ident b  (cost=0.00..2020.44 rows=42644 width=29)
>                ->  Hash  (cost=6.78..6.78 rows=378 width=20)
>                      ->  Seq Scan on timezone e  (cost=0.00..6.78 rows=378 width=20)
> (15 rows)
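
For what it's worth, a minimal sketch of both suggestions. The work_mem value,
the index name, and the partial-index predicate are illustrative assumptions,
not settings tuned to your hardware or schema:

    -- Session-local bump; assumption: 512MB is affordable for this one report.
    SET work_mem = '512MB';
    -- run the report query here, then put the setting back:
    RESET work_mem;

    -- Assumption: "time" is an integer epoch column. Because extract('epoch' ...)
    -- returns double precision, the planner casts the column in the filter, so a
    -- plain index on ("time") would not match the query as written. Either index
    -- the casted expression, as below, or rewrite the query to compare against
    -- integer literals so a plain index on ("time") can be used.
    CREATE INDEX CONCURRENTLY url_hits_request_time_idx
        ON pwreport.url_hits ((("time")::double precision))
        WHERE direction = 'REQUEST';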