The behaviour is different in Postgres 8.1.9 (much faster). The table
there has 9 million rows instead of 25 million, but the query comes
back very fast (8 seconds). I wonder if this is specific to 8.4.0.

On Fri, Jun 4, 2010 at 11:01 AM, Anj Adu <fotographs@xxxxxxxxx> wrote:
> Is the difference in expected and actual rows as seen by the planner
> a big factor? Even after an analyze, the results are similar (there
> is a big difference between expected and actual).
> The partition has 25 million rows.
>
> On Fri, Jun 4, 2010 at 10:41 AM, Anj Adu <fotographs@xxxxxxxxx> wrote:
>> 2010/6/4 <tv@xxxxxxxx>:
>>>> I am reposting, as my original query was mangled.
>>>>
>>>> The link to the explain plan is here, as it does not paste well
>>>> into the email body:
>>>>
>>>> http://explain.depesz.com/s/kHa
>>>>
>>>> The machine is a 2-CPU quad-core 5430 with 32G RAM and a single
>>>> 6x450G 15K RAID-10 array.
>>>>
>>>> work_mem = 1G
>>>> default_statistics_target = 1000
>>>> random_page_cost = 1
>>>
>>> Are you sure it's wise to set work_mem to 1G? Do you really need
>>> it? Don't forget this is not a 'total' or 'per query' limit - each
>>> query may allocate multiple work areas (and so occupy multiple GB).
>>> But I guess this does not cause the original problem.
>>>
>>> The last line, 'random_page_cost=1', basically says that reading
>>> data at random is just as cheap as reading it sequentially, which
>>> may result in poor performance due to bad plans. Why have you set
>>> this value?
>>>
>>> Sure, there are rare cases where 'random_page_cost=1' is OK.
>>
>> The default for 8.4 is 2.
>> I tried with 2 and 1, but the results are not very different. I
>> understand that for fast disks (which we have, with a decent RAID-10
>> setup) the random_page_cost can be lowered as needed, but I guess it
>> did not make a difference here.
>>
>>>> I am curious why the hash join takes so long. The main table
>>>> dev4_act_dy_fact_2010_05_t has 25 million rows.
>>>> The table is partitioned into 3 parts per month. The remaining
>>>> tables are very small (< 1000 rows).
>>>
>>> Well, the real cause that makes your query slow is the 'index scan'
>>> part:
>>>
>>> Index Scan using dev4_act_dy_fact_2010_05_t3_thedate on
>>> dev4_act_dy_fact_2010_05_t3 a (cost=0.00..94041.89 rows=204276
>>> width=60) (actual time=164533.725..164533.725 rows=0 loops=1)
>>>
>>> The first thing to note here is the difference between the expected
>>> and actual number of rows - the planner expects 204276 rows but
>>> gets 0. How large is this partition?
>>
>> The partition has 25 million rows, with indexes on theDate and
>> node_id. I altered the random_page_cost to 4 (1 more than the
>> default) - still slow. These tables are analyzed every day, and I
>> have an index on each field used in the where criteria.
>>
>>> Try to analyze it, set the random_page_cost to something reasonable
>>> (e.g. 4), and try to run the query again.
>>>
>>> Tomas

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
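[The advice in the thread can be tried in a single psql session before
changing postgresql.conf. A minimal sketch, assuming the partition and
column names shown in the plan above; the date filter is a hypothetical
placeholder, not the poster's original query:]

```sql
-- Refresh the partition's statistics so the planner's row estimates
-- are based on current data (with default_statistics_target = 1000
-- this builds large histograms, so it may take a while).
ANALYZE dev4_act_dy_fact_2010_05_t3;

-- Try the stock planner cost settings for this session only;
-- 4.0 is the PostgreSQL 8.4 default for random_page_cost.
SET random_page_cost = 4;

-- A more modest per-sort/per-hash work area; note that work_mem is
-- allocated per plan node, not per query, so 1G can multiply.
SET work_mem = '64MB';

-- Re-run the slow query under EXPLAIN ANALYZE and compare the
-- estimated rows against the actual rows at each plan node.
EXPLAIN ANALYZE
SELECT *
FROM dev4_act_dy_fact_2010_05_t3 a
WHERE a.thedate >= DATE '2010-05-01'
  AND a.thedate <  DATE '2010-05-11';
```

Because SET only affects the current session, this leaves the server
configuration untouched while showing whether the estimates improve.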