Hi Scott,
Thanks for the clustering suggestion. I have never used the
CLUSTER command, so I need to read up on it before I can
use it.
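For reference, here is the syntax I gather I will be reading up
on (the index name is made up for illustration):

    -- physically reorder a table along one of its indexes (8.3 syntax)
    CLUSTER genotype USING genotype_allele1id_idx;
    -- on releases before 8.3 the same thing is spelled:
    -- CLUSTER genotype_allele1id_idx ON genotype;

I understand CLUSTER rewrites the table and holds an exclusive
lock while it runs, so I will have to schedule it accordingly.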
I have adjusted the postgres parameters per your recommendation.
work_mem is now 8GB and checkpoint_segments is raised to 100.
shared_buffers is still at 1GB because it is bound by the
kernel's SHMMAX, which I cannot alter for maybe a few days;
I am thinking of setting SHMMAX to 8GB. I may or may not be
able to move up to 8.3.6 over this weekend. If not, sometime
during the week.
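For the record, the relevant lines now look roughly like this
(the sysctl entry is the change I still plan to make, assuming
a Linux kernel):

    # postgresql.conf -- current settings
    work_mem = 8GB                 # per-sort/per-hash memory
    checkpoint_segments = 100
    shared_buffers = 1GB           # capped by SHMMAX for now

    # /etc/sysctl.conf -- planned change; apply with sysctl -p or a reboot
    kernel.shmmax = 8589934592     # 8 GB in bytes

Once SHMMAX is raised, a larger shared_buffers will still need
a postgres restart to take effect.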
With all that, hopefully, I can achieve better performance.
More updates to come in a day or two.
Many thanks for your help.
Regards,
Tena Sakai
tsakai@xxxxxxxxxxxxxx
-----Original Message-----
From: Scott Marlowe [mailto:scott.marlowe@xxxxxxxxx]
Sent: Sat 2/21/2009 12:57 AM
To: Uwe C. Schroeder
Cc: pgsql-admin@xxxxxxxxxxxxxx; Tena Sakai
Subject: Re: very, very slow performance
On Fri, Feb 20, 2009 at 10:45 PM, Uwe C. Schroeder <uwe@xxxxxxxxx> wrote:
>
> On Friday 20 February 2009, Tena Sakai wrote:
>> Hi Scott,
>>
>> > What do explain and (it'll take a while to get
>> > it) explain analyze select ... have to say?
>>
>> -------------------------------------------------------------------------------------
>>  Hash Join  (cost=165264.65..55486119.31 rows=601095277 width=32)
>>    Hash Cond: (genotype.allele1id = a1.alleleid)
>>    ->  Hash Join  (cost=82632.33..34731274.54 rows=601095277 width=34)
>>          Hash Cond: (genotype.allele2id = a2.alleleid)
>>          ->  Seq Scan on genotype  (cost=0.00..13976429.77 rows=601095277 width=36)
>>          ->  Hash  (cost=42474.59..42474.59 rows=2447659 width=6)
>>                ->  Seq Scan on allele a2  (cost=0.00..42474.59 rows=2447659 width=6)
>>    ->  Hash  (cost=42474.59..42474.59 rows=2447659 width=6)
>>          ->  Seq Scan on allele a1  (cost=0.00..42474.59 rows=2447659 width=6)
>> (9 rows)
I was wrong about this query; it is constrained by the where clause.
I much prefer explicit JOIN ... ON syntax, as it makes it more obvious
what is joining to what. Pgsql is smart enough to reorder join clauses
as long as it's not constrained by, say, a left join, and even then
there's some wiggle room I think. Anyway...
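To illustrate (the table and column names come from the plan above;
the select list is a placeholder, since the original query wasn't
posted):

    SELECT g.*, a1.alleleid, a2.alleleid
      FROM genotype g
      JOIN allele a1 ON a1.alleleid = g.allele1id
      JOIN allele a2 ON a2.alleleid = g.allele2id;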
> The above tells you that you don't have indices in place. Postgres chooses a
> seq scan - which, as the name implies, scans all the rows in sequential order.
Well, he's grabbing everything, so I'm betting an index won't buy you
anything. Unless everything fits in memory and you set random_page_cost
low enough and shared_buffers and effective_cache_size high enough, an
index will lose.
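For reference, these are the knobs I mean; the values here are purely
illustrative, not recommendations:

    -- per-session, handy for experimenting:
    SET random_page_cost = 2.0;          -- default is 4.0; lower favors index scans
    SET effective_cache_size = '6GB';    -- planner's guess at OS + postgres cache
    -- shared_buffers can only be set in postgresql.conf and needs a restart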
However, if you always access the tables in a given order, you can
cluster the tables and get really fast results. I'd try creating an
index on the join key for each table, clustering on that index, and
adding order bys so the result sets come back in matching
clustered-index order for each joined table.
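Something like this untested sketch, with names guessed from the plan:

    -- cluster each table on its join key
    CREATE INDEX allele_alleleid_idx ON allele (alleleid);
    CLUSTER allele USING allele_alleleid_idx;
    CREATE INDEX genotype_allele1id_idx ON genotype (allele1id);
    CLUSTER genotype USING genotype_allele1id_idx;
    ANALYZE allele;
    ANALYZE genotype;

    -- then order the query along the same key so access patterns line up
    SELECT g.*, a1.alleleid, a2.alleleid
      FROM genotype g
      JOIN allele a1 ON a1.alleleid = g.allele1id
      JOIN allele a2 ON a2.alleleid = g.allele2id
     ORDER BY g.allele1id;

Note the clustered order decays as rows are inserted and updated, so
you'd want to re-cluster periodically.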