Hi Uwe,
I can certainly add indices and run analyze on each
table and look at the performance. I am going to
upgrade to 8.3.6 (I am at 8.3.3) and then add indices.
It might take a few days before I can update you with
outcome, but I will keep you posted.
Many thanks.
Regards,
Tena Sakai
tsakai@xxxxxxxxxxxxxx
-----Original Message-----
From: Uwe C. Schroeder [mailto:uwe@xxxxxxxxx]
Sent: Fri 2/20/2009 9:45 PM
To: pgsql-admin@xxxxxxxxxxxxxx
Cc: Tena Sakai; Scott Marlowe
Subject: Re: very, very slow performance
On Friday 20 February 2009, Tena Sakai wrote:
> Hi Scott,
>
> > What does explain and (it'll take a while to get
> > it) explain analyze select ... have to say?
>
> ---------------------------------------------------------------------------
>---------- Hash Join (cost=165264.65..55486119.31 rows=601095277 width=32)
> Hash Cond: (genotype.allele1id = a1.alleleid)
> -> Hash Join (cost=82632.33..34731274.54 rows=601095277 width=34)
> Hash Cond: (genotype.allele2id = a2.alleleid)
> -> Seq Scan on genotype (cost=0.00..13976429.77 rows=601095277
> width=36) -> Hash (cost=42474.59..42474.59 rows=2447659 width=6) -> Seq
> Scan on allele a2 (cost=0.00..42474.59 rows=2447659 width=6) -> Hash
> (cost=42474.59..42474.59 rows=2447659 width=6)
> -> Seq Scan on allele a1 (cost=0.00..42474.59 rows=2447659
> width=6) (9 rows)
The above tells you that you don't have indices in place. Postgres chooses a
seq scan - which as the name implies scans all the rows in sequencial order.
I'd add an index on genotype.allele1id and genotype.allele2id
aka
create index gtallele1idx on genotype (allele1id);
create index gtallele2idx on genotype (allele2id);
and also on allele.alleleid:
create index alleleididx on allele (alleleid);
After a "analyze genotype" and "analyze allele" the query should perform much
better. The explain analyze should show you an index scan instead of the seq
scan after that.
HTH
Uwe