Fwd: very, very slow performance

Emanuel Calvo Franco <postgres.arg@xxxxxxxxx> · Fri, 20 Feb 2009 18:11:44 -0200

2009/2/20 Tena Sakai <tsakai@xxxxxxxxxxxxxx>:
> Hi Emanuel,
>
>> Do you tried partitioned tables? diferent tablespaces?
>> set the storage external for more important columns?
>> what kind of indexes do you have?
>
> I think the table design is ok.

Sorry, copy to the list.

The table desing is ok. Partitioned tables is about cut
tables in terms of number of regs. It improve your engine
, specially if you are talking about that number of
regs.

> I don't know what "set the storage external" means.

I't means the colmun will storage separated, sometimes
improve performance.

> Maybe you can explain it to me in lay-person's terms?
> I think indices are ok, but I welcome your comments.
>

Indexes maybe ok. What I mean is th TYPE of that. Maybe
HASH or GIST could be better, depends on the column format.

>   canon=# \d genotype
>                                               Table "public.genotype"
>        Column      |            Type             |
> Modifiers
>
> -----------------+-----------------------------+---------------------------------------------------------------
>    genotypeid      | integer                     | not null default
> nextval('genotype_genotypeid_seq'::regclass)
>    subjectid       | integer                     |
>    markerid        | integer                     |
>    allele1id       | integer                     |
>    allele2id       | integer                     |
>    datecreated     | timestamp without time zone | not null
>    datereplaced    | timestamp without time zone | not null
>    ignore          | character(1)                | not null default
> 'N'::bpchar
>    inconsistent    | character(1)                | not null default
> 'N'::bpchar
>    sourcetablename | character varying           | not null
>    sourceid        | character varying           | not null
>   Indexes:
>       "genotype_pkey" PRIMARY KEY, btree (genotypeid, datecreated)
>       "genotype_genotypeid_idx" btree (genotypeid)
>       "genotype_markerid_idx" btree (markerid)
>       "genotype_source_idx" btree (sourceid, sourcetablename)
>       "genotype_subjectid_idx" btree (subjectid)
>
>   canon=# \d allele
>                                            Table "public.allele"
>       Column    |            Type             |
> Modifiers
>
> --------------+-----------------------------+-----------------------------------------------------------
>    alleleid     | integer                     | not null default
> nextval('allele_alleleid_seq'::regclass)
>    markerid     | integer                     |
>   value        | character varying           | not null
>    datecreated  | timestamp without time zone | not null
>    datereplaced | timestamp without time zone | not null
>   Indexes:
>       "allele_pkey" PRIMARY KEY, btree (alleleid, datecreated)
>       "allele_markerid_idx" btree (markerid)
>
>> try to run explain analyze for those querys, then
>> post the results.
>
> Yes, it is running for last 13+ hours and I have no
> idea how much longer it might take.
>

>> I recommend follow this thread:
>> http://archives.postgresql.org/pgsql-hackers/2009-02/msg00718.php
>
> That's pretty interesting stuff.  I need to experiment tweaking
> parameters Joshua mentions.  Alas, now is not a good time to do
> so.
>

I'm glad if it's useful.

> Regards,
>
> Tena Sakai
>
> -----Original Message-----
> From: pgsql-admin-owner@xxxxxxxxxxxxxx on behalf of Emanuel Calvo Franco
> Sent: Fri 2/20/2009 4:10 AM
> To: pgsql-admin@xxxxxxxxxxxxxx
> Subject: Re:  very, very slow performance
>
> 2009/2/20 Tena Sakai <tsakai@xxxxxxxxxxxxxx>:
>> Hi Everybody,
>>
>> Here's the query (please read them in fixed-size font, if
>> you can):
>>
>>  select subjectid, genotype.markerid, a1.value as allele1,
>>         a2.value as allele2, genotype.dateCreated,
>>         genotype.dateReplaced, genotype.ignore,
>>         genotype.inconsistent
>>    from genotype, allele a1, allele a2
>>   where
>>         allele1id = a1.alleleid
>>     and
>>         allele2id = a2.alleleid;
>>
>> Genotype table mentioned above has about 600,000,000+ rows.  As
>> I mentioned, there are 20 more of them running concurrently.
>> 3 other jobs look like:
>>
>
> Do you tried partitioned tables? diferent tablespaces? set the storage
> external for
> more important columns? what kind of indexes do you have?
>
>>  SELECT a.markerid,a.type,a.localname,b.ncbibuild,
>>         a.chromosome,a.geneticposition,b.physicalposition,
>>         a.strand,a.stdflanks,a.maxflanks,a.datecreated,
>>         a.datereplaced,a.sourcetablename,a.sourceid,
>>         b.dbsnprsid,a.ignore,a.gene
>>    FROM public.marker a, public.snpposition b
>>   WHERE
>>         a.ignore= 'N'
>>     AND a.datecreated <= (timestamp'Wed Oct 29 09:35:54.266 2008')
>>     AND a.datereplaced > (timestamp'Wed Oct 29 09:35:54.266 2008')
>>     AND a.localname IN
>> ('RS10757474','RS7859598','RS6148','RS9792663','RS1541125',
>>
>> 'RS10511446','RS10814410','RS12338622','RS875587',
>>                          'RS1590979', 'RS748786','RS958505','RS12352961',
>>                           and on and on and on...);
>>
>>
>>   insert into summarystats
>>                     select 'Marker by Chromosomes', chromosome,
>> sourcetablename,
>>                             count(*), null, to_timestamp('2009-02-18
>> 20:29:40.125',
>>                             'yyyy-mm-dd hh:mi:ss.ms')
>>                       from marker
>>                      where ignore = 'N'
>>                        and datereplaced = '3000-01-01 12:00:00.000'
>>                        and exists (select 1
>>                                      from genotype
>>                                     where genotype.markerid =
>> marker.markerid
>>                                       and genotype.ignore = 'N'
>>                                       and genotype.datereplaced =
>> '3000-01-01 12:00:00.000')
>>                      group by chromosome, sourcetablename;
>>
>>   COPY public.genotype (genotypeid, subjectid, markerid,
>>                         allele1id, allele2id, datecreated, datereplaced,
>>                         ignore, inconsistent, sourcetablename, sourceid)
>>     TO stdout;
>>
>
> Like Scott saids, try to run explain analyze for those querys, then
> post the results.
>
>
>>
>> Can anybody suggest anything that I can do to gain speed?
>> Any help is much appreciated.
>>
>
> I recommend follow this thread:
> http://archives.postgresql.org/pgsql-hackers/2009-02/msg00718.php
>
>
>> Regards,
>>
>> Tena Sakai
>> tsakai@xxxxxxxxxxxxxx
>>
>>
>>
>
>
>
> --
>       Emanuel Calvo Franco
>         Sumate al ARPUG !
>       (www.postgres-arg.org -
>          www.arpug.com.ar)
>     ArPUG / AOSUG Member
>    Postgresql Support & Admin
>
> --
> Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-admin
>
>

--
     Emanuel Calvo Franco
       Sumate al ARPUG !
     (www.postgres-arg.org -
        www.arpug.com.ar)
   ArPUG / AOSUG Member
  Postgresql Support & Admin

-- 
      Emanuel Calvo Franco
        Sumate al ARPUG !
      (www.postgres-arg.org -
         www.arpug.com.ar)
    ArPUG / AOSUG Member
   Postgresql Support & Admin

-- 
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin