Re: very, very slow performance

"Tena Sakai" <tsakai@xxxxxxxxxxxxxx> · Fri, 20 Feb 2009 11:59:41 -0800

Title: RE:  very, very slow performance

Hi Emanuel,

> Do you tried partitioned tables? diferent tablespaces?

> set the storage external for more important columns?

> what kind of indexes do you have?

I think the table design is ok.

I don't know what "set the storage external" means.

Maybe you can explain it to me in lay-person's terms?

I think indices are ok, but I welcome your comments.

  canon=# \d genotype

                                              Table "public.genotype"

       Column      |            Type             |                           Modifiers                          

  -----------------+-----------------------------+---------------------------------------------------------------

   genotypeid      | integer                     | not null default nextval('genotype_genotypeid_seq'::regclass)

   subjectid       | integer                     |

   markerid        | integer                     |

   allele1id       | integer                     |

   allele2id       | integer                     |

   datecreated     | timestamp without time zone | not null

   datereplaced    | timestamp without time zone | not null

   ignore          | character(1)                | not null default 'N'::bpchar

   inconsistent    | character(1)                | not null default 'N'::bpchar

   sourcetablename | character varying           | not null

   sourceid        | character varying           | not null

  Indexes:

      "genotype_pkey" PRIMARY KEY, btree (genotypeid, datecreated)

      "genotype_genotypeid_idx" btree (genotypeid)

      "genotype_markerid_idx" btree (markerid)

      "genotype_source_idx" btree (sourceid, sourcetablename)

      "genotype_subjectid_idx" btree (subjectid)

  canon=# \d allele

                                           Table "public.allele"

      Column    |            Type             |                         Modifiers                        

  --------------+-----------------------------+-----------------------------------------------------------

   alleleid     | integer                     | not null default nextval('allele_alleleid_seq'::regclass)

   markerid     | integer                     |

  value        | character varying           | not null

   datecreated  | timestamp without time zone | not null

   datereplaced | timestamp without time zone | not null

  Indexes:

      "allele_pkey" PRIMARY KEY, btree (alleleid, datecreated)

      "allele_markerid_idx" btree (markerid)

> try to run explain analyze for those querys, then

> post the results.

Yes, it is running for last 13+ hours and I have no

idea how much longer it might take.

> I recommend follow this thread:

> http://archives.postgresql.org/pgsql-hackers/2009-02/msg00718.php

That's pretty interesting stuff.  I need to experiment tweaking

parameters Joshua mentions.  Alas, now is not a good time to do

so.

Regards,

Tena Sakai

-----Original Message-----

From: pgsql-admin-owner@xxxxxxxxxxxxxx on behalf of Emanuel Calvo Franco

Sent: Fri 2/20/2009 4:10 AM

To: pgsql-admin@xxxxxxxxxxxxxx

Subject: Re:  very, very slow performance

2009/2/20 Tena Sakai <tsakai@xxxxxxxxxxxxxx>:

> Hi Everybody,

>

> Here's the query (please read them in fixed-size font, if

> you can):

>

>  select subjectid, genotype.markerid, a1.value as allele1,

>         a2.value as allele2, genotype.dateCreated,

>         genotype.dateReplaced, genotype.ignore,

>         genotype.inconsistent

>    from genotype, allele a1, allele a2

>   where

>         allele1id = a1.alleleid

>     and

>         allele2id = a2.alleleid;

>

> Genotype table mentioned above has about 600,000,000+ rows.  As

> I mentioned, there are 20 more of them running concurrently.

> 3 other jobs look like:

>

Do you tried partitioned tables? diferent tablespaces? set the storage

external for

more important columns? what kind of indexes do you have?

>  SELECT a.markerid,a.type,a.localname,b.ncbibuild,

>         a.chromosome,a.geneticposition,b.physicalposition,

>         a.strand,a.stdflanks,a.maxflanks,a.datecreated,

>         a.datereplaced,a.sourcetablename,a.sourceid,

>         b.dbsnprsid,a.ignore,a.gene

>    FROM public.marker a, public.snpposition b

>   WHERE

>         a.ignore= 'N'

>     AND a.datecreated <= (timestamp'Wed Oct 29 09:35:54.266 2008')

>     AND a.datereplaced > (timestamp'Wed Oct 29 09:35:54.266 2008')

>     AND a.localname IN

> ('RS10757474','RS7859598','RS6148','RS9792663','RS1541125',

>                          'RS10511446','RS10814410','RS12338622','RS875587',

>                          'RS1590979', 'RS748786','RS958505','RS12352961',

>                           and on and on and on...);

>

>

>   insert into summarystats

>                     select 'Marker by Chromosomes', chromosome,

> sourcetablename,

>                             count(*), null, to_timestamp('2009-02-18

> 20:29:40.125',

>                             'yyyy-mm-dd hh:mi:ss.ms')

>                       from marker

>                      where ignore = 'N'

>                        and datereplaced = '3000-01-01 12:00:00.000'

>                        and exists (select 1

>                                      from genotype

>                                     where genotype.markerid =

> marker.markerid

>                                       and genotype.ignore = 'N'

>                                       and genotype.datereplaced =

> '3000-01-01 12:00:00.000')

>                      group by chromosome, sourcetablename;

>

>   COPY public.genotype (genotypeid, subjectid, markerid,

>                         allele1id, allele2id, datecreated, datereplaced,

>                         ignore, inconsistent, sourcetablename, sourceid)

>     TO stdout;

>

Like Scott saids, try to run explain analyze for those querys, then

post the results.

>

> Can anybody suggest anything that I can do to gain speed?

> Any help is much appreciated.

>

I recommend follow this thread:

http://archives.postgresql.org/pgsql-hackers/2009-02/msg00718.php

> Regards,

>

> Tena Sakai

> tsakai@xxxxxxxxxxxxxx

>

>

>

--

      Emanuel Calvo Franco

        Sumate al ARPUG !

      (www.postgres-arg.org -

         www.arpug.com.ar)

    ArPUG / AOSUG Member

   Postgresql Support & Admin

--

Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)

To make changes to your subscription:

http://www.postgresql.org/mailpref/pgsql-admin