Trouble with plan statistics for behaviour for query.

Trevor Campbell <tcampbell@xxxxxxxxxxxxx> · Fri, 01 Jun 2012 08:29:47 +1000



    We are having trouble with a particular
      query being slow in a strange manner.

      
      The query is a join over two large tables that are suitably
      indexed.

      
      select CG.ID, CG.ISSUEID, CG.AUTHOR, CG.CREATED, CI.ID,
      CI.FIELDTYPE, CI.FIELD, CI.OLDVALUE, CI.OLDSTRING, CI.NEWVALUE,
      CI.NEWSTRING 

         from PUBLIC.CHANGEGROUP CG inner join PUBLIC.CHANGEITEM CI on
      CG.ID = CI.GROUPID where CG.ISSUEID=? order by CG.CREATED asc,
      CI.ID asc

      
      For some tasks we run this particular query a very large number of
      times and it has a significant performance impact when it runs
      slowly.

      
      If we run ANALYSE over the CHANGEITEM
    table then the performance picks up by a
      factor of 5 or more.  The problem is that a day later the
      performance will have dropped back to its previously slow state.

      
      The reason this is so hard to understand is that the activity on
      this table is very low, with no updates and only a relatively
      small number of inserts each day, < 0.1% of the table size.

      
      Explain output:

      Sort  (cost=86.90..86.93 rows=11 width=118) (actual
      time=0.086..0.087 rows=14 loops=1)

        Sort Key: cg.created, ci.id

        Sort Method: quicksort  Memory: 26kB

        ->  Nested Loop  (cost=0.00..86.71 rows=11 width=118) (actual
      time=0.022..0.061 rows=14 loops=1)

              ->  Index Scan using chggroup_issue on changegroup cg 
      (cost=0.00..17.91 rows=8 width=33) (actual time=0.012..0.015
      rows=7 loops=1)

                    Index Cond: (issueid = 81001::numeric)

              ->  Index Scan using chgitem_chggrp on changeitem ci 
      (cost=0.00..8.58 rows=2 width=91) (actual time=0.005..0.005 rows=2
      loops=7)

                    Index Cond: (groupid = cg.id)

      Total runtime: 0.116 ms

      
      The explain output always seems the same even when the performance
      is poor, but I can't be sure of that.

      
      Overall it seems like PostgreSQL just forgets about the statistics
      it has gathered after a short while.

      
      Schema details:

      CREATE TABLE changegroup

      (

        id numeric(18,0) NOT NULL,

        issueid numeric(18,0),

        author character varying(255),

        created timestamp with time zone,

        CONSTRAINT pk_changegroup PRIMARY KEY (id )

      )

      WITH (

        OIDS=FALSE

      );

      CREATE INDEX chggroup_issue

        ON changegroup

        USING btree

        (issueid );

      
      CREATE TABLE changeitem

      (

        id numeric(18,0) NOT NULL,

        groupid numeric(18,0),

        fieldtype character varying(255),

        field character varying(255),

        oldvalue text,

        oldstring text,

        newvalue text,

        newstring text,

        CONSTRAINT pk_changeitem PRIMARY KEY (id )

      )

      WITH (

        OIDS=FALSE

      );

      
      CREATE INDEX chgitem_chggrp

        ON changeitem

        USING btree

        (groupid );

      
      CREATE INDEX chgitem_field

        ON changeitem

        USING btree

        (field COLLATE pg_catalog."default" );

      
      Table sizes

    changegroup  :  2,000,000
    rows

    changeitem    :  2,500,000  rows

    
    The changegroup table has on average about 4 rows per issueid value,
    which is the query parameter.

    
    We run autovacuum and autoanalyse, but as the activity in the table
    is low these are rarely if ever invoked on these tables.

    
    Environment.

    Testing using PostgreSQL 9.1.3 on x86_64-redhat-linux-gnu,  although
    this is a problem across a variety of postgres versions.