Hi,
The problem: Postgres is becoming slower day after day, and only a full vacuum fixes the problem.
Information you may need in order to evaluate the problem:
The problem affects all tables and queries, as far as I can tell, but we can focus on a single table to keep things comprehensible.
The queries I am running to test the speed are:
INSERT INTO "AWAITINGSTATUSSMPP" VALUES('143428',
'1111', 1, '2012-06-16 13:39:19', '111');
DELETE FROM "AWAITINGSTATUSSMPP" WHERE "SMSCMSGID" =
'1111' AND "CONNECTIONID" = 1;
SELECT * FROM "AWAITINGSTATUSSMPP" WHERE "SMSCMSGID"
= '1111' AND "CONNECTIONID" = 1;
After a full vacuum, they run in about 100ms.
Today, before the full vacuum, they were taking around
500ms.
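For reference, this is the kind of check I plan to run before and after the next full vacuum to quantify the bloat on disk. It is only a sketch using the built-in size functions (pg_size_pretty, pg_relation_size, pg_total_relation_size); nothing in it is specific to my setup beyond the table name:

-- on-disk size of the table itself, and of the table plus its indexes/TOAST
SELECT pg_size_pretty(pg_relation_size('"AWAITINGSTATUSSMPP"'))       AS heap_size,
       pg_size_pretty(pg_total_relation_size('"AWAITINGSTATUSSMPP"')) AS total_size;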
Below is an EXPLAIN ANALYZE of the commands AFTER the full vacuum. I did not run it beforehand, so I cannot post the corresponding output from before the vacuum. So, after the full vacuum:
explain analyze INSERT INTO "AWAITINGSTATUSSMPP"
VALUES('143428', '1111', 1, '2012-06-16 13:39:19',
'111');
"Result
(cost=0.00..0.01 rows=1 width=0) (actual
time=0.001..0.002 rows=1 loops=1)"
"Trigger for
constraint FK_AWAITINGSTATUSSMPP_MESSAGES: time=0.131
calls=1"
"Trigger
bucardo_add_delta: time=0.454 calls=1"
"Trigger
bucardo_triggerkick_MassSMs: time=0.032 calls=1"
"Total runtime:
0.818 ms"
explain analyze DELETE FROM "AWAITINGSTATUSSMPP" WHERE "SMSCMSGID" = '1111' AND "CONNECTIONID" = 1;

Seq Scan on "AWAITINGSTATUSSMPP"  (cost=0.00..2.29 rows=1 width=6) (actual time=0.035..0.035 rows=0 loops=1)
  Filter: ((("SMSCMSGID")::text = '1111'::text) AND ("CONNECTIONID" = 1))
Trigger bucardo_triggerkick_MassSMs: time=0.066 calls=1
Total runtime: 0.146 ms
explain analyze SELECT * FROM "AWAITINGSTATUSSMPP" WHERE "SMSCMSGID" = '1111' AND "CONNECTIONID" = 1;

Seq Scan on "AWAITINGSTATUSSMPP"  (cost=0.00..2.29 rows=1 width=557) (actual time=0.028..0.028 rows=0 loops=1)
  Filter: ((("SMSCMSGID")::text = '1111'::text) AND ("CONNECTIONID" = 1))
Total runtime: 0.053 ms
Below is the table definition:
=====================================
CREATE TABLE "AWAITINGSTATUSSMPP"
(
"MESSAGEID" bigint NOT NULL,
"SMSCMSGID" character varying(50) NOT NULL,
"CONNECTIONID" smallint NOT NULL,
"EXPIRE_TIME" timestamp without time zone NOT NULL,
"RECIPIENT" character varying(20) NOT NULL,
"CLIENT_MSG_ID" character varying(255),
CONSTRAINT "PK_AWAITINGSTATUSSMPP" PRIMARY KEY
("SMSCMSGID", "CONNECTIONID"),
CONSTRAINT "FK_AWAITINGSTATUSSMPP_MESSAGES" FOREIGN
KEY ("MESSAGEID")
REFERENCES "MESSAGES" ("ID") MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE
)
WITH (
OIDS=FALSE
);
ALTER TABLE "AWAITINGSTATUSSMPP" OWNER TO postgres;
GRANT ALL ON TABLE "AWAITINGSTATUSSMPP" TO
"MassSMsUsers";
CREATE INDEX "IX_AWAITINGSTATUSSMPP_MSGID_RCP"
ON "AWAITINGSTATUSSMPP"
USING btree
("MESSAGEID", "RECIPIENT");
CREATE TRIGGER bucardo_add_delta
AFTER INSERT OR UPDATE OR DELETE
ON "AWAITINGSTATUSSMPP"
FOR EACH ROW
EXECUTE PROCEDURE
bucardo."bucardo_add_delta_SMSCMSGID|CONNECTIONID"();
CREATE TRIGGER "bucardo_triggerkick_MassSMs"
AFTER INSERT OR UPDATE OR DELETE OR TRUNCATE
ON "AWAITINGSTATUSSMPP"
FOR EACH STATEMENT
EXECUTE PROCEDURE
bucardo."bucardo_triggerkick_MassSMs"();
=====================================
The table only has about 200 records because it is being used as temporary storage and records are constantly inserted and deleted.
BUT please don't get hung up on this fact because, as I already said, the speed problem is not restricted to this table. The same problem appears with the following query:
UPDATE "MESSAGES" SET "SENT" = "SENT" + 1 WHERE "ID" = 143447;
and the MESSAGES table sees mainly inserts and only a few deletes...
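For what it's worth, this is the kind of query I intend to use to watch whether dead rows pile up between vacuums. It is only a sketch and assumes a version whose pg_stat_user_tables view exposes the n_live_tup / n_dead_tup counters (8.3 and later, as far as I know):

-- planner stats from pg_class plus dead/live tuple counters from the stats collector
SELECT c.relname, c.relpages, c.reltuples,
       s.n_live_tup, s.n_dead_tup, s.last_autovacuum
FROM pg_class c
JOIN pg_stat_user_tables s ON s.relid = c.oid
WHERE c.relname IN ('AWAITINGSTATUSSMPP', 'MESSAGES');

If relpages keeps growing while reltuples stays around 200, that would be consistent with the bloat I suspect.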
My postgresql.conf file:
======================
port = 5433                              # (change requires restart)
max_connections = 100                    # (change requires restart)
shared_buffers = 256MB                   # min 128kB. DoubleIP - Default was 32MB
synchronous_commit = off                 # immediate fsync at commit. DoubleIP - Default was on
effective_cache_size = 512MB             # DoubleIP - Default was 128MB
log_destination = 'stderr'               # Valid values are combinations of
logging_collector = on                   # Enable capturing of stderr and csvlog
silent_mode = on                         # Run server silently.
log_line_prefix = '%t %d %u '            # special values:
log_autovacuum_min_duration = 0          # -1 disables, 0 logs all actions and
autovacuum_naptime = 28800               # time between autovacuum runs. DoubleIP - default was 1min
autovacuum_vacuum_threshold = 100        # min number of row updates before
autovacuum_vacuum_scale_factor = 0.0     # fraction of table size before vacuum. DoubleIP - default was 0.2
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8'              # locale for system error message
lc_monetary = 'en_US.UTF-8'              # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'               # locale for number formatting
lc_time = 'en_US.UTF-8'                  # locale for time formatting
default_text_search_config = 'pg_catalog.english'
=======================
As you will see, I have altered the shared_buffers and synchronous_commit values.
shared_buffers had the default value of 32MB. When I changed it to 256MB the problem still appears, but it takes more time to appear (3-4 days). With 32MB it appeared faster, probably after 24 hours.
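Just to rule out a missed reload, the values the server is actually using can be checked with a query along these lines (only a sketch against the standard pg_settings view):

-- show the live value and where each setting came from
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('shared_buffers', 'synchronous_commit', 'autovacuum',
               'autovacuum_naptime', 'autovacuum_vacuum_threshold',
               'autovacuum_vacuum_scale_factor');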
Also, I have changed the autovacuum daemon to run every 8 hours, and I adjusted its thresholds to make sure it vacuums pretty much every table (any table in which at least 100 rows have changed).
Please note, though, that my problem existed even
before playing around with the autovacuum. This is why I
tried to change its values in the first place.
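For completeness, the same thresholds could also be applied per table instead of globally. This is only a sketch and assumes a server version (8.4 or later, I believe) that supports per-table autovacuum storage parameters:

-- make autovacuum consider this table on every run, regardless of table size
ALTER TABLE "AWAITINGSTATUSSMPP" SET (
    autovacuum_vacuum_threshold = 100,
    autovacuum_vacuum_scale_factor = 0.0
);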
The server is synchronized with another server using Bucardo. The Bucardo process runs on the other server.
The same problem appears on the 2nd server too... after 3-4 days, Postgres runs slower and slower.
Our server configuration:
DELL PowerEdge T610 Tower Chassis for Up to 8x
3.5" HDDs
2x Intel Xeon E5520 Processor (2.26GHz, 8M
Cache, 5.86 GT/s QPI, Turbo, HT), 1066MHz Max Memory
8GB Memory,1333MHz
2 x 146GB SAS 15k 3.5" HD Hot Plug
6 x 1TB SATA 7.2k 3.5" Additional HD Hot Plug
PERC 6/i RAID Controller Card 256MB PCIe, 2x4
Connectors
SUSE Linux Enterprise Server 10, SP2
The 2 SAS HDs are set up as RAID-1.
The 6 SATA HDs are set up as RAID-5.
Linux is running on the RAID-1 array.
Postgres is running on the RAID-5 array.
Finally, here is the output of top before and after the full vacuum:
top - 11:27:44 up 72 days, 13:27, 37 users,  load average: 1.05, 1.31, 1.45
Tasks: 279 total,   3 running, 276 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.6%us,  0.8%sy,  0.0%ni, 95.5%id,  0.0%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   8166432k total,  7963116k used,   203316k free,   115344k buffers
Swap:  2097144k total,  2097020k used,      124k free,  2337636k cached
top - 11:30:58 up 72 days, 13:31, 38 users,  load average: 1.53, 1.59, 1.53
Tasks: 267 total,   2 running, 265 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.3%us,  0.4%sy,  0.0%ni, 98.0%id,  0.3%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   8166432k total,  6016268k used,  2150164k free,    61092k buffers
Swap:  2097144k total,  2010204k used,    86940k free,  2262896k cached
I hope I have provided enough information and that someone can point me in the right direction.
Thank you very much, even just for reading this far!
Best regards,
Kiriakos