Hi,

The problem: Postgres is becoming slower day after day, and only a full vacuum fixes the problem.

Information you may need to evaluate:

The problem affects all tables and queries as far as I can tell, but we can focus on a single table for better comprehension. The queries I am running to test the speed are:

INSERT INTO "AWAITINGSTATUSSMPP" VALUES('143428', '1111', 1, '2012-06-16 13:39:19', '111');
DELETE FROM "AWAITINGSTATUSSMPP" WHERE "SMSCMSGID" = '1111' AND "CONNECTIONID" = 1;
SELECT * FROM "AWAITINGSTATUSSMPP" WHERE "SMSCMSGID" = '1111' AND "CONNECTIONID" = 1;

After a full vacuum, they run in about 100ms. Today, before the full vacuum, they were taking around 500ms.

Below is an EXPLAIN ANALYZE of the commands AFTER a full vacuum. I did not run it before, so I cannot post the corresponding output from before the vacuum.

So, after the full vacuum:

explain analyze INSERT INTO "AWAITINGSTATUSSMPP" VALUES('143428', '1111', 1, '2012-06-16 13:39:19', '111');
"Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.002 rows=1 loops=1)"
"Trigger for constraint FK_AWAITINGSTATUSSMPP_MESSAGES: time=0.131 calls=1"
"Trigger bucardo_add_delta: time=0.454 calls=1"
"Trigger bucardo_triggerkick_MassSMs: time=0.032 calls=1"
"Total runtime: 0.818 ms"

explain analyze DELETE FROM "AWAITINGSTATUSSMPP" WHERE "SMSCMSGID" = '1111' AND "CONNECTIONID" = 1;
"Seq Scan on "AWAITINGSTATUSSMPP"  (cost=0.00..2.29 rows=1 width=6) (actual time=0.035..0.035 rows=0 loops=1)"
"  Filter: ((("SMSCMSGID")::text = '1111'::text) AND ("CONNECTIONID" = 1))"
"Trigger bucardo_triggerkick_MassSMs: time=0.066 calls=1"
"Total runtime: 0.146 ms"

explain analyze SELECT * FROM "AWAITINGSTATUSSMPP" WHERE "SMSCMSGID" = '1111' AND "CONNECTIONID" = 1;
"Seq Scan on "AWAITINGSTATUSSMPP"  (cost=0.00..2.29 rows=1 width=557) (actual time=0.028..0.028 rows=0 loops=1)"
"  Filter: ((("SMSCMSGID")::text = '1111'::text) AND ("CONNECTIONID" = 1))"
"Total runtime: 0.053 ms"

Below are the metadata of the table:
=====================================
CREATE TABLE "AWAITINGSTATUSSMPP"
(
  "MESSAGEID" bigint NOT NULL,
  "SMSCMSGID" character varying(50) NOT NULL,
  "CONNECTIONID" smallint NOT NULL,
  "EXPIRE_TIME" timestamp without time zone NOT NULL,
  "RECIPIENT" character varying(20) NOT NULL,
  "CLIENT_MSG_ID" character varying(255),
  CONSTRAINT "PK_AWAITINGSTATUSSMPP" PRIMARY KEY ("SMSCMSGID", "CONNECTIONID"),
  CONSTRAINT "FK_AWAITINGSTATUSSMPP_MESSAGES" FOREIGN KEY ("MESSAGEID")
      REFERENCES "MESSAGES" ("ID") MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE
)
WITH (
  OIDS=FALSE
);
ALTER TABLE "AWAITINGSTATUSSMPP" OWNER TO postgres;
GRANT ALL ON TABLE "AWAITINGSTATUSSMPP" TO "MassSMsUsers";

CREATE INDEX "IX_AWAITINGSTATUSSMPP_MSGID_RCP"
  ON "AWAITINGSTATUSSMPP"
  USING btree ("MESSAGEID", "RECIPIENT");

CREATE TRIGGER bucardo_add_delta
  AFTER INSERT OR UPDATE OR DELETE
  ON "AWAITINGSTATUSSMPP"
  FOR EACH ROW
  EXECUTE PROCEDURE bucardo."bucardo_add_delta_SMSCMSGID|CONNECTIONID"();

CREATE TRIGGER "bucardo_triggerkick_MassSMs"
  AFTER INSERT OR UPDATE OR DELETE OR TRUNCATE
  ON "AWAITINGSTATUSSMPP"
  FOR EACH STATEMENT
  EXECUTE PROCEDURE bucardo."bucardo_triggerkick_MassSMs"();
=====================================

The table only has about 200 records because it is used as temporary storage and records are constantly inserted and deleted. BUT please don't get hung up on this fact, because, as I already said, the speed problem is not restricted to this table.
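In case it helps, the next time the slowdown appears (before I run the full vacuum) I can capture dead-tuple and size statistics for this table so they can be compared with the post-vacuum state. A minimal sketch of what I would run, assuming the standard pg_stat_user_tables view and size functions are enough for this:

-- Dead-tuple count, last (auto)vacuum time, and on-disk size for the table,
-- to be captured both before and after the full vacuum for comparison.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM   pg_stat_user_tables
WHERE  relname = 'AWAITINGSTATUSSMPP';

I have not collected this output yet; I only mention it as the kind of before/after evidence I can provide if needed.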
The same problem appears with the following query:

UPDATE "MESSAGES" SET "SENT" = "SENT" + 1 WHERE "ID" = 143447;

and the MESSAGES table mainly gets inserts and only a few deletes...

My postgresql.conf file:
======================
port = 5433                             # (change requires restart)
max_connections = 100                   # (change requires restart)
shared_buffers = 256MB                  # min 128kB. DoubleIP - Default was 32MB
synchronous_commit = off                # immediate fsync at commit. DoubleIP - Default was on
effective_cache_size = 512MB            # DoubleIP - Default was 128MB
log_destination = 'stderr'              # Valid values are combinations of
logging_collector = on                  # Enable capturing of stderr and csvlog
silent_mode = on                        # Run server silently.
log_line_prefix = '%t %d %u '           # special values:
log_autovacuum_min_duration = 0         # -1 disables, 0 logs all actions and
autovacuum_naptime = 28800              # time between autovacuum runs. DoubleIP - default was 1min
autovacuum_vacuum_threshold = 100       # min number of row updates before
autovacuum_vacuum_scale_factor = 0.0    # fraction of table size before vacuum. DoubleIP - default was 0.2
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8'             # locale for system error message
lc_monetary = 'en_US.UTF-8'             # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'              # locale for number formatting
lc_time = 'en_US.UTF-8'                 # locale for time formatting
default_text_search_config = 'pg_catalog.english'
=======================

As you can see, I have altered the shared_buffers and synchronous_commit values. shared_buffers had the default value of 32MB; when I changed it to 256MB the problem still appeared, but it took longer to show up (3-4 days). With 32MB, it appeared faster, probably within 24 hours.

I have also changed the autovacuum daemon to run every 8 hours, but tuned its thresholds so that it vacuums pretty much all tables (every table for which at least 100 rows have changed). Please note, though, that the problem existed even before I played around with autovacuum; that is why I tried to change its values in the first place.

The server is synchronized with another server using Bucardo. The Bucardo process runs on the other server. The same problem appears on the second server too: after 3-4 days, postgres runs slower and slower.
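Regarding the autovacuum settings above, my understanding (from the documented formula) is that a table becomes a vacuum candidate once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples, which with my values is 100 + 0.0 * reltuples = 100 dead rows; however, the daemon only wakes up every autovacuum_naptime = 28800 seconds (8 hours). As a sanity check on my side, this query should confirm which values the running server actually uses:

-- Show the live autovacuum-related settings (plus shared_buffers) as the
-- server sees them, to rule out a config file that was edited but not reloaded.
SELECT name, setting, unit
FROM   pg_settings
WHERE  name LIKE 'autovacuum%' OR name = 'shared_buffers';

Please correct me if I have misunderstood how these values interact.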
Our server configuration:

DELL PowerEdge T610 Tower Chassis for up to 8x 3.5" HDDs
2x Intel Xeon E5520 Processor (2.26GHz, 8M Cache, 5.86 GT/s QPI, Turbo, HT), 1066MHz Max Memory
8GB Memory, 1333MHz
2 x 146GB SAS 15k 3.5" HD Hot Plug
6 x 1TB SATA 7.2k 3.5" Additional HD Hot Plug
PERC 6/i RAID Controller Card 256MB PCIe, 2x4 Connectors
SUSE Linux Enterprise Server 10, SP2

The 2 SAS HDs are set up as RAID-1 and the 6 SATA HDs as RAID-5. Linux runs on the RAID-1 array; Postgres runs on the RAID-5 array.

Finally, a top before and after the full vacuum:

top - 11:27:44 up 72 days, 13:27, 37 users,  load average: 1.05, 1.31, 1.45
Tasks: 279 total,   3 running, 276 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.6%us,  0.8%sy,  0.0%ni, 95.5%id,  0.0%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   8166432k total,  7963116k used,   203316k free,   115344k buffers
Swap:  2097144k total,  2097020k used,      124k free,  2337636k cached

top - 11:30:58 up 72 days, 13:31, 38 users,  load average: 1.53, 1.59, 1.53
Tasks: 267 total,   2 running, 265 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.3%us,  0.4%sy,  0.0%ni, 98.0%id,  0.3%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   8166432k total,  6016268k used,  2150164k free,    61092k buffers
Swap:  2097144k total,  2010204k used,    86940k free,  2262896k cached

I hope I have provided enough info and that someone can point me in the right direction. Thank you very much even for reading this far!

Best regards,
Kiriakos