Hi,
The problem: Postgres is becoming slower day after day, and only a full vacuum fixes the problem.
Information you may need in order to evaluate the problem:
The problem affects all tables and queries, as far as I can tell, but we can focus on a single table to keep things comprehensible.
The queries I am running to test the speed are:
INSERT INTO "AWAITINGSTATUSSMPP" VALUES('143428',
'1111', 1, '2012-06-16 13:39:19', '111');
DELETE FROM "AWAITINGSTATUSSMPP" WHERE "SMSCMSGID" =
'1111' AND "CONNECTIONID" = 1;
SELECT * FROM "AWAITINGSTATUSSMPP" WHERE "SMSCMSGID"
= '1111' AND "CONNECTIONID" = 1;
After a full vacuum, they run in about 100ms.
Today, before the full vacuum, they were taking around
500ms.
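For reference, this is the kind of check I plan to run before and after the next full vacuum to quantify the bloat on disk. It is only a sketch using the built-in size functions (pg_size_pretty, pg_relation_size, pg_total_relation_size); nothing in it is specific to my setup beyond the table name:

-- on-disk size of the table itself, and of the table plus its indexes/TOAST
SELECT pg_size_pretty(pg_relation_size('"AWAITINGSTATUSSMPP"'))       AS heap_size,
       pg_size_pretty(pg_total_relation_size('"AWAITINGSTATUSSMPP"')) AS total_size;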
Below is an EXPLAIN ANALYZE of the commands AFTER the full vacuum. I did not run it beforehand, so I cannot post the corresponding output from before the vacuum. So, after the full vacuum:
explain analyze INSERT INTO "AWAITINGSTATUSSMPP"
VALUES('143428', '1111', 1, '2012-06-16 13:39:19',
'111');
"Result
(cost=0.00..0.01 rows=1 width=0) (actual
time=0.001..0.002 rows=1 loops=1)"
"Trigger for
constraint FK_AWAITINGSTATUSSMPP_MESSAGES: time=0.131
calls=1"
"Trigger
bucardo_add_delta: time=0.454 calls=1"
"Trigger
bucardo_triggerkick_MassSMs: time=0.032 calls=1"
"Total runtime:
0.818 ms"
explain analyze DELETE FROM "AWAITINGSTATUSSMPP" WHERE "SMSCMSGID" = '1111' AND "CONNECTIONID" = 1;

Seq Scan on "AWAITINGSTATUSSMPP"  (cost=0.00..2.29 rows=1 width=6) (actual time=0.035..0.035 rows=0 loops=1)
  Filter: ((("SMSCMSGID")::text = '1111'::text) AND ("CONNECTIONID" = 1))
Trigger bucardo_triggerkick_MassSMs: time=0.066 calls=1
Total runtime: 0.146 ms
explain analyze SELECT * FROM "AWAITINGSTATUSSMPP" WHERE "SMSCMSGID" = '1111' AND "CONNECTIONID" = 1;

Seq Scan on "AWAITINGSTATUSSMPP"  (cost=0.00..2.29 rows=1 width=557) (actual time=0.028..0.028 rows=0 loops=1)
  Filter: ((("SMSCMSGID")::text = '1111'::text) AND ("CONNECTIONID" = 1))
Total runtime: 0.053 ms
Below is the table definition:
=====================================
CREATE TABLE "AWAITINGSTATUSSMPP"
(
"MESSAGEID" bigint NOT NULL,
"SMSCMSGID" character varying(50) NOT NULL,
"CONNECTIONID" smallint NOT NULL,
"EXPIRE_TIME" timestamp without time zone NOT NULL,
"RECIPIENT" character varying(20) NOT NULL,
"CLIENT_MSG_ID" character varying(255),
CONSTRAINT "PK_AWAITINGSTATUSSMPP" PRIMARY KEY
("SMSCMSGID", "CONNECTIONID"),
CONSTRAINT "FK_AWAITINGSTATUSSMPP_MESSAGES" FOREIGN
KEY ("MESSAGEID")
REFERENCES "MESSAGES" ("ID") MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE
)
WITH (
OIDS=FALSE
);
ALTER TABLE "AWAITINGSTATUSSMPP" OWNER TO postgres;
GRANT ALL ON TABLE "AWAITINGSTATUSSMPP" TO
"MassSMsUsers";
CREATE INDEX "IX_AWAITINGSTATUSSMPP_MSGID_RCP"
ON "AWAITINGSTATUSSMPP"
USING btree
("MESSAGEID", "RECIPIENT");
CREATE TRIGGER bucardo_add_delta
AFTER INSERT OR UPDATE OR DELETE
ON "AWAITINGSTATUSSMPP"
FOR EACH ROW
EXECUTE PROCEDURE
bucardo."bucardo_add_delta_SMSCMSGID|CONNECTIONID"();
CREATE TRIGGER "bucardo_triggerkick_MassSMs"
AFTER INSERT OR UPDATE OR DELETE OR TRUNCATE
ON "AWAITINGSTATUSSMPP"
FOR EACH STATEMENT
EXECUTE PROCEDURE
bucardo."bucardo_triggerkick_MassSMs"();
=====================================
The table only has about 200 records because it is being used as temporary storage and records are constantly inserted and deleted.
BUT please don't get hung up on this fact because, as I already said, the speed problem is not restricted to this table. The same problem appears with the following query:
UPDATE "MESSAGES" SET "SENT" = "SENT" + 1 WHERE "ID" = 143447;
and the MESSAGES table sees mainly inserts and only a few deletes...
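For what it's worth, this is the kind of query I intend to use to watch whether dead rows pile up between vacuums. It is only a sketch and assumes a version whose pg_stat_user_tables view exposes the n_live_tup / n_dead_tup counters (8.3 and later, as far as I know):

-- planner stats from pg_class plus dead/live tuple counters from the stats collector
SELECT c.relname, c.relpages, c.reltuples,
       s.n_live_tup, s.n_dead_tup, s.last_autovacuum
FROM pg_class c
JOIN pg_stat_user_tables s ON s.relid = c.oid
WHERE c.relname IN ('AWAITINGSTATUSSMPP', 'MESSAGES');

If relpages keeps growing while reltuples stays around 200, that would be consistent with the bloat I suspect.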
My postgresql.conf file:
======================
port = 5433                              # (change requires restart)
max_connections = 100                    # (change requires restart)
shared_buffers = 256MB                   # min 128kB. DoubleIP - Default was 32MB
synchronous_commit = off                 # immediate fsync at commit. DoubleIP - Default was on
effective_cache_size = 512MB             # DoubleIP - Default was 128MB
log_destination = 'stderr'               # Valid values are combinations of
logging_collector = on                   # Enable capturing of stderr and csvlog
silent_mode = on                         # Run server silently.
log_line_prefix = '%t %d %u '            # special values:
log_autovacuum_min_duration = 0          # -1 disables, 0 logs all actions and
autovacuum_naptime = 28800               # time between autovacuum runs. DoubleIP - default was 1min
autovacuum_vacuum_threshold = 100        # min number of row updates before
autovacuum_vacuum_scale_factor = 0.0     # fraction of table size before vacuum. DoubleIP - default was 0.2
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8'              # locale for system error message
lc_monetary = 'en_US.UTF-8'              # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'               # locale for number formatting
lc_time = 'en_US.UTF-8'                  # locale for time formatting
default_text_search_config = 'pg_catalog.english'
=======================
As you will see, I have altered the shared_buffers and synchronous_commit values.
shared_buffers had the default value of 32MB. When I changed it to 256MB the problem still appears, but it takes more time to appear (3-4 days). With 32MB it appeared faster, probably after 24 hours.
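Just to rule out a missed reload, the values the server is actually using can be checked with a query along these lines (only a sketch against the standard pg_settings view):

-- show the live value and where each setting came from
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('shared_buffers', 'synchronous_commit', 'autovacuum',
               'autovacuum_naptime', 'autovacuum_vacuum_threshold',
               'autovacuum_vacuum_scale_factor');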
Also, I have changed the autovacuum daemon to run every 8 hours, and I adjusted its thresholds to make sure it vacuums pretty much every table (any table in which at least 100 rows have changed).
Please note, though, that my problem existed even
before playing around with the autovacuum. This is why I
tried to change its values in the first place.
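For completeness, the same thresholds could also be applied per table instead of globally. This is only a sketch and assumes a server version (8.4 or later, I believe) that supports per-table autovacuum storage parameters:

-- make autovacuum consider this table on every run, regardless of table size
ALTER TABLE "AWAITINGSTATUSSMPP" SET (
    autovacuum_vacuum_threshold = 100,
    autovacuum_vacuum_scale_factor = 0.0
);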
The server is synchronized with another server using Bucardo. The Bucardo process runs on the other server.
The same problem appears on the 2nd server too... after 3-4 days, Postgres runs slower and slower.
Our server configuration:
DELL PowerEdge T610 Tower Chassis for Up to 8x
3.5" HDDs
2x Intel Xeon E5520 Processor (2.26GHz, 8M
Cache, 5.86 GT/s QPI, Turbo, HT), 1066MHz Max Memory
8GB Memory,1333MHz
2 x 146GB SAS 15k 3.5" HD Hot Plug
6 x 1TB SATA 7.2k 3.5" Additional HD Hot Plug
PERC 6/i RAID Controller Card 256MB PCIe, 2x4
Connectors
SUSE Linux Enterprise Server 10, SP2
The 2 SAS HDs are set up as RAID-1.
The 6 SATA HDs are set up as RAID-5.
Linux is running on the RAID-1 array.
Postgres is running on the RAID-5 array.
Finally, here is the output of top before and after the full vacuum:
top - 11:27:44 up 72 days, 13:27, 37 users,  load average: 1.05, 1.31, 1.45
Tasks: 279 total,   3 running, 276 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.6%us,  0.8%sy,  0.0%ni, 95.5%id,  0.0%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   8166432k total,  7963116k used,   203316k free,   115344k buffers
Swap:  2097144k total,  2097020k used,      124k free,  2337636k cached
top - 11:30:58 up 72 days, 13:31, 38 users,  load average: 1.53, 1.59, 1.53
Tasks: 267 total,   2 running, 265 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.3%us,  0.4%sy,  0.0%ni, 98.0%id,  0.3%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   8166432k total,  6016268k used,  2150164k free,    61092k buffers
Swap:  2097144k total,  2010204k used,    86940k free,  2262896k cached
I hope I have provided enough information and that someone can point me in the right direction.
Thank you very much, even just for reading this far!
Best regards,
Kiriakos