Jens Schipkowski wrote:
And running vacuum every 30 seconds does what?
Not yet fully tested. It seems to reduce the slowdown.
But reducing the slowdown doesn't solve the problem. The problem is the
decreasing execution speed of DELETE and SELECT statements once the
table reaches 150k - 200k rows. The table only starts growing after
DELETE statements fail with a timeout.
It may not sound professional, but the database seems to age as the
number of executed statements increases.
It sounds very likely if you aren't vacuuming enough or the tables are
growing rapidly.
Vacuuming once a minute is not enough? We reach 3k statements per
minute (at the start of testing); 1/4 of them are INSERTs and DELETEs.
After 5 minutes a DELETE takes about 50 seconds, compared to about
300 - 600ms at startup.
You want to vacuum enough so that the deletes don't leave
ever-increasing gaps in your table. If you run VACUUM VERBOSE it will
tell you what it did - if the number of (non-)removable rows keeps
increasing, you're not vacuuming enough. The trick is to vacuum often,
so that no single run has much work to do. The autovacuum tool in recent
versions tries to estimate this for you, but might not cope in your case.
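As a concrete illustration of the check described above (the settings
shown are examples, not a recommendation for this workload):

```sql
-- Run manually between load cycles and watch the removable /
-- nonremovable row counts in the output; if the nonremovable count
-- keeps climbing, dead rows are outliving the vacuums (e.g. a
-- long-running transaction is keeping them visible).
VACUUM VERBOSE tbl_reg;

-- Or let autovacuum decide; in postgresql.conf (8.1+), for example:
-- autovacuum = on
-- autovacuum_naptime = 60              # check once a minute
-- autovacuum_vacuum_scale_factor = 0.2
```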
The Statements
---------------
// normal insert - does not seem to be the problem - runtime is ok
INSERT INTO
tbl_reg(idreg,idtype,attr1,...,attr6,loc1,...,loc3,register)
VALUES(nextval('tbl_reg_idreg_seq'),1,[attr],[loc],1);
// the select finds out which records have no twin
// a twin is defined as a record with the same attr* values
// slows down over time until PostgreSQL's statement timeout is hit
SELECT *
FROM tbl_reg reg
WHERE register <> loc1
  AND idreg NOT IN
  (
    SELECT reg.idreg
    FROM tbl_reg reg, tbl_reg regtwin
    WHERE regtwin.register = 1
      AND regtwin.type <> 20
      AND reg.attr1 = regtwin.attr1
      AND reg.attr2 = regtwin.attr2
      AND reg.attr3 = regtwin.attr3
      AND reg.attr4 = regtwin.attr4
      AND reg.attr5 = regtwin.attr5
      AND reg.attr6 = regtwin.attr6
      AND reg.idreg <> regtwin.idreg
      AND reg.register = 2
  );
I tried to optimize the select statement, but the GROUP BY ... HAVING
count(*) > 1 solution is half as fast as this statement, judging by
the query plan of EXPLAIN ANALYZE.
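For comparison, a NOT EXISTS formulation sometimes performs better
than NOT IN on older PostgreSQL versions. A sketch using the column
names from the statement above - it assumes idreg is unique (it comes
from a sequence) and never NULL, and whether it preserves the exact
semantics would need checking against real data:

```sql
-- A row passes unless it is a register = 2 row that has a twin.
SELECT *
FROM tbl_reg reg
WHERE reg.register <> reg.loc1
  AND NOT (reg.register = 2 AND EXISTS
  (
    SELECT 1
    FROM tbl_reg regtwin
    WHERE regtwin.register = 1
      AND regtwin.type <> 20
      AND regtwin.attr1 = reg.attr1
      AND regtwin.attr2 = reg.attr2
      AND regtwin.attr3 = reg.attr3
      AND regtwin.attr4 = reg.attr4
      AND regtwin.attr5 = reg.attr5
      AND regtwin.attr6 = reg.attr6
      AND regtwin.idreg <> reg.idreg
  ));
```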
And what did EXPLAIN ANALYSE show here? I'm guessing you're getting
time increasing as the square of the number of rows in tbl_reg. So, if
50 rows takes 2500s then 100 rows will take 10000s. Now, if you had
enough RAM (or an index) I'd expect the planner to process the table
in a sorted manner so the query-time would increase linearly.
EXPLAIN ANALYZE shows at startup
QUERY PLAN
-------------------------------------------------------------------------------------------------
Seq Scan on tbl_reg (cost=25841.35..31433.17 rows=72424 width=6)
(actual time=673.877..673.877 rows=0 loops=1)
Filter: ((hashed subplan) OR ((loc1 = 2::smallint) AND (loc2 =
2::smallint)))
SubPlan
-> Merge Join (cost=22186.21..25646.57 rows=77913 width=8)
(actual time=285.624..285.624 rows=0 loops=1)
Merge Cond: (("outer".attr1 = "inner".attr1) AND
("outer"."?column8?" = "inner"."?column8?") AND ("outer"."?column9?" =
"inner"."?column9?") AND ("outer"."?column10?" = "inner"."?column10?")
AND ("outer"."?column11?" = "inner"."?column11?") AND ("outer".attr6 =
"inner".attr6))
Join Filter: ("outer".idreg <> "inner".idreg)
-> Sort (cost=4967.06..4971.65 rows=1835 width=56) (actual
time=285.622..285.622 rows=0 loops=1)
Sort Key: reg.attr1, (reg.attr2)::text,
(reg.attr3)::text, (reg.attr4)::text, (reg.attr5)::text, reg.attr6
-> Seq Scan on tbl_reg reg (cost=0.00..4867.59
rows=1835 width=56) (actual time=285.551..285.551 rows=0 loops=1)
Filter: (register = 2)
-> Sort (cost=17219.15..17569.77 rows=140247 width=56)
(never executed)
Sort Key: regtwin.attr1, (regtwin.attr2)::text,
(regtwin.attr3)::text, (regtwin.attr4)::text, (regtwin.attr5)::text,
regtwin.attr6
-> Seq Scan on tbl_reg regtwin (cost=0.00..5229.70
rows=140247 width=56) (never executed)
Filter: ((register = 1) AND ("type" <> 20))
Total runtime: 604.410 ms
(15 rows)
EXPLAIN ANALYZE shows, after 10 minutes of load and one vacuum run:
QUERY PLAN
----------------------------------------------------------------------------------------------------
Seq Scan on tbl_reg (cost=25841.35..31433.17 rows=72424 width=6)
(actual time=43261.910..43261.910 rows=0 loops=1)
Filter: ((hashed subplan) OR ((loc1 = 2::smallint) AND (loc2 =
2::smallint)))
SubPlan
-> Merge Join (cost=22186.21..25646.57 rows=77913 width=8)
(actual time=43132.063..43132.063 rows=0 loops=1)
Merge Cond: (("outer".attr1 = "inner".attr1) AND
("outer"."?column8?" = "inner"."?column8?") AND ("outer"."?column9?" =
"inner"."?column9?") AND ("outer"."?column10?" = "inner"."?column10?")
AND ("outer"."?column11?" = "inner"."?column11?") AND ("outer".attr6 =
"inner".attr6))
Join Filter: ("outer".idreg <> "inner".idreg)
-> Sort (cost=4967.06..4971.65 rows=1835 width=56) (actual
time=387.071..387.872 rows=1552 loops=1)
Sort Key: reg.attr1, (reg.attr2)::text,
(reg.attr3)::text, (reg.attr4)::text, (reg.attr5)::text, reg.attr6
-> Seq Scan on tbl_reg reg (cost=0.00..4867.59
rows=1835 width=56) (actual time=303.966..325.614 rows=1552 loops=1)
Filter: (register = 2)
-> Sort (cost=17219.15..17569.77 rows=140247 width=56)
(actual time=42368.934..42530.986 rows=145324 loops=1)
Sort Key: regtwin.attr1, (regtwin.attr2)::text,
(regtwin.attr3)::text, (regtwin.attr4)::text, (regtwin.attr5)::text,
regtwin.attr6
-> Seq Scan on tbl_reg regtwin (cost=0.00..5229.70
rows=140247 width=56) (actual time=0.015..1159.515 rows=145453 loops=1)
Filter: ((register = 1) AND ("type" <> 20))
Total runtime: 44073.127 ms
(15 rows)
OK - these plans look about the same, but the time is greatly different.
Both have rows=140247 as the estimated number of rows in tbl_reg. Either
you have many more rows in the second case (in which case you're not
running ANALYSE enough) or you have lots of gaps in the table (you're
not running VACUUM enough).
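One quick way to tell the two cases apart - stale statistics versus
table bloat - is to compare the planner's view of the table with
reality (a sketch; pg_class is a system catalog, so nothing needs to
be installed):

```sql
-- reltuples/relpages are the estimates from the last ANALYZE; compare
-- them with a live count to see how stale they are, and watch
-- relpages grow across runs to spot bloat from un-vacuumed deletes.
SELECT relpages, reltuples FROM pg_class WHERE relname = 'tbl_reg';
SELECT count(*) FROM tbl_reg;
```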
I'd then try putting an index on (attr1,attr2,attr3...attr6) and see if
that helps reduce time.
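That suggestion might look like this (the index name is illustrative):

```sql
CREATE INDEX tbl_reg_attr_idx
    ON tbl_reg (attr1, attr2, attr3, attr4, attr5, attr6);
ANALYZE tbl_reg;  -- refresh statistics so the planner sees the index
```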
--
Richard Huxton
Archonet Ltd