hello again,
some more data on the subject:you are right, delete/re-insert didn't solve the problem (haven't yet tried dump/restore, i might tonight, when low traffic).
a short snip from logs were i log all queries that take longer then 1sec:
2014-05-06T15:57:46.303880+03:00 s10 postgres[46220]: [1891-1] prosms prosms LOG: 00000: duration: 1947.172 ms execute <unnamed>: select easysms_jb_pay($1,
$2)
2014-05-06T15:57:46.304230+03:00 s10 postgres[46220]: [1891-2] prosms prosms DETAIL: parameters: $1 = '10808', $2 = '2'
2014-05-06T15:57:46.304439+03:00 s10 postgres[46220]: [1891-3] prosms prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T15:57:56.199005+03:00 s10 postgres[58002]: [2570-1] prosms prosms LOG: 00000: duration: 6869.886 ms execute <unnamed>: select easysms_jb_pay($1,$2)
2014-05-06T15:57:56.199349+03:00 s10 postgres[58002]: [2570-2] prosms prosms DETAIL: parameters: $1 = '10808', $2 = '2'
2014-05-06T15:57:56.199567+03:00 s10 postgres[58002]: [2570-3] prosms prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T15:57:59.134982+03:00 s10 postgres[58002]: [2571-1] prosms prosms LOG: 00000: duration: 1797.747 ms execute <unnamed>: select easysms_jb_pay($1,$2)
2014-05-06T15:57:59.135334+03:00 s10 postgres[58002]: [2571-2] prosms prosms DETAIL: parameters: $1 = '10808', $2 = '2'
2014-05-06T15:57:59.135562+03:00 s10 postgres[58002]: [2571-3] prosms prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T15:58:07.149477+03:00 s10 postgres[46220]: [1892-1] prosms prosms LOG: 00000: duration: 3938.979 ms execute <unnamed>: select easysms_jb_pay($1,$2)
2014-05-06T15:58:07.149830+03:00 s10 postgres[46220]: [1892-2] prosms prosms DETAIL: parameters: $1 = '10808', $2 = '2'
2014-05-06T15:58:07.150067+03:00 s10 postgres[46220]: [1892-3] prosms prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T16:01:33.784422+03:00 s10 postgres[58002]: [2572-1] prosms prosms LOG: 00000: duration: 2921.212 ms execute <unnamed>: select easysms_jb_pay($1,$2)
2014-05-06T16:01:33.784842+03:00 s10 postgres[58002]: [2572-2] prosms prosms DETAIL: parameters: $1 = '10808', $2 = '4'
2014-05-06T16:01:33.785037+03:00 s10 postgres[58002]: [2572-3] prosms prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T15:57:46.303880+03:00 s10 postgres[46220]: [1891-1] prosms prosms LOG: 00000: duration: 1947.172 ms execute <unnamed>: select easysms_jb_pay($1,
$2)
2014-05-06T15:57:46.304230+03:00 s10 postgres[46220]: [1891-2] prosms prosms DETAIL: parameters: $1 = '10808', $2 = '2'
2014-05-06T15:57:46.304439+03:00 s10 postgres[46220]: [1891-3] prosms prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T15:57:56.199005+03:00 s10 postgres[58002]: [2570-1] prosms prosms LOG: 00000: duration: 6869.886 ms execute <unnamed>: select easysms_jb_pay($1,$2)
2014-05-06T15:57:56.199349+03:00 s10 postgres[58002]: [2570-2] prosms prosms DETAIL: parameters: $1 = '10808', $2 = '2'
2014-05-06T15:57:56.199567+03:00 s10 postgres[58002]: [2570-3] prosms prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T15:57:59.134982+03:00 s10 postgres[58002]: [2571-1] prosms prosms LOG: 00000: duration: 1797.747 ms execute <unnamed>: select easysms_jb_pay($1,$2)
2014-05-06T15:57:59.135334+03:00 s10 postgres[58002]: [2571-2] prosms prosms DETAIL: parameters: $1 = '10808', $2 = '2'
2014-05-06T15:57:59.135562+03:00 s10 postgres[58002]: [2571-3] prosms prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T15:58:07.149477+03:00 s10 postgres[46220]: [1892-1] prosms prosms LOG: 00000: duration: 3938.979 ms execute <unnamed>: select easysms_jb_pay($1,$2)
2014-05-06T15:58:07.149830+03:00 s10 postgres[46220]: [1892-2] prosms prosms DETAIL: parameters: $1 = '10808', $2 = '2'
2014-05-06T15:58:07.150067+03:00 s10 postgres[46220]: [1892-3] prosms prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T16:01:33.784422+03:00 s10 postgres[58002]: [2572-1] prosms prosms LOG: 00000: duration: 2921.212 ms execute <unnamed>: select easysms_jb_pay($1,$2)
2014-05-06T16:01:33.784842+03:00 s10 postgres[58002]: [2572-2] prosms prosms DETAIL: parameters: $1 = '10808', $2 = '4'
2014-05-06T16:01:33.785037+03:00 s10 postgres[58002]: [2572-3] prosms prosms LOCATION: exec_execute_message, postgres.c:1989
should you deem helpful yes i can enable full query logging and summon the listing for you.
a typical vmstat:
# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 363476 325852 294064 30520736 0 0 26 347 0 0 1 0 99 0 0
1 0 363476 328400 294068 30520732 0 0 0 28 4930 8014 6 1 94 0 0
2 0 363476 331756 294068 30520732 0 0 0 4384 4950 7980 6 1 93 0 0
0 0 363476 334016 294068 30520756 0 0 0 4384 4961 7981 7 1 92 0 0
0 0 363476 334700 294068 30520756 0 0 0 4424 4012 6467 4 1 95 0 0
1 0 363476 330852 294068 30520788 0 0 0 0 2559 3861 5 1 95 0 0
1 0 363476 331316 294072 30520788 0 0 0 4408 5013 8127 6 1 94 0 0
1 0 363476 330788 294072 30520788 0 0 0 4384 5535 9055 6 1 93 0 0
0 0 363476 331496 294072 30520804 0 0 0 4384 5031 8092 7 1 92 0 0
2 0 363476 331268 294072 30520804 0 0 0 4428 5052 8246 6 1 94 0 0
1 0 363476 330848 294080 30520812 0 0 0 32 4892 7950 5 1 94 0 0
1 0 363476 330480 294080 30520812 0 0 0 4388 4935 8036 6 1 94 0 0
2 0 363476 332616 294084 30521092 0 0 0 4408 5064 8194 6 1 93 0 0
0 0 363476 333596 294084 30521008 0 0 0 4384 5205 8463 8 1 91 0 0
1 0 363476 333324 294084 30521008 0 0 0 40 5014 8151 6 1 94 0 0
0 0 363476 332740 294084 30521016 0 0 0 4384 5047 8163 6 1 93 0 0
1 0 363476 336052 294084 30520888 0 0 0 4384 4849 7780 6 1 94 0 0
1 0 363476 334732 294088 30520892 0 0 8 4400 5520 9012 6 1 93 0 0
0 0 363476 334064 294096 30520884 0 0 0 220 3903 6193 6 1 94 0 0
0 0 363476 333124 294096 30520916 0 0 0 2232 4088 6462 6 1 93 0 0
# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 363476 325852 294064 30520736 0 0 26 347 0 0 1 0 99 0 0
1 0 363476 328400 294068 30520732 0 0 0 28 4930 8014 6 1 94 0 0
2 0 363476 331756 294068 30520732 0 0 0 4384 4950 7980 6 1 93 0 0
0 0 363476 334016 294068 30520756 0 0 0 4384 4961 7981 7 1 92 0 0
0 0 363476 334700 294068 30520756 0 0 0 4424 4012 6467 4 1 95 0 0
1 0 363476 330852 294068 30520788 0 0 0 0 2559 3861 5 1 95 0 0
1 0 363476 331316 294072 30520788 0 0 0 4408 5013 8127 6 1 94 0 0
1 0 363476 330788 294072 30520788 0 0 0 4384 5535 9055 6 1 93 0 0
0 0 363476 331496 294072 30520804 0 0 0 4384 5031 8092 7 1 92 0 0
2 0 363476 331268 294072 30520804 0 0 0 4428 5052 8246 6 1 94 0 0
1 0 363476 330848 294080 30520812 0 0 0 32 4892 7950 5 1 94 0 0
1 0 363476 330480 294080 30520812 0 0 0 4388 4935 8036 6 1 94 0 0
2 0 363476 332616 294084 30521092 0 0 0 4408 5064 8194 6 1 93 0 0
0 0 363476 333596 294084 30521008 0 0 0 4384 5205 8463 8 1 91 0 0
1 0 363476 333324 294084 30521008 0 0 0 40 5014 8151 6 1 94 0 0
0 0 363476 332740 294084 30521016 0 0 0 4384 5047 8163 6 1 93 0 0
1 0 363476 336052 294084 30520888 0 0 0 4384 4849 7780 6 1 94 0 0
1 0 363476 334732 294088 30520892 0 0 8 4400 5520 9012 6 1 93 0 0
0 0 363476 334064 294096 30520884 0 0 0 220 3903 6193 6 1 94 0 0
0 0 363476 333124 294096 30520916 0 0 0 2232 4088 6462 6 1 93 0 0
the process that is constantly writing the majority of data is:
"postgres: stats collector process"
and varies from 2.5mb/sec up to 5.7mb/sec
all the other postgres (and non-postgres) processes write very little data and rarely.
and varies from 2.5mb/sec up to 5.7mb/sec
all the other postgres (and non-postgres) processes write very little data and rarely.
the
checkpointer process is like 78.6kb/sec (a few seconds now as i write
this email but no other is having a constant rate or I/O)
also,
_while having the problem_ the results of the following queries are
(taken from http://wiki.postgresql.org/wiki/Lock_Monitoring):
SELECT relation::regclass, * FROM pg_locks WHERE NOT granted;
(0 rows)
SELECT bl.pid AS blocked_pid, a.usename AS blocked_user, ka.query AS blocking_statement, now() - ka.query_start AS blocking_duration, kl.pid AS blocking_pid, ka.usename AS blocking_user, a.query AS blocked_statement, now() - a.query_start AS blocked_duration FROM pg_catalog.pg_locks bl JOIN pg_catalog.pg_stat_activity a ON a.pid = bl.pid JOIN pg_catalog.pg_locks kl ON kl.transactionid = bl.transactionid AND kl.pid != bl.pid JOIN pg_catalog.pg_stat_activity ka ON ka.pid = kl.pid WHERE NOT bl.granted;
(0 rows)
SELECT waiting.locktype AS waiting_locktype, waiting.relation::regclass AS waiting_table, waiting_stm.query AS waiting_query, waiting.mode AS waiting_mode, waiting.pid AS waiting_pid, other.locktype AS other_locktype, other.relation::regclass AS other_table, other_stm.query AS other_query, other.mode AS other_mode, other.pid AS other_pid, other.granted AS other_granted FROM pg_catalog.pg_locks AS waiting JOIN pg_catalog.pg_stat_activity AS waiting_stm ON ( waiting_stm.pid = waiting.pid ) JOIN pg_catalog.pg_locks AS other ON ( ( waiting."database" = other."database" AND waiting.relation = other.relation ) OR waiting.transactionid = other.transactionid ) JOIN pg_catalog.pg_stat_activity AS other_stm ON ( other_stm.pid = other.pid ) WHERE NOT waiting.granted AND waiting.pid <> other.pid;
(0 rows)
WITH RECURSIVE c(requested, current) AS ( VALUES ('AccessShareLock'::text, 'AccessExclusiveLock'::text), ('RowShareLock'::text, 'ExclusiveLock'::text), ('RowShareLock'::text, 'AccessExclusiveLock'::text), ('RowExclusiveLock'::text, 'ShareLock'::text), ('RowExclusiveLock'::text, 'ShareRowExclusiveLock'::text), ('RowExclusiveLock'::text, 'ExclusiveLock'::text), ('RowExclusiveLock'::text, 'AccessExclusiveLock'::text), ('ShareUpdateExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text), ('ShareUpdateExclusiveLock'::text, 'ShareLock'::text), ('ShareUpdateExclusiveLock'::text, 'ShareRowExclusiveLock'::text), ('ShareUpdateExclusiveLock'::text, 'ExclusiveLock'::text), ('ShareUpdateExclusiveLock'::text, 'AccessExclusiveLock'::text), ('ShareLock'::text, 'RowExclusiveLock'::text), ('ShareLock'::text, 'ShareUpdateExclusiveLock'::text), ('ShareLock'::text, 'ShareRowExclusiveLock'::text), ('ShareLock'::text, 'ExclusiveLock'::text), ('ShareLock'::text, 'AccessExclusiveLock'::text), ('ShareRowExclusiveLock'::text, 'RowExclusiveLock'::text), ('ShareRowExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text), ('ShareRowExclusiveLock'::text, 'ShareLock'::text), ('ShareRowExclusiveLock'::text, 'ShareRowExclusiveLock'::text), ('ShareRowExclusiveLock'::text, 'ExclusiveLock'::text), ('ShareRowExclusiveLock'::text, 'AccessExclusiveLock'::text), ('ExclusiveLock'::text, 'RowShareLock'::text), ('ExclusiveLock'::text, 'RowExclusiveLock'::text), ('ExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text), ('ExclusiveLock'::text, 'ShareLock'::text), ('ExclusiveLock'::text, 'ShareRowExclusiveLock'::text), ('ExclusiveLock'::text, 'ExclusiveLock'::text), ('ExclusiveLock'::text, 'AccessExclusiveLock'::text), ('AccessExclusiveLock'::text, 'AccessShareLock'::text), ('AccessExclusiveLock'::text, 'RowShareLock'::text), ('AccessExclusiveLock'::text, 'RowExclusiveLock'::text), ('AccessExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text), ('AccessExclusiveLock'::text, 'ShareLock'::text), ('AccessExclusiveLock'::text, 'ShareRowExclusiveLock'::text), ('AccessExclusiveLock'::text, 'ExclusiveLock'::text), ('AccessExclusiveLock'::text, 'AccessExclusiveLock'::text) ), l AS ( SELECT (locktype,DATABASE,relation::regclass::text,page,tuple,virtualxid,transactionid,classid,objid,objsubid) AS target, virtualtransaction, pid, mode, granted FROM pg_catalog.pg_locks ), t AS ( SELECT blocker.target AS blocker_target, blocker.pid AS blocker_pid, blocker.mode AS blocker_mode, blocked.target AS target, blocked.pid AS pid, blocked.mode AS mode FROM l blocker JOIN l blocked ON ( NOT blocked.granted AND blocker.granted AND blocked.pid != blocker.pid AND blocked.target IS NOT DISTINCT FROM blocker.target) JOIN c ON (c.requested = blocked.mode AND c.current = blocker.mode) ), r AS ( SELECT blocker_target, blocker_pid, blocker_mode, '1'::int AS depth, target, pid, mode, blocker_pid::text || ',' || pid::text AS seq FROM t UNION ALL SELECT blocker.blocker_target, blocker.blocker_pid, blocker.blocker_mode, blocker.depth + 1, blocked.target, blocked.pid, blocked.mode, blocker.seq || ',' || blocked.pid::text FROM r blocker JOIN t blocked ON (blocked.blocker_pid = blocker.pid) WHERE blocker.depth < 1000 ) SELECT * FROM r ORDER BY seq;
(0 rows)
no finding here either :(
best,
/mstelios
Stelios Mavromichalis
Cytech Ltd. - http://www.cytech.gr/
Science & Technology Park of Crete
fax: +30 2810 31 1045
tel.: +30 2810 31 4127
mob.: +30 697 7078013
skype: mstelios
Cytech Ltd. - http://www.cytech.gr/
Science & Technology Park of Crete
fax: +30 2810 31 1045
tel.: +30 2810 31 4127
mob.: +30 697 7078013
skype: mstelios
On Tue, May 6, 2014 at 2:54 AM, David G Johnston <david.g.johnston@xxxxxxxxx> wrote:
Stelios Mavromichalis wrote
> as a prior step to dump/restore i am thinking of deleting and re-insertingI still dislike the randomness of the unresponsiveness...
> that particular row. that might share some light you think?
Every time you perform an update you "delete and insert" that row - that is
how an update works in MVCC - so doing so explicitly is unlikely to provide
any benefit. Since that row is continually being inserted, and no other
rows are having this issue, I'm seriously doubting that a dump/restore is
going to have any effect either. Note that the index scan took twice as
long in the bad case - but still reasonable and you didn't notice any
benefit from a REINDEX. This is what I would expect.
The only other difference, if concurrency has been ruled out, is the 4 vs 18
buffers that had to be read. I cannot imagine that, since all 22 were in
cache, that simply reading that much more data would account for the
difference (we're talking a 10,000-fold increase, not 2to4-fold). The
reason for this particular difference, IIUC, is how may candidate tuples are
present whose visibility has to be accounted for (assuming 1 buffer per
tuple, you needed to scan 4 vs 18 for visibility in the two queries).
Is there any log file information you can share? Especially if you can set
log_min_statement_duration (or whatever that GUC is named) so that whenever
one of these gets adversely delayed it appears in the log along with
whatever other system messages are being sent. Checkpoints are a typical
culprit though that should be affecting a great deal more than what you
indicate you are seeing.
I'm pretty certain you are seeing this here largely because of the frequency
of activity on this particular user; not because the data itself is
corrupted. It could be some kind of symptom of internal concurrency that
you just haven't observed yet but it could also be I/O or other system
contention that you also haven't properly instrumented. Unfortunately that
is beyond my current help-providing skill-set.
A dump-restore likely would not make anything worse though I'd be surprised
if it were to improve matters. It also doesn't seem like hardware - unless
the RAM is bad. Software bugs are unlikely if this had been working well
before 5 days ago. So, you need to observe the system during both periods
(good and bad) and observe something that is different - probably not within
PostgreSQL if indeed you've minimized concurrency. And also see if you can
see if any other queries, executed during both these times, exhibit a
performance decrease. Logging all statements would help matters greatly if
you can afford it in your production environment - it would make looking for
internal concurrency much easier.
David J.
--
View this message in context: http://postgresql.1045698.n5.nabble.com/Re-recently-and-selectively-slow-but-very-simple-update-query-tp5802553p5802579.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance