I will look for runs with longer samples..
Now the script could have mislabeled lock names.. Anyway digging into
the one that seems to increase over time... I did stack profiles on how
that increases... and here are some numbers..
For 600-850 Users: that potential mislabeled CheckPointStartLock or
LockID==12 comes from various sources where the top source (while system
is still doing great) comes from:
postgres`LWLockAcquire+0x1c8
postgres`SimpleLruReadPage_ReadOnly+0xc
postgres`TransactionIdGetStatus+0x14
postgres`TransactionLogFetch+0x58
postgres`TransactionIdDidCommit+0x4
postgres`HeapTupleSatisfiesSnapshot+0x234
postgres`heap_release_fetch+0x1a8
postgres`index_getnext+0xf4
postgres`IndexNext+0x7c
postgres`ExecScan+0x8c
postgres`ExecProcNode+0xb4
postgres`ExecutePlan+0xdc
postgres`ExecutorRun+0xb0
postgres`PortalRunSelect+0x9c
postgres`PortalRun+0x244
postgres`exec_execute_message+0x3a0
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
8202100
postgres`LWLockAcquire+0x1c8
postgres`TransactionIdSetStatus+0x1c
postgres`RecordTransactionCommit+0x2a8
postgres`CommitTransaction+0xc8
postgres`CommitTransactionCommand+0x90
postgres`finish_xact_command+0x60
postgres`exec_execute_message+0x3d8
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
postgres`_start+0x108
30822100
However at 900 Users where the big drop in throughput occurs:
It gives a different top "consumer" of time:
postgres`LWLockAcquire+0x1c8
postgres`TransactionIdSetStatus+0x1c
postgres`RecordTransactionCommit+0x2a8
postgres`CommitTransaction+0xc8
postgres`CommitTransactionCommand+0x90
postgres`finish_xact_command+0x60
postgres`exec_execute_message+0x3d8
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
postgres`_start+0x108
406601300
postgres`LWLockAcquire+0x1c8
postgres`SimpleLruReadPage+0x1ac
postgres`TransactionIdGetStatus+0x14
postgres`TransactionLogFetch+0x58
postgres`TransactionIdDidCommit+0x4
postgres`HeapTupleSatisfiesUpdate+0x360
postgres`heap_lock_tuple+0x27c
postgres`ExecutePlan+0x33c
postgres`ExecutorRun+0xb0
postgres`PortalRunSelect+0x9c
postgres`PortalRun+0x244
postgres`exec_execute_message+0x3a0
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
postgres`_start+0x108
444523100
postgres`LWLockAcquire+0x1c8
postgres`SimpleLruReadPage+0x1ac
postgres`TransactionIdGetStatus+0x14
postgres`TransactionLogFetch+0x58
postgres`TransactionIdDidCommit+0x4
postgres`HeapTupleSatisfiesSnapshot+0x234
postgres`heap_release_fetch+0x1a8
postgres`index_getnext+0xf4
postgres`IndexNext+0x7c
postgres`ExecScan+0x8c
postgres`ExecProcNode+0xb4
postgres`ExecutePlan+0xdc
postgres`ExecutorRun+0xb0
postgres`PortalRunSelect+0x9c
postgres`PortalRun+0x244
postgres`exec_execute_message+0x3a0
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
1661300000
Maybe you all will understand more than I do on what it does here..
Looks like IndexNext has a problem at high number of users to me.. but I
could be wrong..
-Jignesh
Tom Lane wrote:
"Jignesh K. Shah" <J.K.Shah@xxxxxxx> writes:
The count is only for a 10-second snapshot.. Plus remember there are
about 1000 users running so the connection being profiled only gets
0.01 of the period on CPU.. And the count is for that CONNECTION only.
OK, that explains the low absolute levels of the counts, but if the
counts are for a regular backend then there's still a lot of bogosity
here. Backends won't be taking the CheckpointLock at all, nor do they
take CheckpointStartLock in exclusive mode. The bgwriter would do that
but it'd not be taking most of these other locks. So I think the script
is mislabeling the locks somehow.
Also, elementary statistics should tell you that a sample taken as above
is going to have enormous amounts of noise. You should be sampling over
a much longer period, say on the order of a minute of runtime, to have
numbers that are trustworthy.
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq