BEAUTIFUL!!!
Using the patch I can now go up to 1300 users before throughput drops, but the drop still repeats at around 1300-1350 users.
I also corrected the lock descriptions based on what I found in lwlock.h and re-ran the whole scalability test with profiling. This time it looks like ProcArrayLock is the main contender.
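For context, the numbers below come from my 4_lwlock_waits.d script, which is roughly along these lines (just a sketch, not the exact script; it assumes the 8.2 lwlock-startwait/endwait probes where arg0 is the LWLockId and arg1 the LWLockMode, and the real script also translates the numeric lock ids into the names from lwlock.h):

#!/usr/sbin/dtrace -s

#pragma D option quiet

/*
 * Sketch of a per-lock wait profiler in the spirit of 4_lwlock_waits.d
 * (not the actual script).  $1 is the backend pid, e.g. 7056.
 * Assumes arg0 = LWLockId, arg1 = LWLockMode (0 = Exclusive, 1 = Shared).
 */

postgresql$1:::lwlock-startwait
{
        /* remember when this backend started waiting on this lock */
        self->wait[arg0] = timestamp;
        @counts[arg0, arg1 == 0 ? "Exclusive" : "Shared"] = count();
}

postgresql$1:::lwlock-endwait
/self->wait[arg0]/
{
        /* accumulate total wait time per lock id */
        @times[arg0] = sum(timestamp - self->wait[arg0]);
        self->wait[arg0] = 0;
}

tick-30sec
{
        /* sample for 30 seconds, then dump the aggregations */
        exit(0);
}

dtrace:::END
{
        printf("%20s %12s %8s\n", "Lock Id", "Mode", "Count");
        printa("%20d %12s %@8d\n", @counts);
        printf("\n%20s %22s\n", "Lock Id", "Combined Time (ns)");
        printa("%20d %@22d\n", @times);
}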
bash-3.00# echo 600 users
600 users
bash-3.00# ./4_lwlock_waits.d 7056
Lock Id Mode Count
XidGenLock Exclusive 1
CLogControlLock Shared 2
XidGenLock Shared 2
WALWriteLock Exclusive 4
WALInsertLock Exclusive 8
CLogControlLock Exclusive 9
ProcArrayLock Exclusive 9
Lock Id Combined Time (ns)
WALWriteLock 2842300
XidGenLock 4951000
CLogControlLock 11151800
WALInsertLock 13035600
ProcArrayLock 20040000
bash-3.00# echo 700 users
700 users
bash-3.00# ./4_lwlock_waits.d 7056
Lock Id Mode Count
XidGenLock Exclusive 1
WALWriteLock Exclusive 2
XidGenLock Shared 2
CLogControlLock Shared 3
CLogControlLock Exclusive 8
WALInsertLock Exclusive 9
ProcArrayLock Exclusive 22
Lock Id Combined Time (ns)
XidGenLock 4093300
WALWriteLock 4914800
WALInsertLock 7389100
ProcArrayLock 10248200
CLogControlLock 11989400
bash-3.00# echo 800 users
800 users
bash-3.00# ./4_lwlock_waits.d 7056
Lock Id Mode Count
WALWriteLock Exclusive 1
XidGenLock Shared 2
CLogControlLock Shared 3
CLogControlLock Exclusive 7
WALInsertLock Exclusive 7
ProcArrayLock Exclusive 31
Lock Id Combined Time (ns)
WALWriteLock 319100
XidGenLock 5388700
WALInsertLock 9901400
CLogControlLock 13465000
ProcArrayLock 42979700
bash-3.00# echo 900 users
900 users
bash-3.00# ./4_lwlock_waits.d 7056
Lock Id Mode Count
CLogControlLock Shared 1
XidGenLock Exclusive 1
WALWriteLock Exclusive 2
CLogControlLock Exclusive 6
WALInsertLock Exclusive 9
ProcArrayLock Exclusive 25
Lock Id Combined Time (ns)
XidGenLock 3197700
WALWriteLock 3887100
CLogControlLock 15774500
WALInsertLock 38268700
ProcArrayLock 162073100
bash-3.00# ./6_lwlock_stack.d 4 7056
Lock Id Mode Count
ProcArrayLock Shared 1
ProcArrayLock Exclusive 67
Lock Id Combined Time (ns)
ProcArrayLock 216773800
Lock Id Combined Time (ns)
postgres`LWLockAcquire+0x1c8
postgres`GetSnapshotData+0x118
postgres`GetTransactionSnapshot+0x5c
postgres`PortalStart+0x150
postgres`exec_bind_message+0x81c
postgres`PostgresMain+0x12b8
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
postgres`_start+0x108
2779000
postgres`LWLockAcquire+0x1c8
postgres`CommitTransaction+0xe0
postgres`CommitTransactionCommand+0x90
postgres`finish_xact_command+0x60
postgres`exec_execute_message+0x3d8
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
postgres`_start+0x108
213994800
bash-3.00# echo 1000 users - SLIGHT DROP
1000 users
bash-3.00# ./4_lwlock_waits.d 7056
Lock Id Mode Count
WALWriteLock Exclusive 1
CLogControlLock Exclusive 2
XidGenLock Shared 2
CLogControlLock Shared 3
WALInsertLock Exclusive 4
ProcArrayLock Exclusive 26
Lock Id Combined Time (ns)
WALWriteLock 1807400
XidGenLock 2024000
WALInsertLock 2177500
CLogControlLock 9064700
ProcArrayLock 199216000
bash-3.00# ./6_lwlock_stack.d 4 7056
Lock Id Mode Count
ProcArrayLock Shared 3
ProcArrayLock Exclusive 38
Lock Id Combined Time (ns)
ProcArrayLock 858238600
Lock Id Combined Time (ns)
postgres`LWLockAcquire+0x1c8
postgres`TransactionIdIsInProgress+0x50
postgres`HeapTupleSatisfiesVacuum+0x2ec
postgres`_bt_check_unique+0x2a0
postgres`_bt_doinsert+0x98
postgres`btinsert+0x54
postgres`FunctionCall6+0x44
postgres`index_insert+0x90
postgres`ExecInsertIndexTuples+0x1bc
postgres`ExecUpdate+0x500
postgres`ExecutePlan+0x704
postgres`ExecutorRun+0x60
postgres`PortalRunMulti+0x2a0
postgres`PortalRun+0x310
postgres`exec_execute_message+0x3a0
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
167100
postgres`LWLockAcquire+0x1c8
postgres`GetSnapshotData+0x118
postgres`GetTransactionSnapshot+0x5c
postgres`PortalRunMulti+0x22c
postgres`PortalRun+0x310
postgres`exec_execute_message+0x3a0
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
postgres`_start+0x108
7125900
postgres`LWLockAcquire+0x1c8
postgres`CommitTransaction+0xe0
postgres`CommitTransactionCommand+0x90
postgres`finish_xact_command+0x60
postgres`exec_execute_message+0x3d8
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
postgres`_start+0x108
850945600
bash-3.00# echo 1100 users - DROP ....
1100 users
bash-3.00# ./4_lwlock_waits.d 7056
Lock Id Mode Count
CLogControlLock Shared 1
WALWriteLock Exclusive 1
XidGenLock Exclusive 1
ProcArrayLock Shared 2
WALInsertLock Exclusive 2
XidGenLock Shared 2
CLogControlLock Exclusive 4
ProcArrayLock Exclusive 20
Lock Id Combined Time (ns)
WALWriteLock 4179500
XidGenLock 6249400
CLogControlLock 20411100
WALInsertLock 29707600
ProcArrayLock 207923700
bash-3.00# ./6_lwlock_stack.d 4 7056
Lock Id Mode Count
ProcArrayLock Exclusive 40
Lock Id Combined Time (ns)
ProcArrayLock 692242100
Lock Id Combined Time (ns)
postgres`LWLockAcquire+0x1c8
postgres`CommitTransaction+0xe0
postgres`CommitTransactionCommand+0x90
postgres`finish_xact_command+0x60
postgres`exec_execute_message+0x3d8
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
postgres`_start+0x108
692242100
bash-3.00#
The LockId for ProcArrayLock is 4, i.e. the 5th entry in lwlock.h, which is what I passed to 6_lwlock_stack.d above, so the contention really does seem to be on ProcArrayLock.
Any tweaks for that?
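For reference, the 6_lwlock_stack.d runs above take the LWLockId as their first argument and aggregate the wait time by user stack. It is roughly along these lines (again just a sketch under the same probe assumptions as the earlier script, not the exact code):

#!/usr/sbin/dtrace -s

#pragma D option quiet

/*
 * Sketch of a 6_lwlock_stack.d-style script (not the original).
 * $1 is the LWLockId to watch (4 = ProcArrayLock, going by the order of
 * the LWLockId enum in lwlock.h), $2 is the backend pid.
 * Wait time is attributed to the user stack that was waiting.
 */

postgresql$2:::lwlock-startwait
/arg0 == $1/
{
        self->wait = timestamp;
        @counts[arg0, arg1 == 0 ? "Exclusive" : "Shared"] = count();
}

postgresql$2:::lwlock-endwait
/arg0 == $1 && self->wait/
{
        @total[arg0] = sum(timestamp - self->wait);
        /* charge the wait time to the calling user stack */
        @stacks[ustack()] = sum(timestamp - self->wait);
        self->wait = 0;
}

tick-30sec
{
        exit(0);
}

dtrace:::END
{
        printf("%20s %12s %8s\n", "Lock Id", "Mode", "Count");
        printa("%20d %12s %@8d\n", @counts);
        printf("\n%20s %22s\n", "Lock Id", "Combined Time (ns)");
        printa("%20d %@22d\n", @total);
        printa(@stacks);
}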
-Jignesh
Simon Riggs wrote:
On Thu, 2007-07-26 at 11:27 -0400, Jignesh K. Shah wrote:
However, at 900 users, where the big drop in throughput occurs, it gives a different top "consumer" of time:
postgres`LWLockAcquire+0x1c8
postgres`SimpleLruReadPage+0x1ac
postgres`TransactionIdGetStatus+0x14
postgres`TransactionLogFetch+0x58
TransactionIdGetStatus doesn't directly call SimpleLruReadPage().
Presumably the compiler has been rearranging things??
Looks like you're out of clog buffers. It seems like the clog buffers
aren't big enough to hold clog pages for long enough and the SELECT FOR
SHARE processing is leaving lots of additional read locks that are
increasing the number of clog requests for older xids.
Try the enclosed patch.
------------------------------------------------------------------------
Index: src/include/access/clog.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/access/clog.h,v
retrieving revision 1.19
diff -c -r1.19 clog.h
*** src/include/access/clog.h 5 Jan 2007 22:19:50 -0000 1.19
--- src/include/access/clog.h 26 Jul 2007 15:44:58 -0000
***************
*** 29,35 ****
/* Number of SLRU buffers to use for clog */
! #define NUM_CLOG_BUFFERS 8
extern void TransactionIdSetStatus(TransactionId xid, XidStatus status);
--- 29,35 ----
/* Number of SLRU buffers to use for clog */
! #define NUM_CLOG_BUFFERS 64
extern void TransactionIdSetStatus(TransactionId xid, XidStatus status);
------------------------------------------------------------------------