I tried CLOG buffers at 32 and the performance is as good as at 64 (I
haven't tried 16 yet though). I am going to try your second patch now.
Also, here is the breakdown by mode. The combined time is the total time
spent waiting, summed over all the waits counted.
Lock Id          Mode        Count
ProcArrayLock    Shared          1
CLogControlLock  Exclusive       4
CLogControlLock  Shared          4
XidGenLock       Shared          4
XidGenLock       Exclusive       7
WALInsertLock    Exclusive      21
WALWriteLock     Exclusive      62
ProcArrayLock    Exclusive      79

Lock Id          Mode        Combined Time (ns)
CLogControlLock  Exclusive               325200
CLogControlLock  Shared                 4509200
XidGenLock       Exclusive             11839600
ProcArrayLock    Shared                40506600
XidGenLock       Shared               119013700
WALInsertLock    Exclusive            148063100
WALWriteLock     Exclusive            347052100
ProcArrayLock    Exclusive           1054780600
Here is another one at a higher user count (1600):
bash-3.00# ./4_lwlock_waits.d 9208
Lock Id          Mode        Count
CLogControlLock  Exclusive       1
CLogControlLock  Shared          2
XidGenLock       Shared          7
WALInsertLock    Exclusive      12
WALWriteLock     Exclusive      50
ProcArrayLock    Exclusive      82

Lock Id          Mode        Combined Time (ns)
CLogControlLock  Exclusive                27300
XidGenLock       Shared                14689300
CLogControlLock  Shared                72664900
WALInsertLock    Exclusive            101431300
WALWriteLock     Exclusive            534357400
ProcArrayLock    Exclusive           4110350300
Now I will try with your second patch.
Regards,
Jignesh
Simon Riggs wrote:
On Thu, 2007-07-26 at 17:17 -0400, Jignesh K. Shah wrote:
Lock Id          Combined Time (ns)
XidGenLock              194966200
WALInsertLock           517955000
CLogControlLock         679665100
WALWriteLock           2838716200
ProcArrayLock         44181002600
Is this the time the lock is held for or the time that we wait for that
lock? It would be good to see the break down of time separately for
shared and exclusive.
Can we have a table like this:
LockId,LockMode,SumTimeLockHeld,SumTimeLockWait
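A D script along these lines could produce that breakdown. This is only a
sketch: the script name is hypothetical, it assumes the Solaris pid provider
and the LWLockAcquire(lockid, mode) / LWLockRelease(lockid) signatures of
that era, and it prints lock ids and modes as raw numbers (mapping them back
to names like ProcArrayLock or Shared/Exclusive would require the LWLockId
and LWLockMode enum values). Held time is tracked per lock id, so it is only
meaningful for exclusive holds, where holders cannot overlap.

```d
#!/usr/sbin/dtrace -qs
/* lwlock_times.d (hypothetical name) -- run as: ./lwlock_times.d <pid>
 * Sums, per (lock, mode), the time spent waiting inside LWLockAcquire,
 * and per lock, the time between acquire returning and release. */

pid$1::LWLockAcquire:entry
{
        self->lockid = arg0;            /* LWLockId */
        self->mode   = arg1;            /* LWLockMode */
        self->wstart = timestamp;
}

pid$1::LWLockAcquire:return
/self->wstart/
{
        @wait[self->lockid, self->mode] = sum(timestamp - self->wstart);
        held_start[self->lockid] = timestamp;   /* held time begins */
        self->wstart = 0;
}

pid$1::LWLockRelease:entry
/held_start[arg0]/
{
        @held[arg0] = sum(timestamp - held_start[arg0]);
        held_start[arg0] = 0;
}

dtrace:::END
{
        printf("%-10s %-6s %18s\n", "LockId", "Mode", "SumTimeLockWait");
        printa("%-10d %-6d %@18d\n", @wait);
        printf("\n%-10s %18s\n", "LockId", "SumTimeLockHeld");
        printa("%-10d %@18d\n", @held);
}
```

Attaching to one backend at a time (as with `./4_lwlock_waits.d 9208` above)
keeps the thread-local clauses simple; aggregating across all backends would
need per-process held-time bookkeeping.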
Top Wait time seems to come from the following code path for
ProcArrayLock:
Lock Id          Mode        Count
ProcArrayLock    Exclusive      21

Lock Id          Combined Time (ns)
ProcArrayLock          5255937500

Lock Id          Combined Time (ns)
              postgres`LWLockAcquire+0x1f0
              postgres`CommitTransaction+0x104
              postgres`CommitTransactionCommand+0xbc
              postgres`finish_xact_command+0x78
Well, that's pretty weird. That code path clearly only happens once per
transaction and ought to be fast. The other code paths that take
ProcArrayLock like TransactionIdIsInProgress() and GetSnapshotData()
ought to spend more time holding the lock. Presumably you are running
with a fair number of SERIALIZABLE transactions?
Are you running with commit_delay > 0? It's possible that the call to
CountActiveBackends() is causing pinging of the procarray by other
backends while we're trying to read it during CommitTransaction(). If
so, try the attached patch.
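For reference, these are the two settings involved, as they would appear in
postgresql.conf. The values shown are only illustrative, not recommendations;
commit_delay defaults to 0, which disables the delay entirely.

```
# postgresql.conf (illustrative values)
commit_delay = 10      # usec to sleep before WAL flush at commit; 0 = off
commit_siblings = 5    # minimum number of other active transactions
                       # required before commit_delay is applied
```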
------------------------------------------------------------------------
Index: src/backend/access/transam/xact.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.245
diff -c -r1.245 xact.c
*** src/backend/access/transam/xact.c 7 Jun 2007 21:45:58 -0000 1.245
--- src/backend/access/transam/xact.c 27 Jul 2007 09:09:08 -0000
***************
*** 820,827 ****
* are fewer than CommitSiblings other backends with active
* transactions.
*/
! if (CommitDelay > 0 && enableFsync &&
! CountActiveBackends() >= CommitSiblings)
pg_usleep(CommitDelay);
XLogFlush(recptr);
--- 820,826 ----
* are fewer than CommitSiblings other backends with active
* transactions.
*/
! if (CommitDelay > 0 && enableFsync)
pg_usleep(CommitDelay);
XLogFlush(recptr);
------------------------------------------------------------------------
---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster