Re: SSI slows down over time

Ryan Johnson <ryan.johnson@xxxxxxxxxxxxxx> · Mon, 07 Apr 2014 10:36:14 -0400



    On 06/04/2014 10:55 AM, Tom Lane wrote:

    
      Ryan Johnson <ryan.johnson@xxxxxxxxxxxxxx> writes:

      
        I get a strange behavior across repeated runs: each 100-second run is a 
bit slower than the one preceding it, when run with SSI (SERIALIZABLE). 
... So the question: what should I look for to diagnose/triage this problem? 

      
      In the past I've seen behaviors like this that traced to the range of
"interesting" transaction IDs getting wider as time went on, so that
more pages of pg_clog were hot, leading to more I/O traffic in the
clog SLRU buffers.  Possibly this is some effect like that.
    
    The effect disappears completely if I run under SI instead of SSI,
    though. That makes me suspect strongly that the problem lurks in
    SSI-specific infrastrucure.
    

    However, I did notice that the SLRU buffer that holds "old" SSI
    transactions sometimes spikes from hundreds to millions of entries
    (by annotating the source to ereport a warning whenever the
    difference between head and tail is at least 10% higher than the
    previous record). Not sure if that's related, though: I'm pretty
    sure SSI never scans SLRU, it's only used for random lookups.

    
        I've tried linux perf, but all it says is that lots of time is going to 
LWLock (but callgraph tracing doesn't work in my not-bleeding-edge 
kernel).

      
      You could recompile with -DLWLOCK_STATS to enable gathering stats on
which LWLocks are heavily contended.  That would be a starting point
for localizing the cause.
    
    
    Here are the offenders (100-second run, 24 clients, ~2.2ktps):
    

    lwlock        7
      shacq         0 exacq 7002810 blk  896235 spindelay  213
      

      lwlock       28 shacq  94984166 exacq   3938085 blk 572762
      spindelay  163
      

      lwlock       65 shacq   3347000 exacq   2933440 blk 255927
      spindelay   90
      

      lwlock       79 shacq   1742574 exacq   3009663 blk 216178
      spindelay   41
      

      lwlock       76 shacq   2293386 exacq   2892242 blk 205199
      spindelay   70
      

      lwlock       66 shacq    498909 exacq   2987485 blk 171508
      spindelay   48
      

      lwlock       80 shacq    512107 exacq   3181753 blk 165772
      spindelay   43
      

      lwlock       71 shacq    815733 exacq   3088157 blk 165579
      spindelay   48
      

      lwlock       74 shacq    603321 exacq   3065391 blk 159953
      spindelay   56
      

      lwlock       67 shacq    695465 exacq   2918970 blk 149339
      spindelay   28
      

      lwlock       69 shacq    411203 exacq   3044007 blk 148655
      spindelay   34
      

      lwlock       72 shacq    515260 exacq   2973321 blk 147533
      spindelay   43
      

      lwlock       30 shacq  41628636 exacq      8799 blk 143889
      spindelay  186
      

      lwlock       75 shacq    409472 exacq   2987227 blk 143196
      spindelay   38
      

      lwlock       77 shacq    409401 exacq   2946972 blk 139507
      spindelay   34
      

      lwlock       73 shacq    402544 exacq   2943467 blk 139380
      spindelay   43
      

      lwlock       78 shacq    404220 exacq   2912665 blk 137625
      spindelay   21
      

      lwlock       70 shacq    603643 exacq   2816730 blk 135851
      spindelay   37
      

      lwlock       68 shacq    403533 exacq   2862017 blk 131946
      spindelay   30
      

      lwlock       29 shacq         0 exacq    255302 blk 75838
      spindelay    1
      

      lwlock        0 shacq         0 exacq    561508 blk 12445
      spindelay    3
      

      lwlock       11 shacq   1245499 exacq    219717 blk 5501
      spindelay   10
      

      lwlock        4 shacq    381211 exacq    209146 blk 1273
      spindelay    4
      

      lwlock        3 shacq        16 exacq    209081 blk 522
      spindelay    0
      

      lwlock        8 shacq         0 exacq    137961 blk 50
      spindelay    0
      

      lwlock  2097366 shacq         0 exacq    384586 blk 1 spindelay   
      0
      

      lwlock  2097365 shacq         0 exacq    370176 blk 1 spindelay   
      0
      

      lwlock  2097359 shacq         0 exacq    363845 blk 1 spindelay   
      0
      

    The above aggregates the per-lock stats from all processes, filters
    out locks with fewer than 10000 accesses (shared+exclusive) or with
    zero blk, then sorts by highest blk first.
    

    According to [1], locks {28, 29, 30} are {SerializableXactHashLock,
    SerializableFinishedListLock, SerializablePredicateLockListLock},
    all SSI-related; locks 65-80 are the sixteen PredicateLockMgrLocks
    that the post mentions. Looking in lwlock.h, lock 7 (which tops the
    list) is the WALInsertLock. That lock was *not*
    mentioned in the pgsql-hackers post.
    

    Re-running the same analysis for SI instead of SSI gives 4.6ktps and
    a much shorter list:
    

    lwlock        7
      shacq         0 exacq 14050121 blk  3429384 spindelay  347
      

      lwlock       11 shacq   3133994 exacq    450325 blk 23456
      spindelay   29
      

      lwlock        0 shacq         0 exacq    684775 blk 19158
      spindelay    3
      

      lwlock        4 shacq    780846 exacq    428771 blk 4539
      spindelay    6
      

      lwlock        3 shacq        19 exacq    428705 blk 1147
      spindelay    0
      

      lwlock       59 shacq         0 exacq    125943 blk 203
      spindelay    0
      

      lwlock        8 shacq         0 exacq    287629 blk 116
      spindelay    0
      

      lwlock  2097358 shacq         0 exacq    752361 blk 1 spindelay   
      0
      

      lwlock  2097355 shacq         0 exacq    755086 blk 1 spindelay   
      0
      

      lwlock  2097352 shacq         0 exacq    760232 blk 1 spindelay   
      0
      

    However, all of this only confirms that SSI has a lock bottleneck;
    it doesn't say why the bottleneck gets worse over time.
    

    [1] http://www.postgresql.org/message-id/CA+TgmoYAiSM2jWEndReY5PL0sKbhgg7dbDH6r=oXKYzi9B7KJA@xxxxxxxxxxxxxx
    

    Thoughts?
    

    Ryan