I redid the test, this time waking up all waiters irrespective of whether they are shared or exclusive (columns are elapsed time, users, throughput, and average response time, as before):
480: 64: Medium Throughput: 66688.000 Avg Medium Resp: 0.005
540: 72: Medium Throughput: 74355.000 Avg Medium Resp: 0.005
600: 80: Medium Throughput: 82920.000 Avg Medium Resp: 0.005
660: 88: Medium Throughput: 91466.000 Avg Medium Resp: 0.005
720: 96: Medium Throughput: 98749.000 Avg Medium Resp: 0.006
780: 104: Medium Throughput: 107365.000 Avg Medium Resp: 0.006
840: 112: Medium Throughput: 114121.000 Avg Medium Resp: 0.006
900: 120: Medium Throughput: 119556.000 Avg Medium Resp: 0.006
960: 128: Medium Throughput: 128544.000 Avg Medium Resp: 0.006
1020: 136: Medium Throughput: 134725.000 Avg Medium Resp: 0.007
1080: 144: Medium Throughput: 138817.000 Avg Medium Resp: 0.007
1140: 152: Medium Throughput: 141482.000 Avg Medium Resp: 0.008
1200: 160: Medium Throughput: 149430.000 Avg Medium Resp: 0.008
1260: 168: Medium Throughput: 145104.000 Avg Medium Resp: 0.009
1320: 176: Medium Throughput: 143059.000 Avg Medium Resp: 0.011
1380: 184: Medium Throughput: 147687.000 Avg Medium Resp: 0.011
light: customer: No result set for custid 0
1440: 192: Medium Throughput: 148081.000 Avg Medium Resp: 0.013
light: customer: No result set for custid 0
1500: 200: Medium Throughput: 145452.000 Avg Medium Resp: 0.014
1560: 208: Medium Throughput: 146057.000 Avg Medium Resp: 0.015
1620: 216: Medium Throughput: 148456.000 Avg Medium Resp: 0.016
1680: 224: Medium Throughput: 153088.000 Avg Medium Resp: 0.016
1740: 232: Medium Throughput: 151263.000 Avg Medium Resp: 0.017
1800: 240: Medium Throughput: 154146.000 Avg Medium Resp: 0.017
1860: 248: Medium Throughput: 155520.000 Avg Medium Resp: 0.018
1920: 256: Medium Throughput: 154696.000 Avg Medium Resp: 0.019
1980: 264: Medium Throughput: 155391.000 Avg Medium Resp: 0.020
light: customer: No result set for custid 0
2040: 272: Medium Throughput: 156086.000 Avg Medium Resp: 0.021
2100: 280: Medium Throughput: 150085.000 Avg Medium Resp: 0.023
2160: 288: Medium Throughput: 152253.000 Avg Medium Resp: 0.024
2220: 296: Medium Throughput: 155203.000 Avg Medium Resp: 0.025
2280: 304: Medium Throughput: 157962.000 Avg Medium Resp: 0.025
light: customer: No result set for custid 0
2340: 312: Medium Throughput: 157270.000 Avg Medium Resp: 0.026
2400: 320: Medium Throughput: 161298.000 Avg Medium Resp: 0.027
2460: 328: Medium Throughput: 161527.000 Avg Medium Resp: 0.028
2520: 336: Medium Throughput: 163569.000 Avg Medium Resp: 0.028
2580: 344: Medium Throughput: 166190.000 Avg Medium Resp: 0.028
2640: 352: Medium Throughput: 168516.000 Avg Medium Resp: 0.029
2700: 360: Medium Throughput: 171417.000 Avg Medium Resp: 0.029
2760: 368: Medium Throughput: 173350.000 Avg Medium Resp: 0.029
2820: 376: Medium Throughput: 155672.000 Avg Medium Resp: 0.035
2880: 384: Medium Throughput: 172821.000 Avg Medium Resp: 0.031
2940: 392: Medium Throughput: 171819.000 Avg Medium Resp: 0.033
3000: 400: Medium Throughput: 171388.000 Avg Medium Resp: 0.033
3060: 408: Medium Throughput: 172949.000 Avg Medium Resp: 0.034
3120: 416: Medium Throughput: 172638.000 Avg Medium Resp: 0.036
3180: 424: Medium Throughput: 172310.000 Avg Medium Resp: 0.036
(My timed test ended here.)
The vmstat output looks similar to the wake-some case:
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr s0 s1 s2 sd in sy cs us sy id
63 0 0 45535728 38689856 0 14 0 0 0 0 0 0 0 0 0 163318 334225 360179 47 17 36
85 0 0 45436736 38690760 0 6 0 0 0 0 0 0 0 0 0 165536 347462 365987 47 17 36
59 0 0 45405184 38681752 0 11 0 0 0 0 0 0 0 0 0 155153 326182 345527 47 16 37
53 0 0 45393816 38673344 0 6 0 0 0 0 0 0 0 0 0 152752 317851 340737 47 16 37
66 0 0 45378312 38651920 0 11 0 0 0 0 0 0 0 0 0 150979 304350 336915 47 16 38
67 0 0 45489520 38639664 0 5 0 0 0 0 0 0 0 0 0 157188 318958 351905 47 16 37
82 0 0 45483600 38633344 0 10 0 0 0 0 0 0 0 0 0 168797 348619 375827 47 17 36
68 0 0 45463008 38614432 0 9 0 0 0 0 0 0 0 0 0 173020 376594 385370 47 18 35
54 0 0 45451376 38603792 0 13 0 0 0 0 0 0 0 0 0 161891 342522 364286 48 17 35
41 0 0 45356544 38605976 0 5 0 0 0 0 0 0 0 0 0 167250 358320 372469 47 17 36
27 0 0 45323472 38596952 0 11 0 0 0 0 0 0 0 0 0 165099 344695 364256 48 17 35
I missed taking mpstat output for this run.
DTrace also shows that "Waiting" on ProcArrayLock is not the most expensive wait:
-bash-3.2# ./84_lwlock.d 17071
Lock Id Mode State Count
CLogControlLock Shared Waiting 4
CLogControlLock Exclusive Waiting 32
ProcArrayLock Shared Waiting 35
CLogControlLock Shared Acquired 47
WALInsertLock Exclusive Waiting 53
ProcArrayLock Exclusive Waiting 104
XidGenLock Exclusive Acquired 116
ProcArrayLock Exclusive Acquired 117
CLogControlLock Exclusive Acquired 176
WALInsertLock Exclusive Acquired 370
FirstLockMgrLock Shared Acquired 793
FirstBufMappingLock Shared Acquired 799
ProcArrayLock Shared Acquired 882
SInvalReadLock Shared Acquired 1827
Lock Id Mode State Combined Time (ns)
WALInsertLock Acquired 52915
CLogControlLock Acquired 78332
XidGenLock Acquired 103026
FirstLockMgrLock Acquired 392836
FirstBufMappingLock Acquired 2919896
CLogControlLock Shared Waiting 5342211
CLogControlLock Exclusive Waiting 9172692
ProcArrayLock Shared Waiting 18186546
ProcArrayLock Acquired 22478607
XidGenLock Exclusive Acquired 26561444
SInvalReadLock Acquired 29012891
CLogControlLock Exclusive Acquired 30490159
WALInsertLock Exclusive Waiting 35055294
FirstLockMgrLock Exclusive Acquired 47077668
FirstBufMappingLock Exclusive Acquired 47460381
WALInsertLock Exclusive Acquired 99288648
ProcArrayLock Exclusive Waiting 104221100
ProcArrayLock Exclusive Acquired 356644807
SInvalReadLock Exclusive Acquired 357530794
So waking up more exclusive waiters than just one clearly seems to help scalability, though the size of the improvement varies from run to run; there is still some positive improvement.
One more change I can think of making is a minor one: right now we wake all sequential shared waiters but only one exclusive waiter. I am going to change that so that whatever sequential run of waiters is at the head of the queue gets woken up, shared or exclusive. In essence it takes a similar heart-valve approach of releasing waiters in little bursts rather than tying each burst to a single exclusive waiter.
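To make the two policies concrete, here is a minimal standalone sketch; it is not the real lwlock.c code, and the struct and function names are invented purely for illustration. It contrasts the stock-style behaviour (wake the head of the queue, plus any shared waiters immediately behind it) with the wake-everyone behaviour used in the test at the top of this mail.

/*
 * Simplified sketch of the two wake-up policies; NOT the real lwlock.c
 * code -- the struct and function names here are invented for
 * illustration only.
 */
#include <stdio.h>

typedef enum { LW_SHARED, LW_EXCLUSIVE } LockMode;

typedef struct Waiter
{
    LockMode       mode;
    struct Waiter *next;    /* FIFO wait queue, head = longest waiter */
} Waiter;

/*
 * Stock-style policy (simplified): wake the waiter at the head of the
 * queue; if it is a shared waiter, also wake the shared waiters that
 * immediately follow it, stopping at the first exclusive waiter.
 */
static int
wake_stock(const Waiter *head)
{
    int n = 0;

    if (head == NULL)
        return 0;

    n++;                    /* always wake the head of the queue */
    if (head->mode == LW_SHARED)
    {
        for (const Waiter *w = head->next;
             w != NULL && w->mode == LW_SHARED;
             w = w->next)
            n++;
    }
    return n;
}

/*
 * "Wake them all" policy from the test at the top: on release, wake
 * every queued waiter, shared or exclusive, and let them re-contend
 * for the lock when they get scheduled.
 */
static int
wake_all(const Waiter *head)
{
    int n = 0;

    for (const Waiter *w = head; w != NULL; w = w->next)
        n++;
    return n;
}

int
main(void)
{
    /* Wait queue, head first: S S X S X */
    Waiter w5 = {LW_EXCLUSIVE, NULL};
    Waiter w4 = {LW_SHARED,    &w5};
    Waiter w3 = {LW_EXCLUSIVE, &w4};
    Waiter w2 = {LW_SHARED,    &w3};
    Waiter w1 = {LW_SHARED,    &w2};

    /* Prints: stock policy wakes 2 waiters, wake-all wakes 5 */
    printf("stock policy wakes %d waiters, wake-all wakes %d\n",
           wake_stock(&w1), wake_all(&w1));
    return 0;
}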
-Jignesh
Jignesh K. Shah wrote:
Now with a modified fix (not the original one that I proposed, but something that works like a heart valve: it opens and shuts back to a minimum default, thus controlling how many waiters are woken up). A hypothetical sketch of that kind of bounded wake-up follows, with the results after it.
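The sketch below is only a hypothetical reading of the valve idea; the burst cap and all names are invented for illustration and are not taken from the actual patch. The point is simply that each release wakes at most a small, fixed burst of waiters instead of one exclusive waiter or the whole queue.

/*
 * Hypothetical "heart valve" wake-up: bound the number of waiters woken
 * per release.  The cap and all names are invented for illustration;
 * this is not the actual patch.
 */
#include <stdio.h>

typedef enum { LW_SHARED, LW_EXCLUSIVE } LockMode;

typedef struct Waiter
{
    LockMode       mode;
    struct Waiter *next;            /* FIFO wait queue */
} Waiter;

#define WAKE_BURST_MAX 4            /* illustrative burst size only */

/* Wake up to WAKE_BURST_MAX waiters from the head of the queue. */
static int
wake_burst(const Waiter *head)
{
    int n = 0;

    for (const Waiter *w = head; w != NULL && n < WAKE_BURST_MAX; w = w->next)
        n++;
    return n;
}

int
main(void)
{
    /* Queue of six waiters, head first: S S X S X S */
    Waiter w6 = {LW_SHARED,    NULL};
    Waiter w5 = {LW_EXCLUSIVE, &w6};
    Waiter w4 = {LW_SHARED,    &w5};
    Waiter w3 = {LW_EXCLUSIVE, &w4};
    Waiter w2 = {LW_SHARED,    &w3};
    Waiter w1 = {LW_SHARED,    &w2};

    /* Prints: burst policy wakes 4 of 6 queued waiters */
    printf("burst policy wakes %d of 6 queued waiters\n", wake_burst(&w1));
    return 0;
}

The results with the modified fix: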
Time: Users: Throughput: Response
60: 8: Medium Throughput: 7774.000 Avg Medium Resp: 0.004
120: 16: Medium Throughput: 16874.000 Avg Medium Resp: 0.004
180: 24: Medium Throughput: 25159.000 Avg Medium Resp: 0.004
240: 32: Medium Throughput: 33216.000 Avg Medium Resp: 0.005
300: 40: Medium Throughput: 42418.000 Avg Medium Resp: 0.005
360: 48: Medium Throughput: 49655.000 Avg Medium Resp: 0.005
420: 56: Medium Throughput: 58149.000 Avg Medium Resp: 0.005
480: 64: Medium Throughput: 66558.000 Avg Medium Resp: 0.005
540: 72: Medium Throughput: 74474.000 Avg Medium Resp: 0.005
600: 80: Medium Throughput: 82241.000 Avg Medium Resp: 0.005
660: 88: Medium Throughput: 90336.000 Avg Medium Resp: 0.005
720: 96: Medium Throughput: 99101.000 Avg Medium Resp: 0.006
780: 104: Medium Throughput: 106028.000 Avg Medium Resp: 0.006
840: 112: Medium Throughput: 113196.000 Avg Medium Resp: 0.006
900: 120: Medium Throughput: 119174.000 Avg Medium Resp: 0.006
960: 128: Medium Throughput: 129408.000 Avg Medium Resp: 0.006
1020: 136: Medium Throughput: 134433.000 Avg Medium Resp: 0.007
1080: 144: Medium Throughput: 143121.000 Avg Medium Resp: 0.007
1140: 152: Medium Throughput: 144603.000 Avg Medium Resp: 0.007
1200: 160: Medium Throughput: 148604.000 Avg Medium Resp: 0.008
1260: 168: Medium Throughput: 150274.000 Avg Medium Resp: 0.009
1320: 176: Medium Throughput: 150581.000 Avg Medium Resp: 0.010
1380: 184: Medium Throughput: 146912.000 Avg Medium Resp: 0.012
1440: 192: Medium Throughput: 143945.000 Avg Medium Resp: 0.013
1500: 200: Medium Throughput: 144029.000 Avg Medium Resp: 0.015
1560: 208: Medium Throughput: 143468.000 Avg Medium Resp: 0.016
1620: 216: Medium Throughput: 144367.000 Avg Medium Resp: 0.017
1680: 224: Medium Throughput: 148340.000 Avg Medium Resp: 0.017
1740: 232: Medium Throughput: 148842.000 Avg Medium Resp: 0.018
1800: 240: Medium Throughput: 149533.000 Avg Medium Resp: 0.019
1860: 248: Medium Throughput: 152334.000 Avg Medium Resp: 0.019
1920: 256: Medium Throughput: 151521.000 Avg Medium Resp: 0.020
1980: 264: Medium Throughput: 148961.000 Avg Medium Resp: 0.022
2040: 272: Medium Throughput: 151270.000 Avg Medium Resp: 0.022
2100: 280: Medium Throughput: 149783.000 Avg Medium Resp: 0.024
2160: 288: Medium Throughput: 151743.000 Avg Medium Resp: 0.024
2220: 296: Medium Throughput: 155190.000 Avg Medium Resp: 0.026
2280: 304: Medium Throughput: 150955.000 Avg Medium Resp: 0.027
2340: 312: Medium Throughput: 147118.000 Avg Medium Resp: 0.029
2400: 320: Medium Throughput: 152768.000 Avg Medium Resp: 0.029
2460: 328: Medium Throughput: 161044.000 Avg Medium Resp: 0.028
2520: 336: Medium Throughput: 157926.000 Avg Medium Resp: 0.029
2580: 344: Medium Throughput: 161005.000 Avg Medium Resp: 0.029
2640: 352: Medium Throughput: 167274.000 Avg Medium Resp: 0.029
2700: 360: Medium Throughput: 168253.000 Avg Medium Resp: 0.031
The final vmstat output improves, though CPU utilization is still far from 100%:
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr s0 s1 s2 sd in sy cs us sy id
38 0 0 46052840 39345096 0 11 0 0 0 0 0 0 0 0 0 134137 290703 303518 40 14 45
43 0 0 45656456 38882912 23 77 0 0 0 0 0 0 0 0 0 135820 272899 300749 40 15 45
38 0 0 45650488 38816984 23 80 0 0 0 0 0 0 0 0 0 135009 272767 300192 39 15 46
47 0 0 46020792 39187688 0 5 0 0 0 0 0 0 0 0 0 140473 285445 312826 40 14 46
24 0 0 46143984 39326848 9 61 0 0 0 0 0 0 0 0 0 146194 308590 328241 40 15 45
37 0 0 45465256 38757000 22 74 0 0 0 0 0 0 0 0 0 136835 293971 301433 38 14 48
35 0 0 46017544 39308072 12 61 0 0 0 0 0 0 0 0 0 142749 312355 320592 42 15 43
36 0 0 45456000 38744688 11 24 0 0 0 0 0 0 0 0 0 143566 303461 317683 41 15 43
23 0 0 46007408 39291312 2 22 0 0 0 0 0 0 0 0 0 140246 300061 316663 42 15 43
20 0 0 46029656 39281704 10 25 0 0 0 0 0 0 0 0 0 147787 291825 326387 43 15 42
24 0 0 46131016 39288528 2 21 0 0 0 0 0 0 0 0 0 150796 310697 335791 43 15 42
20 0 0 46109448 39269392 16 67 0 0 0 0 0 0 0 0 0 150075 315517 332881 43 16 41
30 0 0 45540928 38710376 9 27 0 0 0 0 0 0 0 0 0 155214 316448 341472 43 16 40
14 0 0 45987496 39270016 0 5 0 0 0 0 0 0 0 0 0 155028 333711 344207 44 16 40
25 0 0 45981136 39263008 0 10 0 0 0 0 0 0 0 0 0 153968 327343 343776 45 16 39
54 0 0 46062984 39259936 0 7 0 0 0 0 0 0 0 0 0 153721 315839 344732 45 16 39
42 0 0 46099704 39252920 0 15 0 0 0 0 0 0 0 0 0 154629 323125 348798 45 16 39
54 0 0 46068944 39230808 0 8 0 0 0 0 0 0 0 0 0 157166 340265 354135 46 17 37
But the real winner shows up in the lock statistics, which seem to indicate that the stress of waiting on ProcArrayLock is relieved (though it shifts somewhere else, which is how locks work):
# ./84_lwlock.d 8042
Lock Id Mode State Count
WALWriteLock Exclusive Acquired 1
XidGenLock Exclusive Waiting 3
CLogControlLock Shared Waiting 11
ProcArrayLock Shared Waiting 39
CLogControlLock Exclusive Waiting 52
WALInsertLock Exclusive Waiting 73
CLogControlLock Shared Acquired 91
ProcArrayLock Exclusive Acquired 96
XidGenLock Exclusive Acquired 96
ProcArrayLock Exclusive Waiting 121
CLogControlLock Exclusive Acquired 199
WALInsertLock Exclusive Acquired 310
FirstBufMappingLock Shared Acquired 408
FirstLockMgrLock Shared Acquired 618
ProcArrayLock Shared Acquired 746
SInvalReadLock Shared Acquired 1542
Lock Id Mode State Combined Time (ns)
WALInsertLock Acquired 118673
CLogControlLock Acquired 172130
FirstBufMappingLock Acquired 177196
WALWriteLock Exclusive Acquired 208403
XidGenLock Exclusive Waiting 325989
FirstLockMgrLock Acquired 2667351
ProcArrayLock Acquired 8179335
XidGenLock Exclusive Acquired 8896177
CLogControlLock Shared Waiting 9680401
CLogControlLock Exclusive Waiting 19105179
CLogControlLock Exclusive Acquired 27484249
SInvalReadLock Acquired 43026960
FirstBufMappingLock Exclusive Acquired 45232906
ProcArrayLock Shared Waiting 46741660
WALInsertLock Exclusive Waiting 50912148
FirstLockMgrLock Exclusive Acquired 58789829
WALInsertLock Exclusive Acquired 86653791
ProcArrayLock Exclusive Waiting 213980787
ProcArrayLock Exclusive Acquired 270028367
SInvalReadLock Exclusive Acquired 303044735
SET minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl sze
0 1 0 147238 159453 8806 370676 89236 71258 98435 0 380008 47 17 0 35 64
0 6 0 132463 143446 7975 331685 80847 64746 86578 0 329315 44 16 0 41 64
0 16 0 146655 158621 8987 366866 90756 69953 93786 0 349346 49 17 0 34 64
0 18 0 151326 163492 8992 377634 92860 72406 98968 4 365121 49 17 0 33 64
0 2 0 142914 154169 8243 352104 81385 69598 91260 0 340887 42 16 0 42 64
0 16 0 156755 168962 9080 386475 93072 74775 101465 0 379250 47 18 0 36 64
0 1 0 152807 165134 8880 379521 90671 75073 99692 0 380412 48 18 0 35 64
0 1 0 134778 146041 8122 339137 79888 66633 89220 0 342600 43 16 0 41 64
0 16 0 153014 164789 8834 376117 93000 72743 97644 0 371792 48 18 0 35 64
Not sure what SInvalReadLock does; I need to read up on that.
-Jignesh
1200: 160: Medium Throughput: 130487.000 Avg Medium Resp: 0.011
1260: 168: Medium Throughput: 123368.000 Avg Medium Resp: 0.013
1320: 176: Medium Throughput: 134649.000 Avg Medium Resp: 0.012
1380: 184: Medium Throughput: 136272.000 Avg Medium Resp: 0.013
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr s0 s1 s2 sd in sy cs us sy id
3 0 0 44710008 39603320 0 135 0 0 0 0 0 0 0 0 0 110564 145678 211567 33 10 57
5 0 0 44663368 39595008 0 6 0 0 0 0 0 0 0 0 0 108891 143083 208389 33 10 58
3 0 0 44753496 39593824 0 132 0 0 0 0 0 0 0 0 0 109922 126865 209869 33 9 57
4 0 0 44788368 39588528 0 7 0 0 0 0 0 0 0 0 0 108680 129073 208068 33 10 57
2 0 0 44767920 39570592 0 147 0 0 0 0 0 0 0 0 0 106671 142403 204724 33 10 58
4 0 0 44762656 39563256 0 11 0 0 0 0 0 0 0 0 0 106185 130328 204551 34 10 57
2 0 0 44674584 39560912 0 148 0 0 0 0 0 0 0 0 0 104757 139147 201448 32 10 58
1 0 0 44619824 39551024 0 9 0 0 0 0 0 0 0 0 0 103653 142125 199896 32 10 58
2 0 0 44622480 39552432 0 141 0 0 0 0 0 0 0 0 0 101373 134547 195553 32 9 58
1 0 0 44739936 39552312 0 11 0 0 0 0 0 0 0 0 0 102932 121742 198205 33 9 58
And the lock stats, sampled for a single backend process at about 280 users, are as follows:
# ./84_lwlock.d 29405
Lock Id Mode State Count
WALWriteLock Exclusive Acquired 1
XidGenLock Exclusive Waiting 1
CLogControlLock Shared Waiting 3
ProcArrayLock Shared Waiting 7
CLogControlLock Exclusive Waiting 9
WALInsertLock Exclusive Waiting 45
CLogControlLock Shared Acquired 52
ProcArrayLock Exclusive Waiting 61
XidGenLock Exclusive Acquired 96
ProcArrayLock Exclusive Acquired 97
CLogControlLock Exclusive Acquired 152
WALInsertLock Exclusive Acquired 302
ProcArrayLock Shared Acquired 729
FirstLockMgrLock Shared Acquired 812
FirstBufMappingLock Shared Acquired 857
SInvalReadLock Shared Acquired 1551
Lock Id Mode State Combined Time (ns)
WALInsertLock Acquired 89909
XidGenLock Exclusive Waiting 101488
WALWriteLock Exclusive Acquired 140563
CLogControlLock Shared Waiting 354756
FirstBufMappingLock Acquired 471438
FirstLockMgrLock Acquired 2907141
XidGenLock Exclusive Acquired 7450934
CLogControlLock Exclusive Waiting 11094716
ProcArrayLock Acquired 15495229
WALInsertLock Exclusive Waiting 20801169
CLogControlLock Exclusive Acquired 21339264
SInvalReadLock Acquired 24309991
FirstLockMgrLock Exclusive Acquired 39904071
FirstBufMappingLock Exclusive Acquired 40826435
ProcArrayLock Shared Waiting 86352947
WALInsertLock Exclusive Acquired 89336432
SInvalReadLock Exclusive Acquired 252574515
ProcArrayLock Exclusive Acquired 315064347
ProcArrayLock Exclusive Waiting 847806215
The mpstat output is too long, so I am aggregating by processor set, which covers all 64 CPUs:
-bash-3.2$ mpstat -a 10
SET minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl sze
0 370 0 118649 127575 7595 244456 43931 62166 8700 0 158929 38 11 0 50 64
0 167 0 119668 128704 7644 246389 43287 62357 8816 0 161006 38 11 0 51 64
0 27 0 109461 117433 6997 224514 38562 56446 8171 0 148322 34 10 0 56 64
0 2 0 122368 131549 7871 250237 39620 61478 9082 0 165995 36 11 0 52 64
0 0 0 122025 131380 7973 249429 37292 59863 8922 0 166319 35 11 0 54 64
(quick overview of the columns)
SET    processor set
minf   minor faults
mjf    major faults
xcal   inter-processor cross-calls
intr   interrupts
ithr   interrupts as threads (not counting clock interrupt)
csw    context switches
icsw   involuntary context switches
migr   thread migrations (to another processor)
smtx   spins on mutexes (lock not acquired on first try)
srw    spins on readers/writer locks (lock not acquired on first try)
syscl  system calls
usr    percent user time
sys    percent system time
wt     I/O wait time (no longer calculated as a percentage of CPU time; always reports zero)
idl    percent idle time
sze    number of processors in the requested processor set
-Jignesh
--
Jignesh Shah http://blogs.sun.com/jkshah
The New Sun Microsystems,Inc http://sun.com/postgresql