Proposal: a tunable fix for the scalability of 8.4

Hello All,

As you know, I have been constantly using benchmark kits to see how PostgreSQL scales on the UltraSPARC T2 based 1-socket (64 threads) and 2-socket (128 threads) servers that Sun sells.

You might remember that at PGCon 2008 (http://www.pgcon.org/2008/schedule/events/72.en.html) I mentioned that ProcArrayLock gets pretty hot when you have many users.

Rerunning similar tests on a 64-thread UltraSPARC T2 Plus based server, I found that even with the 8.4 snapshot I took, I was still having similar problems (IO is not a problem: everything is in RAM, no disks):
Time:Users:Type:TPM: Response Time
60: 100: Medium Throughput: 10552.000 Avg Medium Resp: 0.006
120: 200: Medium Throughput: 22897.000 Avg Medium Resp: 0.006
180: 300: Medium Throughput: 33099.000 Avg Medium Resp: 0.009
240: 400: Medium Throughput: 44692.000 Avg Medium Resp: 0.007
300: 500: Medium Throughput: 56455.000 Avg Medium Resp: 0.007
360: 600: Medium Throughput: 67220.000 Avg Medium Resp: 0.008
420: 700: Medium Throughput: 77592.000 Avg Medium Resp: 0.009
480: 800: Medium Throughput: 87277.000 Avg Medium Resp: 0.011
540: 900: Medium Throughput: 98029.000 Avg Medium Resp: 0.012
600: 1000: Medium Throughput: 102547.000 Avg Medium Resp: 0.023
660: 1100: Medium Throughput: 100503.000 Avg Medium Resp: 0.044
720: 1200: Medium Throughput: 99506.000 Avg Medium Resp: 0.065
780: 1300: Medium Throughput: 95474.000 Avg Medium Resp: 0.089
840: 1400: Medium Throughput: 86254.000 Avg Medium Resp: 0.130
900: 1500: Medium Throughput: 91947.000 Avg Medium Resp: 0.139
960: 1600: Medium Throughput: 94838.000 Avg Medium Resp: 0.147
1020: 1700: Medium Throughput: 92446.000 Avg Medium Resp: 0.173
1080: 1800: Medium Throughput: 91032.000 Avg Medium Resp: 0.194
1140: 1900: Medium Throughput: 88236.000 Avg Medium Resp: 0.221
runDynamic: uCount =  2000delta = 1900
runDynamic: ALL Threads Have Been created
1200: 2000: Medium Throughput: -1352555.000 Avg Medium Resp: 0.071
1260: 2000: Medium Throughput: 88872.000 Avg Medium Resp: 0.238
1320: 2000: Medium Throughput: 88484.000 Avg Medium Resp: 0.248
1380: 2000: Medium Throughput: 90777.000 Avg Medium Resp: 0.231
1440: 2000: Medium Throughput: 90769.000 Avg Medium Resp: 0.229

You will notice that throughput peaks at around 1000 users and then drops. Nothing new; you have already heard me mention that a zillion times.

While working on this today, I was going through LWLockRelease, as I have done quite a few times before, to see what can be done. The quick synopsis: LWLockRelease releases the lock and wakes up the next waiter to take over. If that waiter wants the lock in exclusive mode, only that waiter is woken; if it wants shared mode, every shared waiter immediately following it in the queue is woken as well.
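A minimal, self-contained model of that wake-up rule may make it easier to follow. This is a sketch, not the lwlock.c code: the WaitProc type and the detach_waiters_to_wake() and list_length() helpers are hypothetical, although the lwExclusive and lwWaitLink field names mirror the PGPROC fields LWLockRelease uses. Spinlocks, releaseOK and the actual semaphore wake-ups are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for PGPROC: only the two fields the
 * wake-up rule looks at. */
typedef struct WaitProc
{
    bool             lwExclusive;  /* waiting for exclusive mode? */
    struct WaitProc *lwWaitLink;   /* next waiter in the queue, or NULL */
} WaitProc;

/*
 * Detach the waiters to be woken from the queue whose front is *head,
 * following the stock rule: an exclusive waiter at the front is woken
 * alone; a shared waiter at the front is woken together with the shared
 * waiters directly behind it, stopping at the first exclusive waiter.
 * Returns the detached list; *head is left pointing at the rest.
 */
static WaitProc *
detach_waiters_to_wake(WaitProc **head)
{
    WaitProc *first = *head;
    WaitProc *proc = first;

    assert(first != NULL);      /* only called when someone is waiting */
    if (!proc->lwExclusive)
    {
        while (proc->lwWaitLink != NULL && !proc->lwWaitLink->lwExclusive)
            proc = proc->lwWaitLink;
    }
    /* proc is now the last waiter to be released */
    *head = proc->lwWaitLink;
    proc->lwWaitLink = NULL;
    return first;
}

/* Count the entries of a NULL-terminated wait list. */
static int
list_length(const WaitProc *p)
{
    int n = 0;

    for (; p != NULL; p = p->lwWaitLink)
        n++;
    return n;
}
```

For a queue of shared, shared, exclusive, shared waiters, the first call detaches the two leading shared waiters and leaves the exclusive waiter at the head; the next call detaches that exclusive waiter alone.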

Earlier last year I tried various ways of doing smarter wake-ups (finding all the shared waiters and waking them together, or coming up with a different lock type that wakes several waiters simultaneously, which ended up defining a new lock mode), and of course none of them was stellar enough to make an impact.

Today I tried something else: forget the distinction between exclusive and shared and just wake them all up. So I changed the code from:
    /*
     * Remove the to-be-awakened PGPROCs from the queue. If the front
     * waiter wants exclusive lock, awaken him only. Otherwise awaken
     * as many waiters as want shared access.
     */
    proc = head;
    if (!proc->lwExclusive)
    {
        while (proc->lwWaitLink != NULL &&
               !proc->lwWaitLink->lwExclusive)
            proc = proc->lwWaitLink;
    }
    /* proc is now the last PGPROC to be released */
    lock->head = proc->lwWaitLink;
    proc->lwWaitLink = NULL;
    /* prevent additional wakeups until retryer gets to run */
    lock->releaseOK = false;


to basically wake them all up:
    /*
     * Remove the to-be-awakened PGPROCs from the queue. Ignore the
     * exclusive/shared distinction and awaken every waiter.
     */
    proc = head;
    /* was: if (!proc->lwExclusive) ... && !proc->lwWaitLink->lwExclusive */
    while (proc->lwWaitLink != NULL)
        proc = proc->lwWaitLink;
    /* proc is now the last PGPROC to be released */
    lock->head = proc->lwWaitLink;
    proc->lwWaitLink = NULL;
    /* prevent additional wakeups until retryer gets to run */
    lock->releaseOK = false;


This wakes every waiter and lets them race for the lock (technically causing the thundering herd that the original logic was trying to avoid). I reran the test and got these results:

Time:Users:Type:TPM: Response Time
60: 100: Medium Throughput: 10457.000 Avg Medium Resp: 0.006
120: 200: Medium Throughput: 22809.000 Avg Medium Resp: 0.006
180: 300: Medium Throughput: 33665.000 Avg Medium Resp: 0.008
240: 400: Medium Throughput: 45042.000 Avg Medium Resp: 0.006
300: 500: Medium Throughput: 56655.000 Avg Medium Resp: 0.007
360: 600: Medium Throughput: 67170.000 Avg Medium Resp: 0.007
420: 700: Medium Throughput: 78343.000 Avg Medium Resp: 0.008
480: 800: Medium Throughput: 87979.000 Avg Medium Resp: 0.008
540: 900: Medium Throughput: 100369.000 Avg Medium Resp: 0.008
600: 1000: Medium Throughput: 110697.000 Avg Medium Resp: 0.009
660: 1100: Medium Throughput: 121255.000 Avg Medium Resp: 0.010
720: 1200: Medium Throughput: 132915.000 Avg Medium Resp: 0.010
780: 1300: Medium Throughput: 141505.000 Avg Medium Resp: 0.012
840: 1400: Medium Throughput: 147084.000 Avg Medium Resp: 0.021
light: customer: No result set for custid 0
900: 1500: Medium Throughput: 157906.000 Avg Medium Resp: 0.018
light: customer: No result set for custid 0
960: 1600: Medium Throughput: 160289.000 Avg Medium Resp: 0.026
1020: 1700: Medium Throughput: 152191.000 Avg Medium Resp: 0.053
1080: 1800: Medium Throughput: 157949.000 Avg Medium Resp: 0.054
1140: 1900: Medium Throughput: 161923.000 Avg Medium Resp: 0.063
runDynamic: uCount =  2000delta = 1900
runDynamic: ALL Threads Have Been created
1200: 2000: Medium Throughput: -1781969.000 Avg Medium Resp: 0.019
light: customer: No result set for custid 0
1260: 2000: Medium Throughput: 140741.000 Avg Medium Resp: 0.115
light: customer: No result set for custid 0
1320: 2000: Medium Throughput: 165379.000 Avg Medium Resp: 0.070
1380: 2000: Medium Throughput: 166585.000 Avg Medium Resp: 0.070
1440: 2000: Medium Throughput: 169163.000 Avg Medium Resp: 0.063
1500: 2000: Medium Throughput: 157508.000 Avg Medium Resp: 0.086
light: customer: No result set for custid 0
1560: 2000: Medium Throughput: 170112.000 Avg Medium Resp: 0.063

That is a 1.89X improvement in throughput, and throughput still is not dropping drastically, which means I can now keep stressing PostgreSQL 8.4 up to the limits of the box.

My proposal: if we build a quick tunable for 8.4, say wake-up-all-waiters=on (or something to that effect) in postgresql.conf, before the beta, then people can try the option on the various other benchmarks they are running and report back on whether it improves performance, and we can collect feedback. This way the change is not intrusive so late in the game, and it still puts an important scaling fix in. Of course, as usual, this is open for debate. I know avoiding the thundering herd was the goal here, but from what I have seen to date, waking up just one exclusive waiter, which may not even be on CPU, is pretty expensive.
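A rough sketch of how such a tunable could gate the wake-up loop, assuming a simple boolean setting. The wake_up_all_waiters variable is a hypothetical stand-in for the proposed postgresql.conf option (no GUC registration is shown, and the name is only illustrative), and WaitProc is a simplified, hypothetical substitute for PGPROC; spinlocks, releaseOK and the semaphore wake-ups are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag standing in for the proposed "wake-up-all-waiters"
 * setting; this is not an existing PostgreSQL GUC. */
static bool wake_up_all_waiters = false;

/* Hypothetical, simplified stand-in for PGPROC. */
typedef struct WaitProc
{
    bool             lwExclusive;  /* waiting for exclusive mode? */
    struct WaitProc *lwWaitLink;   /* next waiter in the queue, or NULL */
} WaitProc;

/*
 * Detach the waiters to wake from the queue at *head and return them.
 * With the flag off this follows the stock rule (one exclusive waiter, or
 * the run of shared waiters at the front). With it on, the whole queue is
 * detached, so every waiter is woken and must retry the lock.
 */
static WaitProc *
detach_waiters_to_wake(WaitProc **head)
{
    WaitProc *first = *head;
    WaitProc *proc = first;

    assert(first != NULL);      /* only called when someone is waiting */
    if (wake_up_all_waiters)
    {
        while (proc->lwWaitLink != NULL)
            proc = proc->lwWaitLink;
    }
    else if (!proc->lwExclusive)
    {
        while (proc->lwWaitLink != NULL && !proc->lwWaitLink->lwExclusive)
            proc = proc->lwWaitLink;
    }
    /* proc is now the last waiter to be released */
    *head = proc->lwWaitLink;
    proc->lwWaitLink = NULL;
    return first;
}
```

With the flag off, the detached set is exactly what the current rule produces, so behavior only changes for people who opt in, which is what keeps the change low-risk this late in the cycle.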

What do you all think?

Regards,
Jignesh


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)