Re: Memory Issue

Scott Marlowe <scott.marlowe@xxxxxxxxx> · Thu, 3 Nov 2011 08:30:27 -0600

On Thu, Nov 3, 2011 at 7:34 AM, Ioana Danes <ioanasoftware@xxxxxxxx> wrote:
> Hello Everyone,
>
> I have a performance test running with 1200 clients performing this transaction every second:
>
>
> begin transaction;
> select nextval('sequence1');
> select nextval('sequence2');
> insert into table1;
> insert into table2;
> commit;
>
> Table1 and table2 have no foreign keys and no triggers. There are 13 indexes on table1 and 5 indexes on table2.
>
> We do use connection pooling, but because the clients commit this transaction every second, I basically have 1200 connections open all the time.
>
> The DB server is dedicated and runs on:
>
> SUSE Linux Enterprise Server 11 (x86_64)
> VERSION = 11
> PATCHLEVEL = 1
>
> Postgres 9.0.3 (same behaviour on Postgres 9.1.1)
>
>
> The server has 16GB of RAM and the postgres parameters are:
>
> shared_buffers = 4GB
> work_mem = 1MB
> maintenance_work_mem = 2GB
> effective_cache_size = 8GB
>
> wal_level = minimal
> wal_buffers = 1MB
>
> checkpoint_segments = 16
> checkpoint_warning = 30s
> archive_mode = off
>
> autovacuum = off
>
> kernel.shmmax=5368709120 (5GB)
> kernel.shmall=5368709120 (5GB)
>
> The test performs well for about an hour at 1150 TPS, then the TPS drops badly and the clients time out.
> Watching the memory usage, I found that the slowdown is caused by swapping:
>
> vmstat
>
> procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa st
> 33  1 1149892  77484    436 3289824 1012 2700  1016  5844 12285 35318 43 32  1  0 25
> 28  1 1150348  73996    432 3291632  908 1300   908  4684 13668 29100 43 31  1  0 24
>  8  1 1151704  75212    440 3292756 1048 2260  1056 10300 13844 18628 39 33  6  1 22
>  8  1 1152716  75364    428 3294028 1640 1932  1640  6780 15325 17785 38 34  5  1 22
> 11  1 1154024  94356    444 3278828 1260 2328  1276  6752 15171 15538 40 30  7  1 22
>  1  0 1154876  98156    480 3281456 1572 1844  1596  8260 14690 14451 32 32 13  2 19
>  0  0 1154892 100588    492 3281636   56  108    68   932  2744  2082  2  8 88  1  1
>
>
> free
> --------------------------------
>              total       used       free     shared    buffers     cached
> Mem:      16790144   16710092      80052          0       1724    3337172
> -/+ buffers/cache:   13371196    3418948
> Swap:      4194296    1162980    3031316
>
>
> top:
> --------------------------------
> top - 12:12:00 up  1:54,  6 users,  load average: 37.57, 41.52, 31.24
> Tasks: 1309 total,  42 running, 1267 sleeping,   0 stopped,   0 zombie
> Cpu(s): 29.4%us, 13.8%sy,  0.0%ni, 23.8%id, 12.8%wa,  0.0%hi,  5.3%si, 14.8%st
> Mem:     16396M total,    16310M used,       85M free,        2M buffers
> Swap:     4095M total,     1187M used,     2908M free,     3213M cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>    39 root      20   0     0    0    0 S    4  0.0   3:42.18 kswapd0
>  2282 postgres  20   0 57628  608  332 R    3  0.0   2:18.42 postmaster
>  9722 postgres  20   0 4308m 220m 215m S    2  1.3   0:03.01 postmaster
> 10483 postgres  20   0 4303m 164m 160m S    2  1.0   0:02.02 postmaster
> 10520 postgres  20   0 4303m 158m 155m R    2  1.0   0:02.24 postmaster
>  9005 postgres  20   0 4308m 298m 294m S    2  1.8   0:04.52 postmaster
>
>
> After another half hour almost the entire swap is used and the system performs really badly: 100 TPS or lower.
> It never runs out of memory, though!
>
> I would like to ask for your opinion on this issue.
> My concern is why the memory is not reused earlier, and why the system starts swapping when all it does is these two inserts.
> Is this an OS issue, a Postgres issue, or a configuration issue?
> Your advice is greatly appreciated.

You can try a few things.

1: Lower your shared_buffers.  It's unlikely you really need 4GB for
them.  A few hundred megs is probably plenty for the type of work
you're doing.  Let the kernel cache the data you're not hitting right
this second.

2: Set swappiness to 0, i.e. edit /etc/sysctl.conf, add a line like
vm.swappiness = 0, then run sudo sysctl -p.

3: Turn off overcommit.  As in step 2, set vm.overcommit_memory = 2
in /etc/sysctl.conf.  This stops Linux from overcommitting memory and
should therefore keep the OOM killer from being invoked.

4: Just turn off swap.  With only 16GB this is a tad dangerous,
especially if you haven't disabled overcommit as in step 3.  Memory is
cheap; put at least 32GB into the machine.  With 1200 users, you
really need plenty of memory.  To turn off swap, add something like
/sbin/swapoff -a to the /etc/rc.local file (before the exit line,
natch).
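For a rough sense of how much memory 1200 backends take on their own, here's a back-of-the-envelope estimate in Python built from the top output quoted above.  It treats RES minus SHR as each backend's private memory and assumes the four sampled backends are typical of all 1200; both are assumptions, and the figure ignores kernel-side overhead, so read it as a lower bound.  The function names are made up for illustration.

```python
# Back-of-the-envelope sketch, not a measurement tool.
# Assumption: RES - SHR approximates a backend's private resident memory,
# and the four sampled backends below are representative of all 1200.

# (PID, RES in MB, SHR in MB) copied from the quoted top output
SAMPLE = [
    (9722, 220, 215),
    (10483, 164, 160),
    (10520, 158, 155),
    (9005, 298, 294),
]


def private_mb(res_mb: int, shr_mb: int) -> int:
    """Approximate private resident memory of one backend (hypothetical helper)."""
    return res_mb - shr_mb


def estimate_total_mb(sample, connections: int, shared_buffers_mb: int) -> float:
    """shared_buffers plus average private memory times connection count."""
    avg_private = sum(private_mb(r, s) for _, r, s in sample) / len(sample)
    return shared_buffers_mb + avg_private * connections


if __name__ == "__main__":
    avg = sum(private_mb(r, s) for _, r, s in SAMPLE) / len(SAMPLE)
    print(f"avg private per backend: {avg:.1f} MB")
    for sb in (4096, 256):  # current setting vs. "a few hundred megs"
        total = estimate_total_mb(SAMPLE, connections=1200, shared_buffers_mb=sb)
        print(f"1200 backends + {sb} MB shared_buffers: {total:.0f} MB")
```

With these numbers it prints an average of 4.0 MB per backend, 8896 MB total with 4GB of shared_buffers and 5056 MB with 256MB, so shrinking shared_buffers as in step 1 frees most of that 4GB for the kernel's page cache.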
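To confirm that steps 2 to 4 actually took effect after sysctl -p or a reboot, you can read the live values back from /proc.  A minimal Python sketch, assuming a Linux box with the standard /proc/sys/vm files and /proc/swaps; the helper names are hypothetical.

```python
from pathlib import Path


def read_vm_setting(name: str, root: str = "/proc/sys/vm") -> int:
    """Read an integer vm.* sysctl value such as 'swappiness' (hypothetical helper)."""
    return int(Path(root, name).read_text().strip())


def swap_devices(swaps_file: str = "/proc/swaps") -> list[str]:
    """List active swap devices from /proc/swaps, skipping its header line."""
    lines = Path(swaps_file).read_text().splitlines()[1:]
    return [line.split()[0] for line in lines if line.strip()]


if __name__ == "__main__":
    print("vm.swappiness =", read_vm_setting("swappiness"))
    print("vm.overcommit_memory =", read_vm_setting("overcommit_memory"))
    print("active swap:", swap_devices() or "none")
```

After steps 2 and 3 you'd expect swappiness 0 and overcommit_memory 2, and after step 4 an empty swap list.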

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)