Re: Debugging shared memory issues on CentOS

Tom Lane <tgl@xxxxxxxxxxxxx> · Tue, 10 Dec 2013 23:54:36 -0500

Mack Talcott <mack.talcott@xxxxxxxxx> writes:
> I am trying to debug some shared memory issues with Postgres 9.3.1 and
> CentOS release 6.3 (Final).  I have a database machine that probably has
> some misconfigured shared memory settings.  It's getting into 2+ GB of
> swap.  Restarting postgres frees all of the memory, but after a few hours
> of normal usage it will go back into swap.

Are you sure the kernel isn't just swapping out some idle processes
because it feels like it?  These numbers don't exactly look like a
machine under stress:

> top - 09:38:16 up 1 day, 21:21,  3 users,  load average: 0.40, 0.54, 0.45
> Tasks: 253 total,   2 running, 251 sleeping,   0 stopped,   0 zombie
> Cpu(s):  0.7%us,  0.2%sy,  0.0%ni, 97.8%id,  1.2%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:   6998260k total,  6849048k used,   149212k free,      248k buffers
> Swap: 440478516k total,  1981912k used, 438496604k free,  1541356k cached

In particular, you've got 1.5 gig of filesystem cache, so you're hardly
out of memory.  I don't know where the other 5.5 gig of RAM went, but
it doesn't look like postgres is eating it; what else is running on
this box?

These lines look absolutely normal, assuming that you've configured
shared_buffers somewhere in the neighborhood of 1GB:

>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  3534 postgres  20   0 2330m 1.4g 1.1g S  0.0 20.4   1:06.99 postgres: deploy mtalcott 10.222.154.172(53495) idle
>  9143 postgres  20   0 2221m 1.1g 983m S  0.0 16.9   0:14.75 postgres: deploy mtalcott 10.222.154.167(35811) idle
>  6026 postgres  20   0 2341m 1.1g 864m S  0.0 16.4   0:46.56 postgres: deploy mtalcott 10.222.154.167(37110) idle
> 18538 postgres  20   0 2327m 1.1g 865m S  0.0 16.1   2:06.59 postgres: deploy mtalcott 10.222.154.172(47796) idle
>  1575 postgres  20   0 2358m 1.1g 858m S  0.0 15.9   1:41.76 postgres: deploy mtalcott 10.222.154.172(52560) idle

The key thing to realize here is that the SHR column is *shared*
memory, i.e., all these processes are referencing the same chunk of about
1GB worth of memory.  The process-specific memory is RES minus SHR, and
none of those processes seem tremendously out of line on that measure.
(Note: the SHR values aren't all exactly the same because top doesn't
count a shared page until the process has physically touched that page.
Even the guy with 1.1g of SHR might not have touched all of the shared
storage yet.)
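If you'd rather compute RES minus SHR per backend than eyeball top, here is a minimal Python sketch, assuming Linux procfs.  It reads /proc/<pid>/statm, whose second and third fields are the resident and shared page counts (on Linux these are what top reports as RES and SHR), and returns the difference in bytes.  The helper names are made up for illustration; nothing here is a Postgres tool.

```python
import os


def private_rss_bytes(statm_text, page_size):
    # /proc/<pid>/statm holds seven page counts:
    #   size resident shared text lib data dt
    fields = statm_text.split()
    resident_pages = int(fields[1])  # corresponds to top's RES
    shared_pages = int(fields[2])    # corresponds to top's SHR
    # Whatever is resident but not shared is private to this process.
    return (resident_pages - shared_pages) * page_size


def private_rss_for_pid(pid):
    # Linux-only: read the process's statm file and scale by the page size.
    with open(f"/proc/{pid}/statm") as f:
        return private_rss_bytes(f.read(), os.sysconf("SC_PAGE_SIZE"))
```

Run on the database machine, something like private_rss_for_pid(3534) // 1024 would give the private KiB for the first backend in the listing above.  As with top, the shared part only includes pages the backend has actually touched, but since those pages are counted in both RES and SHR, the difference still reflects private memory.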

I'm not sure you have a problem here.  If you do, these figures aren't
showing it.  Having some stuff shoved out to swap is not a problem unless
you have a problem with the swap I/O rate.  You might try watching
"vmstat 1" for a while to see if the si/so columns show significant activity.
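If you want the same swap-rate numbers from a script, for logging alongside the database, here is a minimal Python sketch, assuming Linux procfs.  It samples the cumulative pswpin and pswpout counters (pages swapped in and out since boot) from /proc/vmstat and converts the deltas to KiB per second, roughly what vmstat's si/so columns show.  The function names are hypothetical.

```python
import os
import time


def read_swap_counters(vmstat_text):
    # /proc/vmstat has one "name value" pair per line; keep only the two
    # cumulative swap counters, which are measured in pages.
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters["pswpin"], counters["pswpout"]


def swap_rates(before, after, interval_s, page_size):
    # Turn page-count deltas into KiB/s, the unit vmstat uses for si/so.
    kib_per_page = page_size / 1024
    pages_in = after[0] - before[0]
    pages_out = after[1] - before[1]
    return (pages_in * kib_per_page / interval_s,
            pages_out * kib_per_page / interval_s)


def watch_swap(samples, interval_s=1.0):
    # Linux-only: poll /proc/vmstat and print one si/so line per interval.
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open("/proc/vmstat") as f:
        before = read_swap_counters(f.read())
    t0 = time.monotonic()
    for _ in range(samples):
        time.sleep(interval_s)
        with open("/proc/vmstat") as f:
            after = read_swap_counters(f.read())
        t1 = time.monotonic()
        si, so = swap_rates(before, after, t1 - t0, page_size)
        print(f"si={si:.0f} KiB/s  so={so:.0f} KiB/s")
        before, t0 = after, t1
```

Calling watch_swap(60) prints one line per second for a minute.  Sustained non-zero rates indicate a problem; a large "used" figure on the Swap line of top does not.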

			regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)