Re: Linux memory zone reclaim

Scott Marlowe <scott.marlowe@xxxxxxxxx> · Mon, 30 Jul 2012 11:09:35 -0600

On Mon, Jul 30, 2012 at 10:43 AM, Kevin Grittner
<Kevin.Grittner@xxxxxxxxxxxx> wrote:
> node distances:
> node   0   1   2   3
>   0:  10  11  11  11
>   1:  11  10  11  11
>   2:  11  11  10  11
>   3:  11  11  11  10
>
> When considering a hardware purchase, it might be wise to pay close
> attention to how "far" a core may need to go to get to the most
> "distant" RAM.

I think zone_reclaim_mode gets turned on when the ratio of remote to
local node distance is high.  If the inter-node costs stayed the same
and the intra-node costs dropped by half, zone reclaim would likely get
turned on at boot time.
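
To make the ratio idea concrete, here is a rough sketch that parses the
"node distances" table from `numactl --hardware` and reports the worst
remote-to-local ratio.  The function names are mine, it assumes node
ids are contiguous starting at 0, and it deliberately does not encode
whatever cutoff the kernel actually uses; it only shows the arithmetic.

```python
# Sketch only: parse_distances() and worst_ratio() are hypothetical
# helper names, not part of numactl or the kernel.
SAMPLE = """\
node distances:
node   0   1   2   3
  0:  10  11  11  11
  1:  11  10  11  11
  2:  11  11  10  11
  3:  11  11  11  10
"""


def parse_distances(text):
    """Return {node: [distance to node 0, node 1, ...]} from the table."""
    matrix = {}
    for line in text.splitlines():
        # Tolerate email quoting ("> ") in front of each row.
        line = line.lstrip("> ").strip()
        head, sep, rest = line.partition(":")
        if sep and head.strip().isdigit():
            matrix[int(head)] = [int(x) for x in rest.split()]
    return matrix


def worst_ratio(matrix):
    """Largest remote/local distance ratio seen from any node.

    Assumes node ids are 0..n-1 so row[node] is the local distance.
    """
    worst = 0.0
    for node, row in matrix.items():
        local = row[node]
        for other, distance in enumerate(row):
            if other != node:
                worst = max(worst, distance / local)
    return worst


if __name__ == "__main__":
    print("worst remote/local ratio: %.2f" % worst_ratio(parse_distances(SAMPLE)))
```

With Kevin's matrix every remote hop costs 11 against a local cost of
10, so the ratio is small.  Halving the local cost while keeping the
remote cost would double that ratio.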

I had something similar in a 48-core system, but if I recall correctly
the matrix was 8x8 and the cost differential was much higher.

The symptom I saw was that a very hard-working db, on a 128G machine
with about 95G as OS / kernel cache, would slow to a crawl with kswapd
working very hard (I think it was kswapd) after a period of 1 to 3
weeks.  Note that actual swap in and out wasn't all that high according
to vmstat.  The same performance hit happened on a similar machine used
as a file server after a similar warm-up period.
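
For anyone who wants to watch for that pattern (heavy kswapd scanning
with little real swap traffic), a rough sketch is to diff two snapshots
of /proc/vmstat.  The helper names are hypothetical, and the exact
counter names vary by kernel version, so this just sums everything that
starts with "pgscan_kswapd" and compares it with pswpin / pswpout.

```python
# Sketch only: parse_vmstat() and reclaim_delta() are hypothetical helpers.
def parse_vmstat(text):
    """Turn the 'name value' lines of /proc/vmstat into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def reclaim_delta(before, after):
    """Change in kswapd page scanning and swap traffic between two snapshots."""
    def total(counters, prefix):
        # Older kernels split the counter per zone (e.g. a _normal suffix),
        # so add up every counter sharing the prefix.
        return sum(v for k, v in counters.items() if k.startswith(prefix))

    return {
        "kswapd_scanned": total(after, "pgscan_kswapd") - total(before, "pgscan_kswapd"),
        "swapped_in": after.get("pswpin", 0) - before.get("pswpin", 0),
        "swapped_out": after.get("pswpout", 0) - before.get("pswpout", 0),
    }


if __name__ == "__main__":
    import time

    with open("/proc/vmstat") as f:
        first = parse_vmstat(f.read())
    time.sleep(10)  # sampling interval is arbitrary
    with open("/proc/vmstat") as f:
        second = parse_vmstat(f.read())
    print(reclaim_delta(first, second))
```

A large kswapd_scanned figure next to near-zero swapped_in and
swapped_out would match what I described, though it is not proof of a
zone reclaim problem by itself.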

The real danger here is that the misbehavior can take a long time to
show up.  From what I read at the time, the performance gain from
zone_reclaim_mode = 1 was minimal for a file or db server.  It is
better suited to something like a large virtual machine farm, with a
lot of processes chopped into sections small enough to fit in one
node's memory and not needing much access to another node's.  Anything
that relies on the OS to cache is likely not well served by
zone_reclaim_mode = 1.
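
To check what a given box is doing, the setting can be read from
/proc/sys/vm/zone_reclaim_mode.  Below is a small sketch that decodes
the value using the bit meanings from the kernel's sysctl
documentation; the function names are my own.

```python
# Sketch only: describe() and read_mode() are hypothetical helper names.
from pathlib import Path

ZONE_RECLAIM_PATH = Path("/proc/sys/vm/zone_reclaim_mode")

# Bit meanings as listed in the kernel's vm sysctl documentation.
FLAGS = {
    1: "zone reclaim on",
    2: "zone reclaim writes dirty pages out",
    4: "zone reclaim swaps pages",
}


def describe(mode):
    """Return the list of behaviours enabled by a zone_reclaim_mode value."""
    enabled = [label for bit, label in FLAGS.items() if mode & bit]
    return enabled or ["zone reclaim off"]


def read_mode(path=ZONE_RECLAIM_PATH):
    """Read the current value; the file only exists on NUMA-enabled kernels."""
    return int(path.read_text().strip())


if __name__ == "__main__":
    mode = read_mode()
    print(mode, describe(mode))
    # To turn it off until the next reboot (as root):
    #   sysctl -w vm.zone_reclaim_mode=0
    # To make that persistent, add "vm.zone_reclaim_mode = 0" to /etc/sysctl.conf.
```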

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)