Re: Adding more memory = huge cpu load

Shaun Thomas <sthomas@xxxxxxxxx> · Mon, 10 Oct 2011 09:04:55 -0500

On 10/10/2011 08:26 AM, alexandre - aldeia digital wrote:

> Yesterday, a customer increased the server memory from 16GB to 48GB.
>
> Today, the load of the server hit 40 ~ 50 points.
> With 16 GB, the load did not surpass 5 ~ 8 points.

That's not entirely surprising. The problem with having lots of memory 
is... that you have lots of memory. The operating system likes to cache, 
and this includes writes. Normally this isn't a problem, but with 48GB 
of RAM, the defaults (for CentOS 5.5 in particular) are to use up to 40% 
of that to cache writes.

The settings you're looking for are in:

/proc/sys/vm/dirty_background_ratio
/proc/sys/vm/dirty_ratio

You can set these by putting lines in your /etc/sysctl.conf file:

vm.dirty_background_ratio = 1
vm.dirty_ratio = 10

And then calling:

sudo sysctl -p

The first number, the background ratio, tells the memory manager to
start writing to disk in the background as soon as dirty (not yet
written) pages reach 1% of memory. The second is the maximum share of
memory that can hold pending writes. If the number of pending writes
exceeds this, the system goes into synchronous write mode, and blocks
all other write activity until it can flush everything out to disk.
You really, really want to avoid this.
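
To confirm which values the kernel is actually using before and after
the change, a minimal sketch like this may help. The function name
show_dirty_settings and its optional directory argument are made up
for illustration (the argument only exists so the function can be
pointed at a copy of the files); the two file names are the ones
listed earlier.

```shell
# show_dirty_settings [DIR]
# Print the two writeback thresholds in sysctl.conf style.
# DIR is a hypothetical convenience argument; it defaults to the live
# kernel settings under /proc/sys/vm.
show_dirty_settings() {
    dir="${1:-/proc/sys/vm}"
    for name in dirty_background_ratio dirty_ratio; do
        printf 'vm.%s = %s\n' "$name" "$(cat "$dir/$name")"
    done
}
```

Running show_dirty_settings with no argument after "sysctl -p" should
show the new values if the sysctl.conf lines were picked up.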

The defaults in older Linux systems were this high mostly to optimize
for desktop performance. For CentOS 5.5, the defaults are 10% and 40%,
which doesn't seem like a lot. But for servers with tons of RAM, 10% of
48GB is almost 5GB of pending writes before background flushing even
starts. That's way bigger than all but the largest RAID or controller
cache, which means IO waits, and thus high load. Those high IO waits
cause a kind of cascade that slowly causes writes to back up, making it
more likely you'll reach the hard 40% limit, which causes a system
flush, and then you're in trouble.
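
To put rough numbers on that, here is a small sketch of the
arithmetic. The helper name dirty_limit_mb is hypothetical, and it
simply takes a percentage of total RAM; the kernel's own calculation
may use a somewhat different base than total RAM, so treat the output
as an estimate only.

```shell
# dirty_limit_mb RAM_GB PERCENT
# Print roughly how many MB of dirty data PERCENT of RAM_GB allows.
# Assumption: the percentage is applied to total RAM (an approximation).
dirty_limit_mb() {
    awk -v gb="$1" -v pct="$2" \
        'BEGIN { printf "%d\n", gb * 1024 * pct / 100 }'
}

dirty_limit_mb 16 10   # old box, background default:  1638
dirty_limit_mb 48 10   # new box, background default:  4915
dirty_limit_mb 48 40   # new box, hard-limit default: 19660
dirty_limit_mb 48 1    # new box, suggested background: 491
```

Even the suggested 1% is still close to half a gigabyte on a 48GB
machine, which is why it is worth testing rather than assuming.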

You can actually monitor this by checking /proc/meminfo:

grep -A1 Dirty /proc/meminfo

The 'Dirty' line tells you how much memory *could* be written to disk, 
and the 'Writeback' line tells you how much the system is trying to 
write. You want that second line to be 0 or close to it, as much as 
humanly possible. It's also good to keep Dirty low, because it can be an 
indicator that the system is about to start uncontrollably flushing if 
it gets too high.
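
If the raw kB figures are hard to read at a glance, a sketch along
these lines converts them to MB. The function name dirty_status and
its optional file argument are assumptions for illustration; by
default it reads the same /proc/meminfo file as the grep above.

```shell
# dirty_status [FILE]
# Print the Dirty and Writeback lines of a meminfo-format file in MB.
# FILE is a hypothetical convenience argument; default is /proc/meminfo.
# The exact match on "Writeback:" skips the separate WritebackTmp line.
dirty_status() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" {
             printf "%s %.1f MB\n", $1, $2 / 1024
         }' "${1:-/proc/meminfo}"
}
```

For a live view, wrapping the original grep in watch, for example
watch -n 1 'grep -A1 Dirty /proc/meminfo', gives a similar result
without any helper function.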

Generally it's good practice to keep the amount of memory that
dirty_ratio allows smaller than your disk controller cache, but even
high-end systems only give 256MB to 1GB of controller cache. Newer
kernels have introduced dirty_bytes and dirty_background_bytes, which
let you set a hard byte-specified limit instead of relying on a coarse
integer percentage of system memory. This is better for systems with
vast amounts of memory that could cause these kinds of IO spikes. Of
course, in order to use those settings, your client will have to
either install a custom kernel, or upgrade to CentOS 6. Try the 1%
background ratio first, and it may work out.
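
On a kernel that does have the byte-based settings, the sysctl.conf
lines could be generated with a sketch like the one below. The helper
name dirty_bytes_conf is hypothetical, and the 64MB and 256MB figures
are placeholder values picked only to sit at the low end of the
controller cache sizes mentioned above, not tested recommendations.

```shell
# dirty_bytes_conf BACKGROUND_MB LIMIT_MB
# Print sysctl.conf lines for byte-based writeback limits.
# The MB values passed in are illustrative assumptions; tune and test.
dirty_bytes_conf() {
    printf 'vm.dirty_background_bytes = %d\n' "$(( $1 * 1024 * 1024 ))"
    printf 'vm.dirty_bytes = %d\n' "$(( $2 * 1024 * 1024 ))"
}

dirty_bytes_conf 64 256
# vm.dirty_background_bytes = 67108864
# vm.dirty_bytes = 268435456
```

Note that each byte setting and its ratio counterpart are mutually
exclusive: once one is written, the other reads back as 0.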

Some kernels have a hard 5% limit on dirty_background_ratio, but the one
included in CentOS 5.5 does not. You can even set it to 0, but your IO
throughput will take a nosedive, because at that point, it's always
writing to disk without any effective caching at all. We set
dirty_ratio to 10% because we want to reduce the total amount of time a
synchronous IO block lasts. You can probably take that as low as 5%,
but be careful and test to find your best equilibrium point. You want
it at a point where it rarely blocks, but if it does, it's over
quickly.

There's more info here:

http://www.westnet.com/~gsmith/content/linux-pdflush.htm

(I only went on about this because we had the same problem when we 
increased from 32GB to 72GB. It was a completely unexpected reaction, 
but a manageable one.)

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@xxxxxxxxx
