Since I had no luck getting any response on lkml, are there any takers here on linux-mm?
---------- Forwarded message ----------
Date: Tue, 10 Aug 2010 08:05:51 +0200 (CEST)
From: Mikael Abrahamsson <swmike@xxxxxxxxx>
To: linux-kernel@xxxxxxxxxxxxxxx
Subject: 2.6.32 swapper allocation failure with plenty of memory available

Hi.

Yesterday my Ubuntu 10.04 machine, running Ubuntu's 2.6.32 (amd64) kernel, stopped responding while under a lot of disk IO and network stress. I thought it had frozen completely, but ~2 hours later it came back to life.
When I logged in I saw a lot of "swapper allocation failure" messages and r8169 timeouts in dmesg (this is the first time I've seen this cause network instability like this, but it's also the first motherboard I've tested that has an r8169 NIC).
I've had this problem before with older kernels on other hardware <https://bugs.launchpad.net/ubuntu/+source/linux/+bug/296275>, and it seems related to having a lot of TCP sessions up moving data, in conjunction with pretty aggressive TCP tuning for a large bandwidth-delay product (4-8 megs of tcp memory settings with sysctl).
The machine has 8 gigs of RAM (Core i5 + P7H57D-V EVO motherboard) and was running programs that were using ~2 gigs of memory, so most of the memory was used for buffers and disk cache.
Unless this has been fixed since 2.6.32, I suspect it's still a problem even in newer kernels because the behaviour seems to have been present since at least 2.6.24. Generally, tuning down the TCP wmem and rmem etc to ~1 megabyte makes the problem go away.
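To check whether a machine is running with the larger TCP buffer settings described above, something like the following rough sketch could be used. It assumes the usual "min default max" byte format of /proc/sys/net/ipv4/tcp_rmem and tcp_wmem. The 1 MiB threshold is just the "~1 megabyte" figure mentioned above, and the helper names (parse_triple, exceeds_limit) are made up for illustration, not part of any tool.

```python
from pathlib import Path

# Threshold taken from the "~1 megabyte" figure in the post (an assumption,
# not a kernel-defined limit).
ONE_MIB = 1024 * 1024

# procfs files holding "min default max" in bytes on Linux.
TCP_MEM_FILES = ("/proc/sys/net/ipv4/tcp_rmem", "/proc/sys/net/ipv4/tcp_wmem")


def parse_triple(text):
    """Parse a 'min default max' sysctl value into three ints (bytes)."""
    fields = text.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {text!r}")
    low, default, high = (int(f) for f in fields)
    return low, default, high


def exceeds_limit(text, limit=ONE_MIB):
    """Return True if the max field is larger than `limit` bytes."""
    return parse_triple(text)[2] > limit


if __name__ == "__main__":
    for name in TCP_MEM_FILES:
        path = Path(name)
        if not path.exists():
            print(f"{name}: not present on this system")
            continue
        value = path.read_text()
        flag = "above" if exceeds_limit(value) else "at or below"
        print(f"{name}: max is {flag} 1 MiB ({value.strip()})")
```

This only reports the current settings; it does not change them.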
Please see the attached dmesg file for more information.

-- 
Mikael Abrahamsson    email: swmike@xxxxxxxxx
Attachment:
dmesg.100809-2.txt.gz
Description: Binary data