swap storms on low memory i386, tasks blocked on i386 and amd64 for kernel > 2.6.36-git6

Arthur Marsh <arthur.marsh@xxxxxxxxxxxxxxxx> · Sun, 31 Oct 2010 22:06:17 +1030

Hi, first post to this list.

I have a PII-266 with 384 MiB RAM (maximum capacity for the machine) and
an AMD64 dual core with 4 GiB RAM, both running Debian unstable plus
some packages from experimental and custom kernels.

A typical load for the PII-266 is KDE 3.5.10 with konversation, icedove,
iceweasel, xmms, lynx, hp-systray, top and aptitude-curses.

This works with stock Debian kernels and custom kernels up to and
including 2.6.36-git6. Under heavy load, free RAM will hover around 5
MiB but audio will still play with a very occasional skip and all
applications are responsive.

With newer kernels, e.g. 2.6.36-git9, -git10 and -git11 (all with the
deadline I/O scheduler), the PII-266 gets into a swap storm while over
32 MiB of RAM is still free, with kswapd0 taking 10 percent or more of
CPU time. Kernel 2.6.36-git15 with the cfq I/O scheduler managed to keep
smaller applications like xmms and shells running, but it also left an
excessive amount of RAM free, and kswapd0 took more than 10 percent of
CPU time.
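To put numbers on "swap storm" instead of eyeballing top, something like
the following rough Python sketch could log free RAM and swap-in/swap-out
rates every few seconds. It is my own illustration, not an existing tool,
and the function names are made up. It assumes the MemFree field of
/proc/meminfo and the pswpin/pswpout page counters of /proc/vmstat, and
it does not measure kswapd0's CPU share (top already shows that).

```python
"""Rough swap-storm sampler: a sketch, not a tested diagnostic tool.

Assumes a Linux /proc providing MemFree in /proc/meminfo and the
pswpin/pswpout counters (pages swapped in/out) in /proc/vmstat.
"""
import time


def parse_meminfo(text):
    """Return {field: value} from /proc/meminfo-style text (values mostly in kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    return fields


def parse_vmstat(text):
    """Return {counter: value} from /proc/vmstat-style 'name value' lines."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(before, after, elapsed):
    """Pages swapped in/out per second between two vmstat snapshots."""
    return {
        key: (after.get(key, 0) - before.get(key, 0)) / elapsed
        for key in ("pswpin", "pswpout")
    }


def read_proc(path):
    with open(path) as f:
        return f.read()


def monitor(interval=5.0, count=None):
    """Print MemFree and swap rates every `interval` seconds (forever if count is None)."""
    before = parse_vmstat(read_proc("/proc/vmstat"))
    t0 = time.monotonic()
    n = 0
    while count is None or n < count:
        time.sleep(interval)
        after = parse_vmstat(read_proc("/proc/vmstat"))
        t1 = time.monotonic()
        free_kb = parse_meminfo(read_proc("/proc/meminfo")).get("MemFree", 0)
        rates = swap_rates(before, after, t1 - t0)
        print(f"{time.strftime('%H:%M:%S')} MemFree={free_kb // 1024} MiB "
              f"pswpin/s={rates['pswpin']:.1f} pswpout/s={rates['pswpout']:.1f}",
              flush=True)
        before, t0, n = after, t1, n + 1
```

Calling monitor() in a spare terminal on a -git6 kernel and on a -git9
kernel under the same desktop load would give logs that can be compared
directly.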

On the AMD64 dual core, when compiling kernels with CONCURRENCY_LEVEL=4,
the build process would sometimes stall with messages like:

[ 2880.492025] INFO: task sh:10071 blocked for more than 120 seconds.
[ 2880.493165] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2880.494299] sh            D ffff8800cfc53780     0 10071   9092 0x00000000
[ 2880.495433]  ffff880128b214a0 0000000000000086 ffff880128851b00 ffff88012b67cba0
[ 2880.496612]  0000000000013780 0000000000013780 ffff88011969dfd8 ffff88011969c000
[ 2880.497765]  ffff880128b21798 ffff880128b217a0 ffff880128b214a0 ffff88011969dfd8
[ 2880.498925] Call Trace:
[ 2880.500096]  [<ffffffff810086fc>] ? __switch_to+0x198/0x284
[ 2880.501262]  [<ffffffff8132d36b>] ? schedule_timeout+0x2d/0xd7
[ 2880.502418]  [<ffffffff8103999f>] ? need_resched+0x1a/0x23
[ 2880.503564]  [<ffffffff8132d13f>] ? schedule+0x5b2/0x5c9
[ 2880.504739]  [<ffffffff8103e36d>] ? get_parent_ip+0x9/0x1b
[ 2880.505883]  [<ffffffff8132c9c6>] ? wait_for_common+0x9d/0x10c
[ 2880.507024]  [<ffffffff81043d10>] ? default_wake_function+0x0/0xf
[ 2880.508189]  [<ffffffff81089999>] ? stop_one_cpu+0x57/0x6e
[ 2880.509288]  [<ffffffff81043b23>] ? migration_cpu_stop+0x0/0x2a
[ 2880.510354]  [<ffffffff81331951>] ? sub_preempt_count+0x83/0x94
[ 2880.511426]  [<ffffffff8103f880>] ? sched_exec+0xbe/0xd6
[ 2880.512523]  [<ffffffff810fea4b>] ? do_execve+0xd1/0x28e
[ 2880.513582]  [<ffffffff81010786>] ? sys_execve+0x3f/0x54
[ 2880.514616]  [<ffffffff81009fdc>] ? stub_execve+0x6c/0xc0
amarsh04@am64:~$ uname -a
Linux am64 2.6.36-git16 #1 SMP PREEMPT Sun Oct 31 15:41:03 CST 2010 x86_64 GNU/Linux

The process was unblocked by logging another session into the AMD64 machine.

The "task foo blocked for more than 120 seconds" message has also
appeared on the PII-266 uniprocessor machine with some of the
2.6.36-git9 and later kernels. Before that, I had not seen such a
problem on the PII-266 for about six months.
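To see how often this happens on each kernel, a small Python sketch like
the one below could pull the hung-task reports out of saved dmesg output.
It is only my illustration: it assumes the exact message format shown in
the log above, and the function name is made up.

```python
import re

# Matches hung-task reports in the format shown above, e.g.
# [ 2880.492025] INFO: task sh:10071 blocked for more than 120 seconds.
HUNG_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def find_hung_tasks(log_text):
    """Return (timestamp, command, pid, seconds) tuples, one per hung-task report."""
    reports = []
    for line in log_text.splitlines():
        m = HUNG_RE.match(line)
        if m:
            reports.append((float(m.group("ts")), m.group("comm"),
                            int(m.group("pid")), int(m.group("secs"))))
    return reports
```

Saving `dmesg` output after each test run and counting the reports would
show whether the hangs start with the same kernel as the swap storms.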

I can't begin to figure out where this problem was introduced.
Git bisection doesn't work reliably here, because the symptoms of a
swap storm can take a while to appear, so a kernel may look good when
it isn't.

Is there any straightforward way to gather more information before
reporting this kind of issue upstream?

Has anyone experienced similar problems or read of similar reports?

Arthur.

--
To unsubscribe from this list: send an email with
"unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
Please read the FAQ at http://kernelnewbies.org/FAQ