Re: 2.6.39-rc4+: Kernel leaking memory during FS scanning, regression?

Bruno Prémont <bonbons@xxxxxxxxxxxxxxxxx> · Wed, 27 Apr 2011 08:15:01 +0200

On Wed, 27 Apr 2011 00:28:37 +0200 (CEST) Thomas Gleixner wrote:
> On Tue, 26 Apr 2011, Linus Torvalds wrote:
> > On Tue, Apr 26, 2011 at 10:09 AM, Bruno Prémont wrote:
> > >
> > > Just in case, /proc/$(pidof rcu_kthread)/status shows ~20k voluntary
> > > context switches and exactly one non-voluntary one.
> > >
> > > In addition when rcu_kthread has stopped doing its work
> > > `swapoff $(swapdevice)` seems to block forever (at least normal shutdown
> > > blocks on disabling swap device).
> > > If I get to do it when I get back home I will manually try to swapoff
> > > and take process traces with sysrq-t.
> > 
> > That "exactly one non-voluntary one" sounds like the smoking gun.
> > 
> > Normally SCHED_FIFO runs until it voluntarily gives up the CPU. That's
> > kind of the point of SCHED_FIFO. Involuntary context switches happen
> > when some higher-priority SCHED_FIFO process becomes runnable (irq
> > handlers? You _do_ have CONFIG_IRQ_FORCED_THREADING=y in your config
> > too), and maybe there is a bug in the runqueue handling for that case.
> 
> The forced irq threading is only effective when you add the command
> line parameter "threadirqs". I don't see any irq threads in the ps
> outputs, so that's not the problem.
> 
> Though the whole ps output is weird. There is only one thread/process
> which accumulated CPU time
> 
> collectd  1605  0.6  0.7  49924  3748 ?        SNLsl 22:14   0:14

The whole system has not been up for long, so it's expected that CPU
time remains low. collectd is the only daemon with much work to do
(it scans many files every 10s).
In the ps output with the stopped build processes there should be a
few more processes with accumulated CPU time... though looking at it,
only top and python have accumulated anything.

Next time I can scan /proc/${PID}/ for more precise CPU times to see
how close to zero they really are.

> All others show 0:00 CPU time - not only kthread_rcu.
> 
> Bruno, are you running on real hardware or in a virtual machine?

It's real hardware: an nforce420 chipset (the first nForce generation),
an AMD Athlon 1800 CPU and 512MB of RAM, of which 32MB is taken by the
IGP. So it's something like 7-10 years old.

> Can you please enable CONFIG_SCHED_DEBUG and provide the output of
> /proc/sched_stat when the problem surfaces and a minute after the
> first snapshot?
> 
> Also please apply the patch below and check, whether the printk shows
> up in your dmesg.

I will include that in my testing when I'm back home this evening.
(I'll have to offload kernel compilations to a faster box, otherwise
my evening will be much too short...)

Bruno

> Thanks,
> 
> 	tglx
> 
> ---
>  kernel/sched_rt.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> Index: linux-2.6-tip/kernel/sched_rt.c
> ===================================================================
> --- linux-2.6-tip.orig/kernel/sched_rt.c
> +++ linux-2.6-tip/kernel/sched_rt.c
> @@ -609,6 +609,7 @@ static int sched_rt_runtime_exceeded(str
>  
>  	if (rt_rq->rt_time > runtime) {
>  		rt_rq->rt_throttled = 1;
> +		printk_once(KERN_WARNING "sched: RT throttling activated\n");
>  		if (rt_rq_throttled(rt_rq)) {
>  			sched_rt_rq_dequeue(rt_rq);
>  			return 1;