On 27.3.2015 22:36, Sasha Levin wrote:
> On 03/27/2015 06:07 AM, Vlastimil Babka wrote:
>>>> [ 3614.918852] trinity-c7 D ffff8802f4487b58 26976 16252 9410 0x10000000
>>>> [ 3614.919580] ffff8802f4487b58 ffff8802f6b98ca8 0000000000000000 0000000000000000
>>>> [ 3614.920435] ffff88017d3e0558 ffff88017d3e0530 ffff8802f6b98008 ffff88016bad0000
>>>> [ 3614.921219] ffff8802f6b98000 ffff8802f4487b38 ffff8802f4480000 ffffed005e890002
>>>> [ 3614.922069] Call Trace:
>>>> [ 3614.922346] schedule (./arch/x86/include/asm/bitops.h:311 (discriminator 1) kernel/sched/core.c:2827 (discriminator 1))
>>>> [ 3614.923023] schedule_preempt_disabled (kernel/sched/core.c:2859)
>>>> [ 3614.923707] mutex_lock_nested (kernel/locking/mutex.c:585 kernel/locking/mutex.c:623)
>>>> [ 3614.924486] ? lru_add_drain_all (mm/swap.c:867)
>>>> [ 3614.925211] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2580 kernel/locking/lockdep.c:2622)
>>>> [ 3614.925970] ? lru_add_drain_all (mm/swap.c:867)
>>>> [ 3614.926692] ? mutex_trylock (kernel/locking/mutex.c:621)
>>>> [ 3614.927464] ? mpol_new (mm/mempolicy.c:285)
>>>> [ 3614.928044] lru_add_drain_all (mm/swap.c:867)
>>>> [ 3614.928608] migrate_prep (mm/migrate.c:64)
>>>> [ 3614.929092] SYSC_mbind (mm/mempolicy.c:1188 mm/mempolicy.c:1319)
>>>> [ 3614.929619] ? rcu_eqs_exit_common (kernel/rcu/tree.c:735 (discriminator 8))
>>>> [ 3614.930318] ? __mpol_equal (mm/mempolicy.c:1304)
>>>> [ 3614.930877] ? trace_hardirqs_on (kernel/locking/lockdep.c:2630)
>>>> [ 3614.931485] ? syscall_trace_enter_phase2 (arch/x86/kernel/ptrace.c:1592)
>>>> [ 3614.932184] SyS_mbind (mm/mempolicy.c:1301)
>> That looks like trinity-c7 is waiting on it too, but later on (after some
>> more listings like this for trinity-c7, probably threads?)
>> we have:

> It keeps changing constantly, even in this trace the process is blocking
> on the mutex

I think it's multiple threads of a process with the same name trinity-c7,
and thread 16935 of trinity-c7 does have the mutex locked and is waiting
on something else.

> rather than doing something useful, and in the next trace it's a
> different process.

And the next trace is from the same run, just later? I.e. it doesn't hang
completely, but makes progress so slow that the 20 minute hang timer
catches it? I'm not sure here.

If it's just too slow, I can imagine it could be simply optimized: if one
thread manages to lock the mutex, it can tell all threads waiting *at that
moment* that they can just return when the first thread is done, since it
has already done the necessary work for all of them. But I wonder if this
contention happens in practice. And that certainly doesn't explain any
regression that apparently occurred.

> Thanks,
> Sasha

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to
majordomo@xxxxxxxxx. For more info on Linux MM, see:
http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>