On 27.3.2015 22:36, Sasha Levin wrote:
> On 03/27/2015 06:07 AM, Vlastimil Babka wrote:
>>>> [ 3614.918852] trinity-c7 D ffff8802f4487b58 26976 16252 9410 0x10000000
>>>> [ 3614.919580] ffff8802f4487b58 ffff8802f6b98ca8 0000000000000000 0000000000000000
>>>> [ 3614.920435] ffff88017d3e0558 ffff88017d3e0530 ffff8802f6b98008 ffff88016bad0000
>>>> [ 3614.921219] ffff8802f6b98000 ffff8802f4487b38 ffff8802f4480000 ffffed005e890002
>>>> [ 3614.922069] Call Trace:
>>>> [ 3614.922346] schedule (./arch/x86/include/asm/bitops.h:311 (discriminator 1) kernel/sched/core.c:2827 (discriminator 1))
>>>> [ 3614.923023] schedule_preempt_disabled (kernel/sched/core.c:2859)
>>>> [ 3614.923707] mutex_lock_nested (kernel/locking/mutex.c:585 kernel/locking/mutex.c:623)
>>>> [ 3614.924486] ? lru_add_drain_all (mm/swap.c:867)
>>>> [ 3614.925211] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2580 kernel/locking/lockdep.c:2622)
>>>> [ 3614.925970] ? lru_add_drain_all (mm/swap.c:867)
>>>> [ 3614.926692] ? mutex_trylock (kernel/locking/mutex.c:621)
>>>> [ 3614.927464] ? mpol_new (mm/mempolicy.c:285)
>>>> [ 3614.928044] lru_add_drain_all (mm/swap.c:867)
>>>> [ 3614.928608] migrate_prep (mm/migrate.c:64)
>>>> [ 3614.929092] SYSC_mbind (mm/mempolicy.c:1188 mm/mempolicy.c:1319)
>>>> [ 3614.929619] ? rcu_eqs_exit_common (kernel/rcu/tree.c:735 (discriminator 8))
>>>> [ 3614.930318] ? __mpol_equal (mm/mempolicy.c:1304)
>>>> [ 3614.930877] ? trace_hardirqs_on (kernel/locking/lockdep.c:2630)
>>>> [ 3614.931485] ? syscall_trace_enter_phase2 (arch/x86/kernel/ptrace.c:1592)
>>>> [ 3614.932184] SyS_mbind (mm/mempolicy.c:1301)
>> That looks like trinity-c7 is waiting on it too, but later on (after some
>> more listings like this for trinity-c7, probably threads?)
>> we have:

> It keeps changing constantly, even in this trace the process is blocking
> on the mutex

I think it's multiple threads of a process with the same name trinity-c7,
and thread 16935 of trinity-c7 does have the mutex locked and is waiting
on something else.

> rather than doing something useful, and in the next trace it's a
> different process.

And the next trace is from the same run, just later? I.e. it doesn't hang
completely, but makes progress so slow that the 20 minute hang timer
catches it? I'm not sure here.

If it's just too slow, I can imagine it could be simply optimized: if one
thread manages to lock the mutex, it can tell all threads waiting *at that
moment* that they can just return when the first thread is done, since it
has already done the necessary work for all of them. But I wonder if this
contention happens in practice. And that certainly doesn't explain any
regression that apparently occurred.

> Thanks,
> Sasha

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to
majordomo@xxxxxxxxx. For more info on Linux MM, see:
http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>