Hi all,

While fuzzing with trinity inside a KVM tools guest running the latest
-next kernel, I've stumbled on the following spew:

[ 729.682257] INFO: task trinity-c158:13508 blocked for more than 120 seconds.
[ 729.683191] Not tainted 3.16.0-rc5-next-20140718-sasha-00052-g4d34feb #902
[ 729.683843] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 729.684633] trinity-c158 D ffff880d32169000 12520 13508 8729 0x00000000
[ 729.685323] ffff880de6d3fe38 0000000000000002 0000000000000006 ffff880c6c33b000
[ 729.686039] 000000785eb36cd1 ffff880de6d3c010 ffff880de6d3c000 ffff880e2b270000
[ 729.686761] ffff880c6c33b000 0000000000000000 ffffffffb070b908 0000000026e426e2
[ 729.687483] Call Trace:
[ 729.687736] schedule_preempt_disabled (kernel/sched/core.c:2874)
[ 729.688544] mutex_lock_nested (kernel/locking/mutex.c:532 kernel/locking/mutex.c:584)
[ 729.689127] ? lru_add_drain_all (mm/swap.c:867)
[ 729.689666] ? lru_add_drain_all (mm/swap.c:867)
[ 729.690334] lru_add_drain_all (mm/swap.c:867)
[ 729.690946] SyS_mlockall (./arch/x86/include/asm/current.h:14 include/linux/sched.h:2978 mm/mlock.c:813 mm/mlock.c:798)
[ 729.691437] tracesys (arch/x86/kernel/entry_64.S:541)
[ 729.691883] 1 lock held by trinity-c158/13508:
[ 729.692333] #0: (lock#3){+.+...}, at: lru_add_drain_all (mm/swap.c:867)

There are quite a few tasks with the same stacktrace. Since the lock
we're blocking on is a static local in the function, it's easy to figure
out which task actually holds it:

[ 739.587839] trinity-c135 D ffff880e1318c000 13096 21051 8729 0x00000000
[ 739.589080] ffff880bf659fcc0 0000000000000002 ffff880c74223cf0 ffff880c74223000
[ 739.590544] 000000781c3378ed ffff880bf659c010 ffff880bf659c000 ffff880e2b153000
[ 739.591815] ffff880c74223000 0000000000000000 7fffffffffffffff ffff880bf659fe80
[ 739.593165] Call Trace:
[ 739.593588] schedule (kernel/sched/core.c:2847)
[ 739.594396] schedule_timeout (kernel/time/timer.c:1476)
[ 739.595354] ? mark_lock (kernel/locking/lockdep.c:2894)
[ 739.596229] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[ 739.597297] ? get_parent_ip (kernel/sched/core.c:2561)
[ 739.598185] wait_for_completion (include/linux/spinlock.h:328 kernel/sched/completion.c:76 kernel/sched/completion.c:93 kernel/sched/completion.c:101 kernel/sched/completion.c:122)
[ 739.599191] ? wake_up_state (kernel/sched/core.c:2942)
[ 739.600246] flush_work (kernel/workqueue.c:503 kernel/workqueue.c:2762)
[ 739.601171] ? flush_work (kernel/workqueue.c:2733 kernel/workqueue.c:2760)
[ 739.602084] ? destroy_worker (kernel/workqueue.c:2348)
[ 739.603035] ? wait_for_completion (kernel/sched/completion.c:64 kernel/sched/completion.c:93 kernel/sched/completion.c:101 kernel/sched/completion.c:122)
[ 739.604043] lru_add_drain_all (include/linux/cpumask.h:180 mm/swap.c:883)
[ 739.605076] SyS_mlockall (./arch/x86/include/asm/current.h:14 include/linux/sched.h:2978 mm/mlock.c:813 mm/mlock.c:798)
[ 739.605943] tracesys (arch/x86/kernel/entry_64.S:541)

Now, you'd expect to see lru_add_drain_per_cpu in one of those stacks,
but that's not the case.
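For anyone not looking at mm/swap.c, lru_add_drain_all() in this tree is
structured roughly as below. This is a paraphrased sketch rather than the
exact code: cpu_needs_drain() is a made-up stand-in for the real per-cpu
pagevec_count() checks, but the locking and flushing shape is what matters
here:

static DEFINE_PER_CPU(struct work_struct, lru_add_drain_work);

static void lru_add_drain_per_cpu(struct work_struct *dummy)
{
        lru_add_drain();        /* drain this CPU's lru pagevecs */
}

void lru_add_drain_all(void)
{
        static DEFINE_MUTEX(lock);
        static struct cpumask has_work;
        int cpu;

        /* mm/swap.c:867 -- all the other trinity tasks are parked here */
        mutex_lock(&lock);
        get_online_cpus();
        cpumask_clear(&has_work);

        for_each_online_cpu(cpu) {
                struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);

                /* hypothetical stand-in for the pagevec_count() tests */
                if (cpu_needs_drain(cpu)) {
                        INIT_WORK(work, lru_add_drain_per_cpu);
                        schedule_work_on(cpu, work);
                        cpumask_set_cpu(cpu, &has_work);
                }
        }

        /*
         * mm/swap.c:883 -- the lock holder sits here until the workqueue
         * has run lru_add_drain_per_cpu on every CPU it queued work for.
         */
        for_each_cpu(cpu, &has_work)
                flush_work(&per_cpu(lru_add_drain_work, cpu));

        put_online_cpus();
        mutex_unlock(&lock);
}

If the sketch matches this tree, the holder can't drop the mutex until
every queued work item completes, so a drain work item that never gets to
run would produce exactly this pile-up: one task stuck in flush_work(),
everyone else queued on the mutex, and lru_add_drain_per_cpu nowhere on
any stack.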
Full log attached.

Thanks,
Sasha

Attachment: log.txt.xz (application/xz)