Darrick J. Wong wrote: > On Fri, Dec 08, 2017 at 08:50:38AM +0500, mikhail wrote: > > Hi, > > > > can anybody said what here happens? > > And which info needed for fixing it? > > Thanks. > > > > [16712.376081] INFO: task tracker-store:27121 blocked for more than 120 > > seconds. > > [16712.376088] Not tainted 4.15.0-rc2-amd-vega+ #10 > > [16712.376092] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > > disables this message. > > [16712.376095] tracker-store D13400 27121 1843 0x00000000 > > [16712.376102] Call Trace: > > [16712.376114] ? __schedule+0x2e3/0xb90 > > [16712.376123] ? wait_for_completion+0x146/0x1e0 > > [16712.376128] schedule+0x2f/0x90 > > [16712.376132] schedule_timeout+0x236/0x540 > > [16712.376143] ? mark_held_locks+0x4e/0x80 > > [16712.376147] ? _raw_spin_unlock_irq+0x29/0x40 > > [16712.376153] ? wait_for_completion+0x146/0x1e0 > > [16712.376158] wait_for_completion+0x16e/0x1e0 > > [16712.376162] ? wake_up_q+0x70/0x70 > > [16712.376204] ? xfs_buf_read_map+0x134/0x2f0 [xfs] > > [16712.376234] xfs_buf_submit_wait+0xaf/0x520 [xfs] > > Stuck waiting for a directory block to read. Slow disk? Bad media? > Most likely cause is that I/O was getting very slow due to memory pressure. Running memory consuming processes (e.g. web browsers) and file writing processes might generate stresses like this report. I can't tell whether this report is a real deadlock/lockup or just a slowdown, for currently we don't have means for checking whether memory allocation was making progress or not. The OOM killer is not invoked for allocation requests without __GFP_FS flag. Therefore, GFP_NOIO / GFP_NOFS allocation requests have possibility of hanging up the system. We can reproduce such hang up using artificial stress (e.g. http://lkml.kernel.org/r/201703031948.CHJ81278.VOHSFFFOOLJQMt@xxxxxxxxxxxxxxxxxxx ), but this problem will not be addressed unless it is proven to occur using real workloads. It is a too much request for averaged users to prove that their systems hung up due to this problem. In order to avoid silent hang up, Linux 4.9 got warn_alloc() calls which "synchronously" prints messages when a memory allocation request took more than 10 seconds. But since it was confirmed that concurrent warn_alloc() calls can hang up the system, warn_alloc() was reverted in Linux 4.15-rc1 ( https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/page_alloc.c?id=400e22499dd92613 ). Therefore, unfortunately your kernel does not allow you to check whether memory allocation was making progress or not. I have been proposing a watchdog which extends khungtaskd so that the system can print useful information "asynchronously" without locking up the system (e.g. http://lkml.kernel.org/r/1495331504-12480-1-git-send-email-penguin-kernel@xxxxxxxxxxxxxxxxxxx http://lkml.kernel.org/r/1510833448-19918-1-git-send-email-penguin-kernel@xxxxxxxxxxxxxxxxxxx ). But since OOM livelock is the least attractive domain, I'm stuck with zero advocate. The watchdog did not get in time for obtaining information in your case, sorry. For now, you can try setting /proc/sys/kernel/hung_task_warnings to -1, for the default setting of /proc/sys/kernel/hung_task_warnings is 10 which means that "INFO: task $commname:$pid blocked for more than 120 seconds." is printed for only 10 times (like this report did) and makes it impossible for users to judge whether the hung situation continued or not. There is SysRq-t and SysRq-m, but I don't expect that current SysRq can give you enough information for analyzing this problem. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>