Re: Google Chrome cause locks held in system (kernel 4.15 rc2)

mikhail <mikhail.v.gavrilov@xxxxxxxxx> · Sat, 09 Dec 2017 18:17:18 +0500

On Fri, 2017-12-08 at 19:18 +0900, Tetsuo Handa wrote:
> Darrick J. Wong wrote:
> > On Fri, Dec 08, 2017 at 08:50:38AM +0500, mikhail wrote:
> > > Hi,
> > > 
> > > can anybody say what is happening here?
> > > And what info is needed to fix it?
> > > Thanks.
> > > 
> > > [16712.376081] INFO: task tracker-store:27121 blocked for more than 120 seconds.
> > > [16712.376088]       Not tainted 4.15.0-rc2-amd-vega+ #10
> > > [16712.376092] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > [16712.376095] tracker-store   D13400 27121   1843 0x00000000
> > > [16712.376102] Call Trace:
> > > [16712.376114]  ? __schedule+0x2e3/0xb90
> > > [16712.376123]  ? wait_for_completion+0x146/0x1e0
> > > [16712.376128]  schedule+0x2f/0x90
> > > [16712.376132]  schedule_timeout+0x236/0x540
> > > [16712.376143]  ? mark_held_locks+0x4e/0x80
> > > [16712.376147]  ? _raw_spin_unlock_irq+0x29/0x40
> > > [16712.376153]  ? wait_for_completion+0x146/0x1e0
> > > [16712.376158]  wait_for_completion+0x16e/0x1e0
> > > [16712.376162]  ? wake_up_q+0x70/0x70
> > > [16712.376204]  ? xfs_buf_read_map+0x134/0x2f0 [xfs]
> > > [16712.376234]  xfs_buf_submit_wait+0xaf/0x520 [xfs]
> > 
> > Stuck waiting for a directory block to read.  Slow disk?  Bad
> > media?
> > 
> 
> Most likely cause is that I/O was getting very slow due to memory
> pressure. Running memory consuming processes (e.g. web browsers) and
> file writing processes might generate stresses like this report.
> 
> I can't tell whether this report is a real deadlock/lockup or just a
> slowdown, for currently we don't have means for checking whether
> memory allocation was making progress or not.

It is not just a slowdown, because after 5 hours I was still unable to
launch even htop. After executing a command, nothing happened. I was
even surprised that dmesg still worked.

> The OOM killer is not invoked for allocation requests without
> __GFP_FS flag. Therefore, GFP_NOIO / GFP_NOFS allocation requests
> have possibility of hanging up the system. We can reproduce such hang
> up using artificial stress (e.g.
> http://lkml.kernel.org/r/201703031948.CHJ81278.VOHSFFFOOLJQMt@I-love.SAKURA.ne.jp ),
> but this problem will not be addressed unless it is proven to occur
> using real workloads. It is a too much request for averaged users to
> prove that their systems hung up due to this problem.
> 
> In order to avoid silent hang up, Linux 4.9 got warn_alloc() calls
> which "synchronously" prints messages when a memory allocation
> request took more than 10 seconds. But since it was confirmed that
> concurrent warn_alloc() calls can hang up the system, warn_alloc()
> was reverted in Linux 4.15-rc1
> ( https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/page_alloc.c?id=400e22499dd92613 ).
> Therefore, unfortunately your kernel does not allow you to check
> whether memory allocation was making progress or not.
> 
> I have been proposing a watchdog which extends khungtaskd so that the
> system can print useful information "asynchronously" without locking
> up the system (e.g.
> http://lkml.kernel.org/r/1495331504-12480-1-git-send-email-penguin-kernel@xxxxxxxxxxxxxxxxxxx
> http://lkml.kernel.org/r/1510833448-19918-1-git-send-email-penguin-kernel@xxxxxxxxxxxxxxxxxxx ).
> But since OOM livelock is the least attractive domain, I'm stuck with
> zero advocate. The watchdog did not get in time for obtaining
> information in your case, sorry.
> 
> For now, you can try setting /proc/sys/kernel/hung_task_warnings to
> -1, for the default setting of /proc/sys/kernel/hung_task_warnings is
> 10 which means that "INFO: task $commname:$pid blocked for more than
> 120 seconds." is printed for only 10 times (like this report did) and
> makes it impossible for users to judge whether the hung situation
> continued or not. There is SysRq-t and SysRq-m, but I don't expect
> that current SysRq can give you enough information for analyzing this
> problem.
> 
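
For reference, the hung_task_warnings suggestion quoted above could be
sketched roughly as follows. This is only an illustration: the /proc
path is the one named in the quoted message, while the function name
and its `path` parameter are hypothetical and exist only so the code
can be tried on a scratch file first.

```python
from pathlib import Path

# Path named in the quoted advice. It is a parameter below so the
# function can be exercised against an ordinary file.
HUNG_TASK_WARNINGS = Path("/proc/sys/kernel/hung_task_warnings")


def set_hung_task_warnings(value=-1, path=HUNG_TASK_WARNINGS):
    """Write `value` to the sysctl file and return the value read back.

    -1 is the value suggested in the quoted message, so that the
    "blocked for more than 120 seconds" reports are no longer limited
    to 10. Writing to the real /proc file requires root.
    (Hypothetical helper, not an existing tool.)
    """
    path = Path(path)
    path.write_text(f"{value}\n")
    return int(path.read_text().strip())
```

Calling set_hung_task_warnings(-1) as root should be equivalent to
echoing -1 into the file by hand; the setting does not survive a reboot.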

Thanks for the advice.
I decided to check what happens when I do SysRq-t.
SysRq-t produces a lot of output even without Google Chrome running.
That amount of data does not fit in the kernel log buffer, and it is
impossible to read from the screen.
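
If the dump can be saved to a text file some other way, one possible
way to cut it down is to keep only the tasks in D (uninterruptible)
state. The sketch below is an assumption on my side, not an existing
tool: the names Task, TASK_LINE and blocked_tasks are hypothetical, and
the pattern is modelled only on the task header line in the report
above ("tracker-store   D13400 27121   1843 0x00000000"), so other
kernel versions may need a different pattern.

```python
import re
from typing import Iterable, List, NamedTuple


class Task(NamedTuple):
    comm: str
    state: str
    pid: int
    ppid: int


# Modelled on the header line from the report above:
# [16712.376095] tracker-store   D13400 27121   1843 0x00000000
TASK_LINE = re.compile(
    r"^(?:\[\s*\d+\.\d+\]\s+)?"     # optional dmesg timestamp
    r"(?P<comm>\S.*?)\s+"           # task name
    r"(?P<state>[A-Z])\s*\d+\s+"    # state letter, then a number
    r"(?P<pid>\d+)\s+(?P<ppid>\d+)\s+0x[0-9a-fA-F]+\s*$"
)


def blocked_tasks(lines: Iterable[str]) -> List[Task]:
    """Return the tasks whose state letter is D."""
    found = []
    for line in lines:
        m = TASK_LINE.match(line)
        if m and m.group("state") == "D":
            found.append(
                Task(m.group("comm"), "D",
                     int(m.group("pid")), int(m.group("ppid")))
            )
    return found
```

Usage would be something like blocked_tasks(open("sysrq-t.txt")), where
sysrq-t.txt is a hypothetical file holding the saved dump.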

Demonstration: https://youtu.be/DUWB1WGBog0
