On Thu 08-02-18 15:18:11, Dmitry Vyukov wrote:
> On Thu, Feb 8, 2018 at 3:08 PM, Jan Kara <jack@xxxxxxx> wrote:
> > On Thu 08-02-18 14:28:08, Dmitry Vyukov wrote:
> >> On Thu, Feb 8, 2018 at 10:28 AM, Jan Kara <jack@xxxxxxx> wrote:
> >> > On Wed 07-02-18 07:52:29, Andi Kleen wrote:
> >> >> > #0: (&bdev->bd_mutex){+.+.}, at: [<0000000040269370>]
> >> >> > __blkdev_put+0xbc/0x7f0 fs/block_dev.c:1757
> >> >> > 1 lock held by blkid/19199:
> >> >> > #0: (&bdev->bd_mutex){+.+.}, at: [<00000000b4dcaa18>]
> >> >> > __blkdev_get+0x158/0x10e0 fs/block_dev.c:1439
> >> >> > #1: (&ldata->atomic_read_lock){+.+.}, at: [<0000000033edf9f2>]
> >> >> > n_tty_read+0x2ef/0x1a00 drivers/tty/n_tty.c:2131
> >> >> > 1 lock held by syz-executor5/19330:
> >> >> > #0: (&bdev->bd_mutex){+.+.}, at: [<00000000b4dcaa18>]
> >> >> > __blkdev_get+0x158/0x10e0 fs/block_dev.c:1439
> >> >> > 1 lock held by syz-executor5/19331:
> >> >> > #0: (&bdev->bd_mutex){+.+.}, at: [<00000000b4dcaa18>]
> >> >> > __blkdev_get+0x158/0x10e0 fs/block_dev.c:1439
> >> >>
> >> >> It seems multiple processes deadlocked on the bd_mutex.
> >> >> Unfortunately there's no backtrace for the lock acquisitions,
> >> >> so it's hard to see the exact sequence.
> >> >
> >> > Well, everything in the report points to a situation where some IO was
> >> > submitted to the block device and never completed (more exactly, it
> >> > took longer than those 120s to complete). It would need more digging
> >> > into the syzkaller program to find out what kind of device that was
> >> > and possibly why the IO took so long to complete...
> >>
> >> Would a traceback of all task stacks help in this case?
> >> What I've seen in several "task hung" reports is that the CPU
> >> traceback is not showing anything useful. So perhaps it should be
> >> changed to a task traceback? Or would that not help either?
> >
> > A task stack traceback for all tasks (usually only tasks in D state -
> > i.e. sysrq-w - are enough, actually) would definitely help for
> > debugging deadlocks on sleeping locks. For this particular case I'm not
> > sure whether it would help, since it is quite possible the IO is just
> > sitting in some queue, never getting processed
>
> That's what I was afraid of.
>
> > due to some racing syzkaller process tearing down the device at the
> > wrong moment or something like that... Such a case is very difficult to
> > debug without a full kernel crashdump of the hung kernel (or a
> > reproducer, for that matter), and even with that it is usually rather
> > time consuming. But for the deadlocks which do occur more frequently it
> > would probably be worth the time, so it would be nice if such an option
> > were eventually available.
>
> By "full kernel crashdump" you mean the kdump thing, or something else?

Yes, the kdump thing (for a KVM guest you can also grab the memory dump
from the host in a simpler way, and it should be usable with the crash
utility AFAIK).

								Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
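
For reference, a minimal sketch of the two debugging aids discussed above
(the sysrq-w dump of blocked tasks, and a host-side memory dump of a KVM
guest for use with the crash utility). It assumes a libvirt-managed guest;
the domain name "guest1", the vmlinux path, and the dump path are
placeholders, not details from this thread:

    # Dump stacks of all blocked (D state) tasks, i.e. sysrq-w.
    # Output goes to the kernel log (dmesg); requires kernel.sysrq
    # to permit it.
    echo w > /proc/sysrq-trigger

    # On the host: grab a memory dump of the guest without setting
    # up kdump inside it.
    virsh dump --memory-only --format elf guest1 /var/tmp/guest1.core

    # Analyze the dump with the crash utility; needs a vmlinux with
    # debug info matching the guest's running kernel.
    crash /path/to/vmlinux /var/tmp/guest1.core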