Re: netconsole deadlock with virtnet

On Thu, Nov 19, 2020 at 01:55:53PM +0100, Petr Mladek wrote:
> On Tue 2020-11-17 09:33:25, Steven Rostedt wrote:
> > On Tue, 17 Nov 2020 12:23:41 +0200
> > Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> >
> > > Hi,
> > >
> > > Approximately two weeks ago, our regression team started to experience these
> > > netconsole splats. The tested code is Linus's master (-rc4) + netdev net-next
> > > + netdev net-rc.
> > >
> > > Such splats are random and we can't bisect because there is no stable reproducer.
> > >
> > > Any idea what the root cause is?
> > >
> > > [   21.149739]                       __do_sys_finit_module+0xbc/0x12c
> > > [   21.149740]                       __arm64_sys_finit_module+0x28/0x34
> > > [   21.149741]                       el0_svc_common.constprop.0+0x84/0x200
> > > [   21.149742]                       do_el0_svc+0x2c/0x90
> > > [   21.149743]                       el0_svc+0x18/0x50
> > > [   21.149744]                       el0_sync_handler+0xe0/0x350
> > > [   21.149745]                       el0_sync+0x158/0x180
> > > [   21.149746]  }
> > > [   21.149747]  ... key      at: [<ffff8000093d4018>] target_list_lock+0x18/0xfffffffffffff000 [netconsole]
> > > [   21.149748]  ..
> > > [   21.149750] Lost 190 message(s)!
> >
> > It really sucks that we lose 190 messages that would help to decipher this
> > more. :-p
>
> The message comes from the printk_safe code. The buffer size can be
> increased by CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT.
>
> > Because I'm not sure where the xmit_lock is taken while holding the
> > target_list_lock. But the above does show that printk() calls write_msg()
> > while holding the console_lock, and write_msg() takes the target_list_lock.
> >
> > Thus, the fix would either require disabling interrupts every time the
> > xmit_lock is taken, or keeping it from being taken while holding the
> > target_list_lock.
>
> It seems that the missing messages might help to find the root of
> the problem.

Sorry for not being very responsive, I was in an internet-free zone :).

I'll increase CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT from 13 to 26; let's
see what the nightly run gives us.
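
For scale, here is a small sketch of what that change means in bytes. It
assumes the option is a plain power-of-two exponent, as its name suggests
(buffer bytes = 1 << shift); I have not checked which range Kconfig accepts
for it. buf_bytes() is a hypothetical helper for illustration, not a kernel
function:

```c
#include <stdio.h>

/*
 * Assumption: the printk_safe buffer size in bytes is 1 << shift.
 * buf_bytes() is a hypothetical helper, not part of the kernel.
 */
static unsigned long long buf_bytes(unsigned int shift)
{
	return 1ULL << shift;
}

/* Print the size for one shift value in bytes and KiB. */
static void print_buf_size(unsigned int shift)
{
	unsigned long long bytes = buf_bytes(shift);

	printf("shift %u: %llu bytes (%llu KiB)\n", shift, bytes, bytes / 1024);
}
```

Under that assumption, print_buf_size(13) reports 8192 bytes (8 KiB) and
print_buf_size(26) reports 67108864 bytes (64 MiB).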

Thanks

>
> Best Regards,
> Petr