Re: netconsole deadlock with virtnet

On 2020/11/24 4:01 PM, Leon Romanovsky wrote:
On Tue, Nov 24, 2020 at 11:22:03AM +0800, Jason Wang wrote:
On 2020/11/24 3:21 AM, Jakub Kicinski wrote:
On Mon, 23 Nov 2020 14:09:34 -0500 Steven Rostedt wrote:
On Mon, 23 Nov 2020 10:52:52 -0800
Jakub Kicinski <kuba@xxxxxxxxxx> wrote:

On Mon, 23 Nov 2020 09:31:28 -0500 Steven Rostedt wrote:
On Mon, 23 Nov 2020 13:08:55 +0200
Leon Romanovsky <leon@xxxxxxxxxx> wrote:

   [   10.028024] Chain exists of:
   [   10.028025]   console_owner --> target_list_lock --> _xmit_ETHER#2
Note, the problem is that we have a location that grabs the xmit_lock while
holding target_list_lock (and possibly console_owner).
Well, it try_locks the xmit_lock. Does lockdep understand try-locks?

(not that I condone the shenanigans that are going on here)
Does it?

	virtnet_poll_tx() {
		__netif_tx_lock() {
			spin_lock(&txq->_xmit_lock);
Umpf. Right, I was looking at virtnet_poll_cleantx().

That looks like we can have:


	CPU0                    CPU1
	----                    ----
	lock(xmit_lock)
	                        lock(console)
	                        lock(target_list_lock)
	                        __netif_tx_lock()
	                          lock(xmit_lock);
	                          [BLOCKED]
	<interrupt>
	lock(console)
	[BLOCKED]

	DEADLOCK.
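Lockdep flags this because the recorded dependencies already form console_owner --> target_list_lock --> _xmit_ETHER#2, and the interrupt on CPU0 would add an edge from xmit_lock back to console. A toy userspace sketch of that check follows. It is a plain depth-first search over a lock-order graph, not lockdep's actual implementation, and the lock IDs and helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock IDs mirroring the lockdep chain in the report. */
enum lock_id { CONSOLE_OWNER, TARGET_LIST_LOCK, XMIT_LOCK, NR_LOCKS };

/* dep[a][b] == true means "b was acquired while a was held". */
static bool dep[NR_LOCKS][NR_LOCKS];

static void record_dependency(enum lock_id held, enum lock_id acquired)
{
    dep[held][acquired] = true;
}

/* Depth-first search: is 'to' reachable from 'from' in the graph? */
static bool reachable(int from, int to, bool *visited)
{
    if (from == to)
        return true;
    visited[from] = true;
    for (int next = 0; next < NR_LOCKS; next++)
        if (dep[from][next] && !visited[next] && reachable(next, to, visited))
            return true;
    return false;
}

/*
 * Would taking 'acquired' while holding 'held' close a cycle?
 * It does if 'held' is already reachable from 'acquired'.
 */
static bool would_deadlock(enum lock_id held, enum lock_id acquired)
{
    bool visited[NR_LOCKS];

    assert(held < NR_LOCKS && acquired < NR_LOCKS);
    memset(visited, 0, sizeof(visited));
    return reachable(acquired, held, visited);
}
```

After recording console_owner -> target_list_lock and target_list_lock -> xmit_lock, asking whether console may be taken while xmit_lock is held (CPU0's interrupt) reports a cycle, while the original order does not.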


So where is the trylock here?

Perhaps you need the trylock in virtnet_poll_tx()?
That could work. Best if we used a normal lock if !!budget, and trylock
when budget is 0. But maybe that's too hairy.
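A userspace sketch of the budget-dependent locking idea, assuming a pthread mutex stands in for txq->_xmit_lock and that a budget of 0 marks the netpoll case (netpoll calls the NAPI poll with a budget of 0 to clean only the TX path). struct tx_queue, reclaim_tx() and poll_tx() are hypothetical names, not virtio-net code:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a TX queue and its txq->_xmit_lock. */
struct tx_queue {
    pthread_mutex_t xmit_lock;
    int completed;              /* number of reclaim passes that ran */
};

/* Hypothetical stand-in for the virtqueue_get_buf() reclaim loop. */
static void reclaim_tx(struct tx_queue *q)
{
    q->completed++;
}

/*
 * Normal NAPI poll (budget != 0): block on the lock as usual.
 * Netpoll (budget == 0): only try the lock and skip the cleanup if it
 * is busy, so we never spin on a lock the interrupted context may hold.
 * Returns true if the cleanup ran.
 */
static bool poll_tx(struct tx_queue *q, int budget)
{
    assert(budget >= 0);
    if (budget) {
        pthread_mutex_lock(&q->xmit_lock);
    } else if (pthread_mutex_trylock(&q->xmit_lock) != 0) {
        /* Lock busy: completions are deferred, not processed. */
        return false;
    }
    reclaim_tx(q);
    pthread_mutex_unlock(&q->xmit_lock);
    return true;
}
```

The early `return false` branch is the cost raised in the reply: with trylock, TX completions can be delayed or missed on that pass.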

If we use trylock, we will probably lose (or delay) tx notifications, which
may have side effects on the stack.


I'm assuming all this trickiness comes from virtqueue_get_buf() needing
locking vs the TX path? It's pretty unusual for the completion path to
need locking vs xmit path.

Two reasons for doing this:

1) For historical reasons, we try to free transmitted tx packets in xmit
(see free_old_xmit_skbs() in start_xmit()); we can probably remove this if
we remove the non-tx-interrupt mode.
2) virtio core requires virtqueue_get_buf() to be synchronized with
virtqueue_add(); we can probably solve this, but it requires some non-trivial
refactoring in the virtio core.
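A toy model of why the completion path ends up needing the xmit lock: the submit side and the reclaim side update the same ring state, and start_xmit() reclaims old buffers before adding new ones. Everything here (struct toy_vq, toy_add(), toy_get_buf(), toy_start_xmit()) is hypothetical and far simpler than the real virtqueue; in particular it assumes the device consumes every buffer as soon as it is added:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 4

/* Hypothetical toy ring: add and get_buf both touch num_free and indices. */
struct toy_vq {
    pthread_mutex_t xmit_lock;  /* plays the role of txq->_xmit_lock */
    void *ring[RING_SIZE];
    unsigned int head, tail, num_free;
};

/* Reclaim side: pop one buffer (assumed already consumed by the device). */
static void *toy_get_buf(struct toy_vq *vq)
{
    if (vq->num_free == RING_SIZE)
        return NULL;
    void *buf = vq->ring[vq->tail];
    vq->tail = (vq->tail + 1) % RING_SIZE;
    vq->num_free++;
    return buf;
}

/* Submit side: push one buffer if there is room. */
static bool toy_add(struct toy_vq *vq, void *buf)
{
    if (vq->num_free == 0)
        return false;
    vq->ring[vq->head] = buf;
    vq->head = (vq->head + 1) % RING_SIZE;
    vq->num_free--;
    return true;
}

/* Like start_xmit() calling free_old_xmit_skbs(): reclaim, then add. */
static bool toy_start_xmit(struct toy_vq *vq, void *pkt, int *freed)
{
    bool ok;

    assert(freed != NULL);
    pthread_mutex_lock(&vq->xmit_lock);
    while (toy_get_buf(vq) != NULL)
        (*freed)++;             /* a real driver would free the skb here */
    ok = toy_add(vq, pkt);
    pthread_mutex_unlock(&vq->xmit_lock);
    return ok;
}
```

Because toy_get_buf() and toy_add() share unsynchronized state, a separate TX-completion poller would have to take the same xmit_lock before calling toy_get_buf(), which mirrors why virtnet_poll_tx() takes the tx lock at all.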
So how will we solve our lockdep issues?

Thanks


It's not clear to me whether it's a virtio-net-specific issue. E.g. the above deadlock looks like a generic issue, so working around it in virtio-net may not help other drivers.

Thanks



Btw, from a quick search, there are several other drivers that also take
the tx lock in tx NAPI.

Thanks


_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization



