On Tue, Nov 24, 2020 at 11:22:03AM +0800, Jason Wang wrote:
>
> On 2020/11/24 3:21 AM, Jakub Kicinski wrote:
> > On Mon, 23 Nov 2020 14:09:34 -0500 Steven Rostedt wrote:
> > > On Mon, 23 Nov 2020 10:52:52 -0800
> > > Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
> > >
> > > > On Mon, 23 Nov 2020 09:31:28 -0500 Steven Rostedt wrote:
> > > > > On Mon, 23 Nov 2020 13:08:55 +0200
> > > > > Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> > > > >
> > > > > > [ 10.028024] Chain exists of:
> > > > > > [ 10.028025] console_owner --> target_list_lock --> _xmit_ETHER#2
> > > > > Note, the problem is that we have a location that grabs the xmit_lock while
> > > > > holding target_list_lock (and possibly console_owner).
> > > > Well, it try_locks the xmit_lock. Does lockdep understand try-locks?
> > > >
> > > > (not that I condone the shenanigans that are going on here)
> > > Does it?
> > >
> > > virtnet_poll_tx() {
> > >   __netif_tx_lock() {
> > >     spin_lock(&txq->_xmit_lock);
> > Umpf. Right. I was looking at virtnet_poll_cleantx()
> >
> > > That looks like we can have:
> > >
> > >
> > >      CPU0                       CPU1
> > >      ----                       ----
> > >   lock(xmit_lock)
> > >
> > >                              lock(console)
> > >                              lock(target_list_lock)
> > >                              __netif_tx_lock()
> > >                                lock(xmit_lock);
> > >
> > >                                [BLOCKED]
> > >
> > >   <interrupt>
> > >   lock(console)
> > >
> > >   [BLOCKED]
> > >
> > >
> > >
> > >      DEADLOCK.
> > >
> > >
> > > So where is the trylock here?
> > >
> > > Perhaps you need the trylock in virtnet_poll_tx()?
> > That could work. Best if we used normal lock if !!budget, and trylock
> > when budget is 0. But maybe that's too hairy.
>
>
> If we use trylock, we probably lose (or delay) tx notification, which may have
> side effects on the stack.
>
>
> >
> > I'm assuming all this trickiness comes from virtqueue_get_buf() needing
> > locking vs the TX path? It's pretty unusual for the completion path to
> > need locking vs xmit path.
>
>
> Two reasons for doing this:
>
> 1) For some historical reason, we try to free transmitted tx packets in xmit
> (see free_old_xmit_skbs() in start_xmit()); we can probably remove this if
> we remove the non-tx-interrupt mode.
> 2) virtio core requires virtqueue_get_buf() to be synchronized with
> virtqueue_add(); we can probably solve this, but it requires some non-trivial
> refactoring in the virtio core.

So how will we solve our lockdep issues?

Thanks

>
> Btw, after a quick search, there are several other drivers that use the tx lock
> in their tx NAPI.
>
> Thanks

_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
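The budget-dependent locking Jakub floats in the thread (take the TX lock normally when `budget` is non-zero, only try-lock it when `budget` is 0) can be modeled in user space. This is a minimal sketch, not virtio_net code: `struct model_txq`, `model_reclaim()` and `model_poll_tx()` are hypothetical names, a `pthread_mutex_t` stands in for `txq->_xmit_lock`, a plain counter stands in for the `virtqueue_get_buf()` loop, and it assumes that a zero budget identifies the netpoll/netconsole caller, where blocking on the lock could close the console_owner --> target_list_lock --> _xmit_ETHER cycle.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical user-space stand-in for a TX queue: 'lock' models
 * txq->_xmit_lock, 'completed' counts buffers the device has finished
 * with but the driver has not reclaimed yet. */
struct model_txq {
	pthread_mutex_t lock;
	int completed;
};

/* Reclaim all completed buffers. Caller must hold q->lock.
 * Returns the number of buffers reclaimed. */
static int model_reclaim(struct model_txq *q)
{
	int n = q->completed;

	q->completed = 0;
	return n;
}

/* Sketch of the proposed policy: block on the lock when budget != 0,
 * only try-lock when budget == 0. Returns the number of buffers
 * reclaimed, or -EBUSY if the lock was held and cleanup was skipped. */
static int model_poll_tx(struct model_txq *q, int budget)
{
	int freed;

	assert(q != NULL);
	if (budget) {
		pthread_mutex_lock(&q->lock);
	} else if (pthread_mutex_trylock(&q->lock) != 0) {
		/* Lock is held elsewhere: back off instead of waiting.
		 * The completions stay queued for a later poll. */
		return -EBUSY;
	}
	freed = model_reclaim(q);
	pthread_mutex_unlock(&q->lock);
	return freed;
}
```

For example, with three completed buffers `model_poll_tx(&q, 64)` returns 3; if the lock is already held, `model_poll_tx(&q, 0)` returns `-EBUSY` and leaves the pending count untouched. That skipped cleanup is the trade-off Jason points out: completions are only deferred to a later poll, so TX notification can be delayed.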