On 2020/11/18 11:15 AM, Sergey Senozhatsky wrote:
On (20/11/18 11:46), Sergey Senozhatsky wrote:
[..]
Because I'm not sure where the xmit_lock is taken while holding the
target_list_lock.
I don't see where this happens. It seems to me that the report
is not about broken locking order, but more about:
- soft-irq can be preempted (while holding _xmit_lock) by a hardware
  interrupt, which will then attempt to acquire the same _xmit_lock.
    CPU0

    <<soft IRQ>>
    virtnet_poll_tx()
     __netif_tx_lock()
      spin_lock(_xmit_lock)

      <<hard IRQ>>
      add_interrupt_randomness()
       crng_fast_load()
        printk()
         call_console_drivers()
          spin_lock_irqsave(&target_list_lock)
           spin_lock(_xmit_lock)
Does this make sense?
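To make that concrete, a minimal sketch of the pattern (stand-in names,
not the actual virtio-net/netconsole code): plain spin_lock() leaves
hardirqs enabled, so a hardirq handler on the same CPU that reaches the
same lock spins on itself:

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(demo_xmit_lock);	/* stands in for _xmit_lock */

/* NET_TX softirq path: BHs are disabled here, hardirqs are not */
static void demo_tx_softirq(void)
{
	spin_lock(&demo_xmit_lock);	/* fine against other softirqs... */
	/* ...but a hardirq can preempt us right here */
	spin_unlock(&demo_xmit_lock);
}

/* hardirq path: add_interrupt_randomness() -> printk() -> netconsole xmit */
static void demo_hardirq(void)
{
	spin_lock(&demo_xmit_lock);	/* same CPU already holds it: spins forever */
	spin_unlock(&demo_xmit_lock);
}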
Hmm, lockdep says something similar, but there are 2 printk() calls
happening - one on the local and one on the remote CPU.
[   21.149564]        CPU0                    CPU1
[   21.149565]        ----                    ----
[   21.149566]   lock(_xmit_ETHER#2);
[   21.149569]                          local_irq_disable();
[   21.149570]                          lock(console_owner);
[   21.149572]                          lock(target_list_lock);
[   21.149575]   <Interrupt>
[   21.149576]     lock(console_owner);
This CPU0 chain, lock(_xmit_ETHER#2) -> hard IRQ -> lock(console_owner),
is basically:

  soft IRQ -> lock(_xmit_ETHER#2) -> hard IRQ -> printk()

Then CPU1 spins on the xmit lock, which is owned by CPU0, while CPU0
spins on console_owner, which is owned by CPU1?
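If that reading is right, the cycle needs both CPUs. A minimal sketch of
the two chains (reusing the stand-in lock from the sketch above, plus a
stand-in for target_list_lock; simplified call paths, not the real code):

static DEFINE_SPINLOCK(demo_target_list_lock);	/* stands in for target_list_lock */

/* CPU0: virtio-net TX softirq, then a hardirq lands on top of it */
static void demo_cpu0(void)
{
	spin_lock(&demo_xmit_lock);	/* CPU0 now owns the tx lock */
	/*
	 * <hard IRQ> -> printk() -> console_trylock_spinning():
	 * spins on console_owner, which CPU1 currently holds.
	 */
}

/* CPU1: the current console owner, printing through netconsole */
static void demo_cpu1(void)
{
	unsigned long flags;

	/* already holds console_owner while calling console drivers */
	spin_lock_irqsave(&demo_target_list_lock, flags);
	spin_lock(&demo_xmit_lock);	/* owned by CPU0: the cycle closes */
}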
If this is true, it looks like this is not a virtio-net specific issue
but lies somewhere else.

I think all network drivers synchronize through bh instead of hardirq.

Thanks
A quick-and-dirty idea (it doesn't fix the lockdep report): can we
add some sort of max_loops variable to console_trylock_spinning(),
so that it will not spin forever in `while (READ_ONCE(console_waiter))`
waiting for the console_owner to pass the lock?
-ss
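For what it's worth, a rough sketch of how that bound could look in the
wait loop of console_trylock_spinning() in kernel/printk/printk.c
(illustrative fragment only, not a tested patch: the cap is arbitrary,
and bailing out races with the owner's hand-off of console_waiter, which
a real patch would have to handle):

#define CONSOLE_OWNER_SPIN_MAX	(1UL << 24)	/* arbitrary cap, assumption */

	unsigned long max_loops = CONSOLE_OWNER_SPIN_MAX;

	/* The owner will clear console_waiter on hand-off */
	while (READ_ONCE(console_waiter)) {
		if (!--max_loops)
			return 0;	/* give up: report it like a failed trylock */
		cpu_relax();
	}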