Re: [PATCH net v2] virtio-net: fix possible dim status unrecoverable

On 2024/3/28 at 6:34 PM, Paolo Abeni wrote:
On Tue, 2024-03-26 at 14:25 +0800, Heng Qi wrote:
When the dim worker is scheduled, if it fails to acquire the lock,
dim may not be able to return to the working state later.

For example, the following single queue scenario:
   1. The dim worker of rxq0 is scheduled, and the dim status is
      changed to DIM_APPLY_NEW_PROFILE;
   2. The ethtool command is holding rtnl lock;
   3. Since the rtnl lock is already held, virtnet_rx_dim_work fails
      to acquire the lock and exits;

Then, even if net_dim is invoked again, it cannot work because the
state is not restored to DIM_START_MEASURE.
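
To make the stuck state concrete, here is a minimal userspace sketch of the scenario above. It is not the kernel's DIM code: the `toy_*` names and the two-step measurement are simplified stand-ins invented for illustration, and the `lock_available` flag stands in for the result of rtnl_trylock().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the DIM states named in the thread. */
enum toy_dim_state {
	TOY_START_MEASURE,
	TOY_MEASURE_IN_PROGRESS,
	TOY_APPLY_NEW_PROFILE,
};

struct toy_dim {
	enum toy_dim_state state;
	int work_runs;	/* times the worker actually applied a profile */
	int skipped;	/* net_dim-like calls ignored while the state was stuck */
};

/* Models the measuring side: it only makes progress in the two measure states. */
static void toy_net_dim(struct toy_dim *dim, bool profile_changed)
{
	switch (dim->state) {
	case TOY_START_MEASURE:
		dim->state = TOY_MEASURE_IN_PROGRESS;
		break;
	case TOY_MEASURE_IN_PROGRESS:
		/* A changed profile would schedule the worker here. */
		dim->state = profile_changed ? TOY_APPLY_NEW_PROFILE
					     : TOY_START_MEASURE;
		break;
	case TOY_APPLY_NEW_PROFILE:
		/* Waiting for the worker; nothing is measured. */
		dim->skipped++;
		break;
	}
}

/* Models the pre-patch worker: on trylock failure it returns without
 * restoring the state, which is the bug described above. */
static void toy_dim_work(struct toy_dim *dim, bool lock_available)
{
	assert(dim->state == TOY_APPLY_NEW_PROFILE);
	if (!lock_available)
		return;
	dim->work_runs++;
	dim->state = TOY_START_MEASURE;
}

/* Steps 1-3 of the scenario, followed by one more net_dim-like call. */
struct toy_dim toy_run_scenario(bool lock_available)
{
	struct toy_dim dim = { .state = TOY_START_MEASURE };

	toy_net_dim(&dim, false);		/* start measuring */
	toy_net_dim(&dim, true);		/* decide on a new profile */
	toy_dim_work(&dim, lock_available);	/* worker runs, may bail out */
	toy_net_dim(&dim, true);		/* later invocation */
	return dim;
}
```

With `lock_available` false, the model stays in TOY_APPLY_NEW_PROFILE and the later call is ignored; with it true, measurement resumes.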

The patch has been tested on a VM with 16 NICs and 128 queues per NIC
(2048 queues in total):
With dim enabled on all queues, there are many opportunities for
contention for RTNL lock, and this patch introduces no visible hotspots.
The dim performance is also stable.

Fixes: 6208799553a8 ("virtio-net: support rx netdim")
Signed-off-by: Heng Qi <hengqi@xxxxxxxxxxxxxxxxx>
Acked-by: Jason Wang <jasowang@xxxxxxxxxx>
---
v1->v2:
   - Update commit log. No functional changes.

  drivers/net/virtio_net.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c22d111..0ebe322 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3563,8 +3563,10 @@ static void virtnet_rx_dim_work(struct work_struct *work)
  	struct dim_cq_moder update_moder;
  	int i, qnum, err;
-	if (!rtnl_trylock())
+	if (!rtnl_trylock()) {
+		schedule_work(&dim->work);
  		return;
I'm really scared by this change. VMs are (increasingly) used to run
container orchestration, which in turn puts a lot of pressure on the
RTNL lock. Any rtnl_trylock() + reschedule may hang for a very long time.
Addressing this kind of issue later becomes _extremely_ painful, see:

https://lore.kernel.org/netdev/20231018154804.420823-1-atenart@xxxxxxxxxx/

I really think a different solution is needed. What about moving
virtnet_send_command() under protection of a new mutex?

Daniel did additional work:

https://lore.kernel.org/all/20240328044715.266641-1-danielj@xxxxxxxxxx/

It uses a spin lock to protect ctrlq access, so the rtnl lock can be
removed from rx_dim_work, which makes this problem go away.
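
As a rough illustration of why a dedicated control-queue lock sidesteps the issue, here is a small userspace model. It is only a sketch under stated assumptions: the `toy_*` names are hypothetical, C11 `atomic_flag` stands in for both RTNL and the ctrlq spin lock, and `toy_send_command()` is a placeholder for virtnet_send_command(), not its real implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: one flag stands in for RTNL, one for a ctrlq lock. */
static atomic_flag toy_rtnl = ATOMIC_FLAG_INIT;
static atomic_flag toy_cvq_lock = ATOMIC_FLAG_INIT;
int toy_commands_sent;

static bool toy_trylock(atomic_flag *l)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set_explicit(l, memory_order_acquire);
}

static void toy_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin until the holder releases */
}

static void toy_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Placeholder for the control-queue command, serialised by its own lock. */
static void toy_send_command(void)
{
	toy_lock(&toy_cvq_lock);
	toy_commands_sent++;
	toy_unlock(&toy_cvq_lock);
}

/* Variant A: the worker needs RTNL and gives up when it is contended. */
bool toy_dim_work_rtnl(void)
{
	if (!toy_trylock(&toy_rtnl))
		return false;
	toy_send_command();
	toy_unlock(&toy_rtnl);
	return true;
}

/* Variant B: the worker only needs the ctrlq lock, so an RTNL holder
 * elsewhere in the system cannot make it bail out. */
bool toy_dim_work_cvq(void)
{
	toy_send_command();
	return true;
}

/* Helpers to simulate another task (e.g. ethtool) holding RTNL. */
void toy_rtnl_hold(void)    { toy_lock(&toy_rtnl); }
void toy_rtnl_release(void) { toy_unlock(&toy_rtnl); }
```

While the simulated RTNL is held, `toy_dim_work_rtnl()` returns false and sends nothing, whereas `toy_dim_work_cvq()` still completes, because the only lock it takes is held just for the duration of the command.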

Thanks,
Heng


I understand it will complicate future hardening works around cvq, but
really rtnl_trylock()/<spin/retry> is bad for the whole system.

Cheers,

Paolo




