Patch "vdpa/mlx5: Fix hang when cvq commands are triggered during device unregister" has been added to the 6.3-stable tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is a note to let you know that I've just added the patch titled

    vdpa/mlx5: Fix hang when cvq commands are triggered during device unregister

to the 6.3-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     vdpa-mlx5-fix-hang-when-cvq-commands-are-triggered-d.patch
and it can be found in the queue-6.3 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit c0c5a4493604c3929db950e46f9acebf598b7413
Author: Dragos Tatulea <dtatulea@xxxxxxxxxx>
Date:   Tue May 16 12:58:01 2023 +0300

    vdpa/mlx5: Fix hang when cvq commands are triggered during device unregister
    
    [ Upstream commit 73790bdfba076c0886f0f14fd46ff2c70ee31ce9 ]
    
    Currently the vdpa device is unregistered after the workqueue that
    processes vq commands is disabled. However, the device unregister
    process can still send commands to the cvq (a vlan delete for example)
    which leads to a hang because the handing workqueue has been disabled
    and the command never finishes:
    
     [ 2263.095764] rcu: INFO: rcu_sched self-detected stall on CPU
     [ 2263.096307] rcu:        9-....: (5250 ticks this GP) idle=dac4/1/0x4000000000000000 softirq=111009/111009 fqs=2544
     [ 2263.097154] rcu:        (t=5251 jiffies g=393549 q=347 ncpus=10)
     [ 2263.097648] CPU: 9 PID: 94300 Comm: kworker/u20:2 Not tainted 6.3.0-rc6_for_upstream_min_debug_2023_04_14_00_02 #1
     [ 2263.098535] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
     [ 2263.099481] Workqueue: mlx5_events mlx5_vhca_state_work_handler [mlx5_core]
     [ 2263.100143] RIP: 0010:virtnet_send_command+0x109/0x170
     [ 2263.100621] Code: 1d df f5 ff 85 c0 78 5c 48 8b 7b 08 e8 d0 c5 f5 ff 84 c0 75 11 eb 22 48 8b 7b 08 e8 01 b7 f5 ff 84 c0 75 15 f3 90 48 8b 7b 08 <48> 8d 74 24 04 e8 8d c5 f5 ff 48 85 c0 74 de 48 8b 83 f8 00 00 00
     [ 2263.102148] RSP: 0018:ffff888139cf36e8 EFLAGS: 00000246
     [ 2263.102624] RAX: 0000000000000000 RBX: ffff888166bea940 RCX: 0000000000000001
     [ 2263.103244] RDX: 0000000000000000 RSI: ffff888139cf36ec RDI: ffff888146763800
     [ 2263.103864] RBP: ffff888139cf3710 R08: ffff88810d201000 R09: 0000000000000000
     [ 2263.104473] R10: 0000000000000002 R11: 0000000000000003 R12: 0000000000000002
     [ 2263.105082] R13: 0000000000000002 R14: ffff888114528400 R15: ffff888166bea000
     [ 2263.105689] FS:  0000000000000000(0000) GS:ffff88852cc80000(0000) knlGS:0000000000000000
     [ 2263.106404] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     [ 2263.106925] CR2: 00007f31f394b000 CR3: 000000010615b006 CR4: 0000000000370ea0
     [ 2263.107542] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
     [ 2263.108163] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
     [ 2263.108769] Call Trace:
     [ 2263.109059]  <TASK>
     [ 2263.109320]  ? check_preempt_wakeup+0x11f/0x230
     [ 2263.109750]  virtnet_vlan_rx_kill_vid+0x5a/0xa0
     [ 2263.110180]  vlan_vid_del+0x9c/0x170
     [ 2263.110546]  vlan_device_event+0x351/0x760 [8021q]
     [ 2263.111004]  raw_notifier_call_chain+0x41/0x60
     [ 2263.111426]  dev_close_many+0xcb/0x120
     [ 2263.111808]  unregister_netdevice_many_notify+0x130/0x770
     [ 2263.112297]  ? wq_worker_running+0xa/0x30
     [ 2263.112688]  unregister_netdevice_queue+0x89/0xc0
     [ 2263.113128]  unregister_netdev+0x18/0x20
     [ 2263.113512]  virtnet_remove+0x4f/0x230
     [ 2263.113885]  virtio_dev_remove+0x31/0x70
     [ 2263.114273]  device_release_driver_internal+0x18f/0x1f0
     [ 2263.114746]  bus_remove_device+0xc6/0x130
     [ 2263.115146]  device_del+0x173/0x3c0
     [ 2263.115502]  ? kernfs_find_ns+0x35/0xd0
     [ 2263.115895]  device_unregister+0x1a/0x60
     [ 2263.116279]  unregister_virtio_device+0x11/0x20
     [ 2263.116706]  device_release_driver_internal+0x18f/0x1f0
     [ 2263.117182]  bus_remove_device+0xc6/0x130
     [ 2263.117576]  device_del+0x173/0x3c0
     [ 2263.117929]  ? vdpa_dev_remove+0x20/0x20 [vdpa]
     [ 2263.118364]  device_unregister+0x1a/0x60
     [ 2263.118752]  mlx5_vdpa_dev_del+0x4c/0x80 [mlx5_vdpa]
     [ 2263.119232]  vdpa_match_remove+0x21/0x30 [vdpa]
     [ 2263.119663]  bus_for_each_dev+0x71/0xc0
     [ 2263.120054]  vdpa_mgmtdev_unregister+0x57/0x70 [vdpa]
     [ 2263.120520]  mlx5v_remove+0x12/0x20 [mlx5_vdpa]
     [ 2263.120953]  auxiliary_bus_remove+0x18/0x30
     [ 2263.121356]  device_release_driver_internal+0x18f/0x1f0
     [ 2263.121830]  bus_remove_device+0xc6/0x130
     [ 2263.122223]  device_del+0x173/0x3c0
     [ 2263.122581]  ? devl_param_driverinit_value_get+0x29/0x90
     [ 2263.123070]  mlx5_rescan_drivers_locked+0xc4/0x2d0 [mlx5_core]
     [ 2263.123633]  mlx5_unregister_device+0x54/0x80 [mlx5_core]
     [ 2263.124169]  mlx5_uninit_one+0x54/0x150 [mlx5_core]
     [ 2263.124656]  mlx5_sf_dev_remove+0x45/0x90 [mlx5_core]
     [ 2263.125153]  auxiliary_bus_remove+0x18/0x30
     [ 2263.125560]  device_release_driver_internal+0x18f/0x1f0
     [ 2263.126052]  bus_remove_device+0xc6/0x130
     [ 2263.126451]  device_del+0x173/0x3c0
     [ 2263.126815]  mlx5_sf_dev_remove+0x39/0xf0 [mlx5_core]
     [ 2263.127318]  mlx5_sf_dev_state_change_handler+0x178/0x270 [mlx5_core]
     [ 2263.127920]  blocking_notifier_call_chain+0x5a/0x80
     [ 2263.128379]  mlx5_vhca_state_work_handler+0x151/0x200 [mlx5_core]
     [ 2263.128951]  process_one_work+0x1bb/0x3c0
     [ 2263.129355]  ? process_one_work+0x3c0/0x3c0
     [ 2263.129766]  worker_thread+0x4d/0x3c0
     [ 2263.130140]  ? process_one_work+0x3c0/0x3c0
     [ 2263.130548]  kthread+0xb9/0xe0
     [ 2263.130895]  ? kthread_complete_and_exit+0x20/0x20
     [ 2263.131349]  ret_from_fork+0x1f/0x30
     [ 2263.131717]  </TASK>
    
    The fix is to disable and destroy the workqueue after the device
    unregister. It is expected that vhost will not trigger kicks after
    the unregister. But even if it would, the wq is disabled already by
    setting the pointer to NULL (done so in the referenced commit).
    
    Fixes: ad6dc1daaf29 ("vdpa/mlx5: Avoid processing works if workqueue was destroyed")
    Signed-off-by: Dragos Tatulea <dtatulea@xxxxxxxxxx>
    Message-Id: <20230516095800.3549932-1-dtatulea@xxxxxxxxxx>
    Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
    Reviewed-by: Tariq Toukan <tariqt@xxxxxxxxxx>
    Acked-by: Jason Wang <jasowang@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 97a16f7eb8941..0b228fbb2a68b 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -3323,10 +3323,10 @@ static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device *
 	mlx5_vdpa_remove_debugfs(ndev->debugfs);
 	ndev->debugfs = NULL;
 	unregister_link_notifier(ndev);
+	_vdpa_unregister_device(dev);
 	wq = mvdev->wq;
 	mvdev->wq = NULL;
 	destroy_workqueue(wq);
-	_vdpa_unregister_device(dev);
 	mgtdev->ndev = NULL;
 }
 



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux