Re: RE: Network do not works with linux >= 6.1.2. Issue bisected to "425c9bd06b7a70796d880828d15c11321bdfb76d" (RDMA/irdma: Report the correct link speed)

On 2023/1/10 3:36, Saleem, Shiraz wrote:
Subject: Re: Network do not works with linux >= 6.1.2. Issue bisected to
"425c9bd06b7a70796d880828d15c11321bdfb76d" (RDMA/irdma: Report the
correct link speed)

On Fri, Jan 06, 2023 at 08:55:29AM +0100, Jaroslav Pulchart wrote:
[  257.967099] task:NetworkManager  state:D stack:0     pid:3387 ppid:1      flags:0x00004002
[  257.975446] Call Trace:
[  257.977901]  <TASK>
[  257.980004]  __schedule+0x1eb/0x630
[  257.983498]  schedule+0x5a/0xd0
[  257.986641]  schedule_timeout+0x11d/0x160
[  257.990654]  __wait_for_common+0x90/0x1e0
[  257.994666]  ? usleep_range_state+0x90/0x90
[  257.998854]  __flush_workqueue+0x13a/0x3f0
[  258.002955]  ? __kernfs_remove.part.0+0x11e/0x1e0
[  258.007661]  ib_cache_cleanup_one+0x1c/0xe0 [ib_core]
[  258.012721]  __ib_unregister_device+0x62/0xa0 [ib_core]
[  258.017959]  ib_unregister_device+0x22/0x30 [ib_core]
[  258.023024]  irdma_remove+0x1a/0x60 [irdma]
[  258.027223]  auxiliary_bus_remove+0x18/0x30
[  258.031414]  device_release_driver_internal+0x1aa/0x230
[  258.036643]  bus_remove_device+0xd8/0x150
[  258.040654]  device_del+0x18b/0x3f0
[  258.044149]  ice_unplug_aux_dev+0x42/0x60 [ice]

We talked about this already - wasn't it on this series?

I believe this is yet another path (when ice ports are added to a bond) where the RDMA aux device
is removed while holding the RTNL lock. The recent irdma patch 425c9bd06b7a now exposes it,
causing a deadlock.

ice_lag_event_handler [rtnl_lock]
  ->ice_lag_changeupper_event
      ->ice_unplug_aux_dev
         ->irdma_remove
             ->ib_unregister_device
                ->ib_cache_cleanup_one
                   ->flush_workqueue(ib)
                      ->irdma_query_port
                          -> ib_get_eth_speed [rtnl_lock]
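
The shape of this deadlock can be sketched in user space. This is only an illustration, not kernel code: fake_rtnl, fake_query_port_work and fake_remove_would_deadlock are hypothetical names, a pthread mutex stands in for the RTNL lock, and pthread_join() stands in for flush_workqueue(). The worker uses trylock so that the sketch reports the conflict instead of hanging.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the RTNL mutex; not a kernel symbol. */
static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;

/*
 * Stand-in for the queued work (irdma_query_port -> ib_get_eth_speed),
 * which needs the lock. trylock is used instead of lock so that the
 * sketch reports the conflict rather than blocking forever.
 */
static void *fake_query_port_work(void *arg)
{
	bool *would_block = arg;
	int rc = pthread_mutex_trylock(&fake_rtnl);

	if (rc == 0)
		pthread_mutex_unlock(&fake_rtnl);
	*would_block = (rc == EBUSY);
	return NULL;
}

/*
 * Stand-in for the remove path. It optionally holds the lock while it
 * waits for the work to finish (the pthread_join() plays the role of
 * flush_workqueue()). Returns true if the work could not take the lock,
 * i.e. a blocking lock in the work would never return.
 */
static bool fake_remove_would_deadlock(bool hold_lock_during_flush)
{
	pthread_t worker;
	bool would_block = false;

	if (hold_lock_during_flush)
		pthread_mutex_lock(&fake_rtnl);

	if (pthread_create(&worker, NULL, fake_query_port_work, &would_block) == 0)
		pthread_join(worker, NULL);

	if (hold_lock_during_flush)
		pthread_mutex_unlock(&fake_rtnl);

	return would_block;
}
```

With hold_lock_during_flush set to true the function returns true, which mirrors the chain above: the flusher holds the lock that the flushed work needs. With false it returns false, because the work can take and release the lock before the flush completes.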

I agree with the above analysis.
A quick and direct fix might look like this:

@@ -74,6 +74,7 @@ static int irdma_query_port(struct ib_device *ibdev, u32 port,
 {
        struct irdma_device *iwdev = to_iwdev(ibdev);
        struct net_device *netdev = iwdev->netdev;
+       bool unlock_rtnl = false;

        /* no need to zero out pros here. done by caller */

@@ -91,9 +92,16 @@ static int irdma_query_port(struct ib_device *ibdev, u32 port,
                props->phys_state = IB_PORT_PHYS_STATE_DISABLED;
        }

+       if (rtnl_is_locked()) {
+               rtnl_unlock();
+               unlock_rtnl = true;
+       }
        ib_get_eth_speed(ibdev, port, &props->active_speed,
                         &props->active_width);

+       if (unlock_rtnl) {
+               rtnl_lock();
+       }
        if (rdma_protocol_roce(ibdev, 1)) {
                props->gid_tbl_len = 32;
                props->ip_gids = true;

Zhu Yanjun


The previous discussion was about an ethtool channel config change (https://lore.kernel.org/linux-rdma/Y5ES3kmYSINlAQhz@x130/),
which David E. is taking care of.

We are working on a patch for this issue.


Don't hold locks when removing aux devices.

[  258.048707]  ice_lag_changeupper_event+0x287/0x2a0 [ice]
[  258.054038]  ice_lag_event_handler+0x51/0x130 [ice]
[  258.058930]  raw_notifier_call_chain+0x41/0x60
[  258.063381]  __netdev_upper_dev_link+0x1a0/0x370
[  258.068008]  netdev_master_upper_dev_link+0x3d/0x60
[  258.072886]  bond_enslave+0xd16/0x16f0 [bonding]
[  258.077517]  ? nla_put+0x28/0x40
[  258.080756]  do_setlink+0x26c/0xc10
[  258.084249]  ? avc_alloc_node+0x27/0x180
[  258.088173]  ? __nla_validate_parse+0x141/0x190
[  258.092708]  __rtnl_newlink+0x53a/0x620
[  258.096549]  rtnl_newlink+0x44/0x70

Especially not the rtnl.

Jason
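
One generic way to follow the advice above is to have the code that runs under the lock only record that a removal is needed, and to perform the removal later from a context that does not hold the lock. The user-space sketch below illustrates that pattern only; it is not the fix that was merged, and every name in it (fake_rtnl, fake_lag_event_handler, fake_service_task, fake_remove_aux_dev, demo_deferred_removal) is hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool unplug_pending = false; /* request set under the lock */
static bool removed_under_lock;            /* recorded for illustration */
static int removals;

/* Stand-in for removing the aux device; records whether the lock was held. */
static void fake_remove_aux_dev(void)
{
	if (pthread_mutex_trylock(&fake_rtnl) == 0) {
		pthread_mutex_unlock(&fake_rtnl);
		removed_under_lock = false;
	} else {
		removed_under_lock = true;
	}
	removals++;
}

/* Runs with fake_rtnl held: only records the request, removes nothing. */
static void fake_lag_event_handler(void)
{
	atomic_store(&unplug_pending, true);
}

/* Runs later without fake_rtnl: consumes the request and does the removal. */
static void fake_service_task(void)
{
	if (atomic_exchange(&unplug_pending, false))
		fake_remove_aux_dev();
}

/* Returns true if exactly one removal happened and the lock was not held. */
static bool demo_deferred_removal(void)
{
	pthread_mutex_lock(&fake_rtnl);
	fake_lag_event_handler();      /* notifier runs under the lock */
	pthread_mutex_unlock(&fake_rtnl);

	fake_service_task();           /* deferred work, lock released */

	return removals == 1 && !removed_under_lock;
}
```

The trade-off in this sketch is that the removal becomes asynchronous with respect to the event that requested it, so anything that depends on the device being gone has to wait for the deferred step.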



