Re: [PATCH] RDMA/device: Fix a race between mad_client and cm_client init

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2024/1/5 22:19, Jason Gunthorpe wrote:
On Fri, Jan 05, 2024 at 04:15:18PM +0800, Shifeng Li wrote:
On 2024/1/4 20:37, Jason Gunthorpe wrote:
On Thu, Jan 04, 2024 at 02:48:14PM +0800, Shifeng Li wrote:

The root cause is that mad_client and cm_client may init concurrently
when devices_rwsem write semaphore is downgraded in enable_device_and_get() like:

That can't be true, the module loader infrastructue ensures those two
things are sequential.


I'm a bit confused how the module loader infrastructue ensures that mad_client.add() and
cm_client.add() are sequential. Could you explain in more detail
please?

ib_cm has a symbol dependency on ib_mad, so the module loader will not
allow ib_cm to start running until all its symbol dependencies have
completed loading.

I have found a method to reproduce the issue as follow:

1. modprobe -r ib_cm; modprobe -r ib_core; modprobe -r mlx5_ib;
2. Compile and replace ib_core with following patch;
3. modprobe ib_cm;
4. modprobe mlx5_ib;

diff -Narup ./drivers/infiniband/core/device.c ./drivers/infiniband/core/device.c.reproduce
--- ./drivers/infiniband/core/device.c  2024-01-15 11:14:08.063094430 +0800
+++ ./drivers/infiniband/core/device.c.reproduce        2024-01-15 11:16:22.577096953 +0800
@@ -64,6 +64,8 @@ struct workqueue_struct *ib_wq;
 EXPORT_SYMBOL_GPL(ib_wq);
 static struct workqueue_struct *ib_unreg_wq;
+int ib_cm_run_flag = 0;
+
 /*
  * Each of the three rwsem locks (devices, clients, client_data) protects the
  * xarray of the same name. Specifically it allows the caller to assert that
@@ -1371,6 +1373,9 @@ static int enable_device_and_get(struct
         */
        downgrade_write(&devices_rwsem);
+ ib_cm_run_flag = 1;
+       msleep(30000);
+
        if (device->ops.enable_driver) {
                ret = device->ops.enable_driver(device);
                if (ret)
@@ -1843,6 +1848,12 @@ int ib_register_client(struct ib_client
        if (ret)
                return ret;
+ if (strncmp(client->name, "cm", strlen("cm")) == 0) {
+               while (!ib_cm_run_flag) {
+                       cond_resched();
+               }
+       }
+
        down_read(&devices_rwsem);
        xa_for_each_marked (&devices, index, device, DEVICE_REGISTERED) {
                ret = add_client_context(device, client);


We know that the ib_cm driver and mlx5_ib driver can load concurrently.

Yes, this is possible

Jason






[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux