Re: [PATCH v3 for-next 02/33] IB/core: Add kref to IB devices

Matan Barak <matanb@xxxxxxxxxxxx> · Thu, 30 Apr 2015 11:21:01 +0300

On 4/29/2015 7:48 PM, Jason Gunthorpe wrote:

Although, there must be a call to the driver's modify_gid to free
context before freeing, and I don't see that obviously happening..

Of course there is:
roce_gid_cache_client_cleanup_one ->
roce_gid_cache_client_cleanup_one_work -> *work* ->
roce_gid_cache_client_cleanup_work_handler ->
roce_gid_cache_cleanup_one -> free_roce_gid_cache -> write_gid ->
modify_gid

Ah, this wasn't there in the earlier versions. Good.

However, all the other async work launched doesn't look safe at all.

I think it's safe - roce_gid_cache_client_cleanup_one deactivates
the cache (no new events are handled on the cache). Ongoing events
are flushed in roce_gid_cache_client_cleanup_one_work

Do you mean roce_gid_cache_client_cleanup_work_handler?

Yeah, the flush_workqueue is done in 
roce_gid_cache_client_cleanup_work_handler

It's not safe to free the client's context in ib_unregister_device
while the client isn't done. The obvious solution is to wait in
client->remove (like you suggested) until the client has finished
cleaning up things.

This isn't just the obvious solution, it is the *expected* solution.
In the kernel the add/remove style idiom always works like this.

This doesn't fit our case - since client->remove is called under
device_lock, but it's possible (for example) that
roce_rescan_device_work_handler is currently running and waits to
grab this exact mutex - DEAD LOCK.

Uh, client->remove is most obviously called with
drivers/infiniband/core/device.c:device_mutex held, is that what you
mean? But that can't be right because only four functions hold that
lock and none of them are obviously called from this patch?

If these patches have a locking problem then breaking the add/remove
idiom is not the way to solve it.

Look, four people have asked about this patch, and I still have yet to
see an accurate and convincing answer from you what is actually going
on here. Please actually spend some time to properly research and
describe why the remove callback can't be synchronous.

ib_unregister_device calls the remove function with device_mutex help. 
In addition, ib_enum_roce_ports_of_netdev does the same. Every 
interesting netdev/inet/inet6 event that's handled in roce_gid_mgmt 
triggers ib_enum_roce_ports_of_netdev by using the workqueue (for 
example, 
netdevice_event->*work*->netdevice_event_work_handler->ib_enum_roce_ports_of_netdev).

When a device is being unregistered, the remove function is called under 
device_mutex:
ib_unregister_device->roce_gid_cache_client_cleanup_one->*work*->roce_gid_cache_client_cleanup_work_handler->flush_workqueue

This flush_workqueue could wait for ib_enum_roce_ports_of_netdev. If we 
would have called it straight from the remove function, we're waiting 
for a work which waits for a mutex that will be unlocked only after the 
iteration over all remove functions is completed -> *DEADLOCK*

You could argue that flush_workqueue isn't needed, but that let's look 
at the following flow:

roce_gid_cache_client_setup_one->roce_rescan_device->*work (with the 
exact ib_dev)*->....
We need to make sure the ib_dev isn't free'd before this work is done.

There might be some ways around it - for example, introduce another 
workqueue for roce_rescan_device and flush this workqueue only. Every 
way has its advantages and disadvantages. I think it's problematic that 
device_mutex can't be held in a work as *most* client works are 
synchronized when the a device is being unregistered. It could affect 
future clients as well.

I'll be happy to explain/fix any issue you have regarding this code, but 
of course it needs to be concrete.

It is just fundamentally wrong to return from ib_client.remove while
async work is still outstanding, the client is expected to deal with
this internally and synchronously.

You don't need IB core help to do this.

netdev is taking a similar approach - please take a look at
netdev_wait_allrefs

No, it really isn't, from the attached drivers perspective after
unregister_netdevice returns the driver is not allowed to operate the
net device anymore. netdev_wait_allrefs is part of making
unregister_netdevice synchronous.

The same is true of our IB client attaches, after a client returns
from remove it is not allowed to operate the IB device anymore.

ib_unregister_device is synchronous in the exact same manner - after it 
returns, no client operate on the IB device. 
wait_for_completion(&device->free) was added for this exact reason.

That is a standard idiom, and we'd need a huge compelling reason to go
away from that.

A device can be remove if and only if it's reference count is zero. 
That's the only point where we guarantee nobody uses it anymore. That's 
a standard idiom as well.

Jason

I really appreciate the review and thanks for looking at this patch,
Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html