Re: [PATCH v3 for-next 02/33] IB/core: Add kref to IB devices

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 4/28/2015 8:43 PM, Jason Gunthorpe wrote:
On Tue, Apr 28, 2015 at 05:03:11PM +0300, Matan Barak wrote:

The cleanup of roce_gid_cache is done in a different context, so we
need to make sure the device is still alive while doing so.

This explanation doesn't look right to me. I don't see anything like
that under roce_gid_cache_cleanup_one ?

Although, there must be a call to the driver's modify_gid to free
context before freeing, and I don't see that obviously happening..


Of course there is:
roce_gid_cache_client_cleanup_one -> roce_gid_cache_client_cleanup_one_work -> *work* -> roce_gid_cache_client_cleanup_work_handler -> roce_gid_cache_cleanup_one -> free_roce_gid_cache -> write_gid -> modify_gid

However, all the other async work launched doesn't look safe at all.


I think it's safe - roce_gid_cache_client_cleanup_one deactivates the cache (no new events are handled on the cache). Ongoing events are flushed in roce_gid_cache_client_cleanup_one_work (flush_workqueue(roce_gid_mgmt_wq)). When roce_gid_cache_cleanup_one is called - no work will be done or is currently in process on this cache.

So, did you mean that the device must still be alive while all the
other work is running? And the point of this scheme is to guarentee
all the work is flushed? (at least I hope it is, otherwise there are
bigger problems here)


It's not safe to free the client's context in ib_unregister_device while the client isn't done. The obvious solution is to wait in client->remove (like you suggested) until the client has finished cleaning up things. This doesn't fit our case - since client->remove is called under device_lock, but it's possible (for example) that roce_rescan_device_work_handler is currently running and waits to grab this exact mutex - DEAD LOCK. Freeing thing asynchronously is a possible solution - we don't lock all IB devices but just our device. Moreover, it's also more future proof as other clients might want to run other things concurrently.

It is just fundamentally wrong to return from ib_client.remove while
async work is still outstanding, the client is expected to deal with
this internally and synchronously.

You don't need IB core help to do this.


netdev is taking a similar approach - please take a look at netdev_wait_allrefs

Or: This should have been fixed after Haggai brought it up...

Jason


Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux