On Fri, May 01, 2015 at 09:28:40AM +0300, Matan Barak wrote: > >> There might be some ways around it - for example, introduce another > >> workqueue for roce_rescan_device and flush this workqueue only. > >> Every way has its advantages and disadvantages. > > > > I don't see that either, like I keep saying, all work must complete, > > more work queues don't really seem to help that. > > > > I think this is a possible solution, as all works but the "rescan" work > aren't device specific. That means that after we move the rescan work > to another workqueue and synchronize it in the client->remove function, > we could be sure the device won't be used by this client anymore. > > The rest of the works only iterate on the *existing* device list. Since the > ib_unregister_device > (a) lock (b) iterate over client->remove (c) remove from list (d) free > (e) unlock > all roce_gid_cache's works won't find it in the list and we'll be safe. Well.. be carefull with the locking here, if device_mutex no longer covers remove() then remove() and ib_enum_roce_ports_of_netdev() can run concurrently and they will need locking against free_roce_gid_cache(), which appears to implicitly rely on the device_mutex (too subtle and fragile) If there is a 'global' part and a 'device' specific part, it would good to make this more obvious and clear: group the functions together, annotate the difference with naming, a comment, or something. There are *alot* of little functions in this module, it is hard to understand the call graph without careful study. > Regarding ib_unregister_client - if we guarantee that after each > client->remove(dev) > dev isn't used in this client, we actually guarantee that after removing all IB > devices no IB device will be used (induction). Yes, that is the idea.. > > So, ultimtely, this is really a self created problem, and infact, the > > problem lies with the introduction of ib_enum_roce_ports_of_netdev - > > adding a new use of the device_mutex that is not register/unregister > > related exposes the design limitations of that mutex *that Roland > > clearly explained in a giant comment!* > I agree, while it's a general problem - it was first introduced by using > device_mutex in an asynchronous context (that should be flushed in > the remove function). Well, no, generally speaking you can't use device_mutex in any client facing API like ib_enum_roce_ports_of_netdev - it isn't just async contexts, but it means the add() and remove() functions instantly deadlock if they call a client facing API - that is not sane. > > - Convert device_mutex into a r/w sem > > I'm not sure this will solve the problem, as besides the new enumerate > devices function, all existing functions update the list - so we'll have > read-while-write scenario and we'll be in the exact same condition. You'd do something like hold the write sem and mutate the device list to unlink the target device, then switch to a read sem to traverse the client list. Two locks are clearer if contention is not an issue. > Agree, but please consider also the addition of another > workqueue as a possible solution. It *does* (seem) to answer > all of your concerns and could be safer and cause less code changes. As above, fundamentally, introducing a client API that holds device_mutex is just wrong, so we need to address this. I would also recommend splitting your per-device/global stuff anyhow, global stuff needs a clear unwind during remove, and so on. It isn't really optional to have this clarity. > kref just hides an atomic refcount - so you're actually saying using > the abstraction contradicts the kernel object's lifetime strategy? kref has specific semantic meaning, it isn't a general use data structure. > I tend to agree. The __exit function flushes the workqueue (so eventually > we guarentee that no code will execute before the module is unloaded). > IMHO, after ib_unregister_client returns, the module could still run code > (this is why the __exit handler exists), but it's not allowed to use any > data of the module which unregistered it, meaning - we can't allow it to > use any IB device. Yes, I was being terse, of course the .text is still going to be used, but the general idea of lifetime holds: the .text cannot remain in service of the ib_device because the __exit handler cannot clean that up. > Ok, I get your point, thanks. I'll keep that call synchronous. The two > options which are on the table right now are: > (a) split the device_mutex to device_mutex and client_mutex. > ib_unregister_device first grabs only client_mutex and after it called > client->remove it grabs device->mutex to the rest of the work. > (b) Introduce another workqueue for all device-specific events > (actually, only rescan). > This allows us to flush only this workqueue without waiting for any > work which needs device_mutex. The other option is to make rescan sense that the ib_device has been freed and ignore it, and then flush the work queue that runs that on module __exit. This might be needed anyhow due to the free race discussed above. Be sure to add the necessary kref get/puts to hold the ib device though. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html