On Wed 2021-10-27 13:57:40, Miroslav Benes wrote: > On Tue, 26 Oct 2021, Luis Chamberlain wrote: > > > On Tue, Oct 26, 2021 at 11:37:30PM +0800, Ming Lei wrote: > > > On Tue, Oct 26, 2021 at 10:48:18AM +0200, Petr Mladek wrote: > > > > Livepatch code never called kobject_del() under a lock. It would cause > > > > the obvious deadlock. I have to correct myself. IMHO, the deadlock is far from obvious. I always get lost in the code and the documentation is not clear. I always get lost. > > > > Never? > > kobject_put() to be precise. IMHO, the problem is actually with kobject_del() that gets blocked until the sysfs interface gets removed. kobject_put() will have the same problem only when the clean up is not delayed. > When I started working on the support for module/live patches removal, > calling kobject_put() under our klp_mutex lock was the obvious first > choice given how the code was structured, but I ran into problems with > deadlocks immediately. So it was changed to async approach with the > workqueue. Thus the mainline code has never suffered from this, but we > knew about the issues. > > > > > The historic code only waited in the > > > > module_exit() callback until the sysfs interface was removed. > > > > > > OK, then Luis shouldn't consider livepatching as one such issue to solve > > > with one generic solution. > > > > It's not what I was told when the deadlock was found with zram, so I was > > informed quite the contrary. > > >From my perspective, it is quite easy to get it wrong due to either a lack > of generic support, or missing rules/documentation. So if this thread > leads to "do not share locks between a module removal and a sysfs > operation" strict rule, it would be at least something. In the same > manner as Luis proposed to document try_module_get() expectations. The rule "do not share locks between a module removal and a sysfs operation" is not clear to me. IMHO, there are the following rules: 1. rule: kobject_del() or kobject_put() must not be called under a lock that is used by store()/show() callbacks. reason: kobject_del() waits until the sysfs interface is destroyed. It has to wait until all store()/show() callbacks are finished. 2. rule: kobject_del()/kobject_put() must not be called from the related store() callbacks. reason: same as in 1st rule. 3. rule: module_exit() must wait until all release() callbacks are called when kobject are static. reason: kobject_put() must be called to clean up internal dependencies. The clean up might be done asynchronously and need access to the kobject structure. Best Regards, Petr PS: I am sorry if I am messing things. I want to be sure that we are all talking about the same and understand it the same way.