On 20-10-20, 07:13, Rob Clark wrote: > On Tue, Oct 20, 2020 at 4:24 AM Viresh Kumar <viresh.kumar@xxxxxxxxxx> wrote: > > > > On 20-10-20, 12:56, Daniel Vetter wrote: > > > Yeah that's bad practice. Generally you shouldn't need to hold locks > > > in setup/teardown code, since there's no other thread which can > > > possible hold a reference to anything your touching anymore. Ofc > > > excluding quickly grabbing/dropping a lock to insert/remove objects > > > into lists and stuff. > > > > > > The other reason is that especially with anything related to sysfs or > > > debugfs, the locking dependencies you're pulling in are enormous: vfs > > > locks pull in mm locks (due to mmap) and at that point there's pretty > > > much nothing left you're allowed to hold while acquiring such a lock. > > > For simple drivers this is no issue, but for fancy drivers (like gpu > > > drivers) which need to interact with core mm) this means your > > > subsystem is a major pain to use. > > > > > > Usually the correct fix is to only hold your subsystem locks in > > > setup/teardown when absolutely required, and fix any data > > > inconsistency issues by reordering your setup/teardown code: When you > > > register as the last step and unregister as the first step, there's no > > > need for any additional locking. And hence no need to call debugfs > > > functions while holding your subsystem locks. > > > > > > The catch phrase I use for this is "don't solve object lifetime issues > > > with locking". Instead use refcounting and careful ordering in > > > setup/teardown code. > > > > This is exactly what I have done in the OPP core, the locks were taken > > only when really necessary, though as we have seen now I have missed > > that at a single place and that should be fixed as well. Will do that, > > thanks. > > I do have an easy enough way to repro the issue, so if you have a > patch I can certainly test it. Does this fix it for you ? There is one more place still left where we are taking the opp_table_lock while adding stuff to debugfs and that's not that straight forward to fix. But I didn't see that path in your circular dependency trace, so who knows :) diff --git a/drivers/opp/core.c b/drivers/opp/core.c index 2483e765318a..4cc0fb716381 100644 --- a/drivers/opp/core.c +++ b/drivers/opp/core.c @@ -1181,6 +1181,10 @@ static void _opp_table_kref_release(struct kref *kref) struct opp_device *opp_dev, *temp; int i; + /* Drop the lock as soon as we can */ + list_del(&opp_table->node); + mutex_unlock(&opp_table_lock); + _of_clear_opp_table(opp_table); /* Release clk */ @@ -1208,10 +1212,7 @@ static void _opp_table_kref_release(struct kref *kref) mutex_destroy(&opp_table->genpd_virt_dev_lock); mutex_destroy(&opp_table->lock); - list_del(&opp_table->node); kfree(opp_table); - - mutex_unlock(&opp_table_lock); } void dev_pm_opp_put_opp_table(struct opp_table *opp_table) -- viresh