On 06/28/2013 01:14 PM, Sergey Senozhatsky wrote:
> On (06/28/13 10:13), Viresh Kumar wrote:
>> On 26 June 2013 02:45, Sergey Senozhatsky <sergey.senozhatsky@xxxxxxxxx> wrote:
>>>
>>> [   60.277396] ======================================================
>>> [   60.277400] [ INFO: possible circular locking dependency detected ]
>>> [   60.277407] 3.10.0-rc7-dbg-01385-g241fd04-dirty #1744 Not tainted
>>> [   60.277411] -------------------------------------------------------
>>> [   60.277417] bash/2225 is trying to acquire lock:
>>> [   60.277422]  ((&(&j_cdbs->work)->work)){+.+...}, at: [<ffffffff810621b5>] flush_work+0x5/0x280
>>> [   60.277444]
>>> but task is already holding lock:
>>> [   60.277449]  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81042d8b>] cpu_hotplug_begin+0x2b/0x60
>>> [   60.277465]
>>> which lock already depends on the new lock.
>>
>> Hi Sergey,
>>
>> Can you try reverting this patch?
>>
>> commit 2f7021a815f20f3481c10884fe9735ce2a56db35
>> Author: Michael Wang <wangyun@xxxxxxxxxxxxxxxxxx>
>> Date:   Wed Jun 5 08:49:37 2013 +0000
>>
>>     cpufreq: protect 'policy->cpus' from offlining during __gov_queue_work()
>>
>
> Hello,
> Yes, this helps, of course, but at the same time it reintroduces the
> previous problem -- preventing CPU hotplug in some places.
>
> I have a somewhat different (perhaps naive) RFC patch and would like to
> hear comments.
>
> The idea is to break the existing lock dependency chain by not holding
> the cpu_hotplug lock mutex across the calls. In order to detect active
> refcount readers or an active writer, the refcount may now have the
> following values:
>
> -1: active writer -- only one writer may be active, readers are blocked
>  0: no readers/writer
> > 0: active readers -- many readers may be active, the writer is blocked
>
> A "blocked" reader or writer goes to the wait queue. As soon as the
> writer finishes (refcount becomes 0), it wakes up all processes waiting
> in the queue. A reader performs the wakeup call only when it sees that a
> pending writer is present (active_writer is not NULL).
>
> The cpu_hotplug lock is now only required to protect the refcount cmp,
> inc and dec operations, so it can be changed to a spinlock.

It's best to avoid changing the core infrastructure in order to fix a
particular call-site, unless that scenario is really impossible to handle
with the current infrastructure. I have a couple of suggestions below to
solve this issue without touching the core hotplug code.

You can perhaps try cancelling the work item in two steps:

a. using cancel_delayed_work() under CPU_DOWN_PREPARE
b. using cancel_delayed_work_sync() under CPU_POST_DEAD

And of course, destroy the resources associated with that work (like the
timer_mutex) only after the full tear-down.

Or perhaps you might find a way to perform the tear-down in just one step
at the CPU_POST_DEAD stage. Whatever works correctly.

The key point here is that the core CPU hotplug code provides us with the
CPU_POST_DEAD stage, where the hotplug lock is _not_ held -- which is
exactly what you want for solving this issue in cpufreq. (A rough,
untested sketch of the two-step approach is appended below the sign-off.)

Regards,
Srivatsa S. Bhat
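
For illustration, a minimal sketch of the two-step cancellation, written
as a CPU hotplug notifier callback. The names gov_cpu_callback and
gov_work are hypothetical stand-ins, not the actual cpufreq governor
code, and the per-policy/per-CPU bookkeeping is omitted:

#include <linux/cpu.h>
#include <linux/notifier.h>
#include <linux/workqueue.h>

/* Hypothetical stand-in for the governor's delayed work item. */
static struct delayed_work gov_work;

static int gov_cpu_callback(struct notifier_block *nb,
			    unsigned long action, void *hcpu)
{
	switch (action & ~CPU_TASKS_FROZEN) {
	case CPU_DOWN_PREPARE:
		/*
		 * Step a: lightweight cancel while the hotplug lock is
		 * still held.  This only dequeues a pending work item and
		 * does not wait for a running instance, so it cannot
		 * recurse into flush_work() and trip the lockdep chain
		 * reported above.
		 */
		cancel_delayed_work(&gov_work);
		break;

	case CPU_POST_DEAD:
		/*
		 * Step b: the hotplug lock is no longer held at this
		 * stage, so it is safe to wait for a running instance to
		 * finish, and to destroy its resources (timer_mutex etc.)
		 * only after that.
		 */
		cancel_delayed_work_sync(&gov_work);
		break;
	}

	return NOTIFY_OK;
}

The design point being exercised: cancel_delayed_work() only dequeues a
pending work item, while cancel_delayed_work_sync() additionally waits
for a running instance -- and only the latter needs to happen outside
the hotplug lock.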
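
And for reference, a minimal sketch of the refcount scheme described in
the quoted mail -- hypothetical and simplified, not the actual RFC patch:
it assumes writers are already serialized by an outer lock (as
cpu_add_remove_lock does today) and omits reader recursion and lockdep
annotations:

#include <linux/sched.h>
#include <linux/spinlock.h>
#include <linux/wait.h>

static DEFINE_SPINLOCK(hp_lock);	/* protects refcount and active_writer */
static DECLARE_WAIT_QUEUE_HEAD(hp_wq);
static int refcount;			/* -1: writer, 0: idle, > 0: readers */
static struct task_struct *active_writer;

void get_online_cpus_sketch(void)
{
	spin_lock(&hp_lock);
	while (refcount < 0) {		/* a writer is active: block */
		spin_unlock(&hp_lock);
		wait_event(hp_wq, refcount >= 0);
		spin_lock(&hp_lock);
	}
	refcount++;
	spin_unlock(&hp_lock);
}

void put_online_cpus_sketch(void)
{
	bool wake;

	spin_lock(&hp_lock);
	refcount--;
	/* Readers wake the queue only when a writer is actually pending. */
	wake = (refcount == 0 && active_writer != NULL);
	spin_unlock(&hp_lock);

	if (wake)
		wake_up_all(&hp_wq);
}

void cpu_hotplug_begin_sketch(void)
{
	spin_lock(&hp_lock);
	active_writer = current;
	while (refcount != 0) {		/* wait for all readers to drain */
		spin_unlock(&hp_lock);
		wait_event(hp_wq, refcount == 0);
		spin_lock(&hp_lock);
	}
	refcount = -1;			/* mark the single writer active */
	spin_unlock(&hp_lock);
}

void cpu_hotplug_done_sketch(void)
{
	spin_lock(&hp_lock);
	refcount = 0;
	active_writer = NULL;
	spin_unlock(&hp_lock);

	/* The writer is done: wake up everyone waiting in the queue. */
	wake_up_all(&hp_wq);
}

Note that the spinlock only guards the counter updates; all waiting
happens outside it on the wait queue, which is what would break the
"mutex held across the whole hotplug operation" dependency that lockdep
complained about.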