Please use the right list for cpufreq stuff. Also, cc the maintainers if you
want quick replies to your mail.

On Wed, Nov 19, 2014 at 11:25 AM, Arun KS <arunks.linux@xxxxxxxxx> wrote:
> Hello,
>
> Seen a race condition in cpufrequency driver.
>
> cpu2 is being hot-plugged out. And this started at say, 20th msec.
> -000|context_switch(inline)
> -000|need_resched()
> -001|preempt_schedule(inline)
> -001|preempt_schedule()
> -002|static_key_false(inline)
> -002|trace_sched_cpu_hotplug(inline)
> -002|cpu_down(cpu = 2, ?)
> -003|cpu_down(cpu = 2)
> -004|update_offline_cores(?)
> -005|do_hotplug(?)
> -006|kthread(_create = 0xEE85BEBC)
>
> cpu1 is updating the governor at say 60th msec.
> echo "some_governor" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
> at 60th msec, cpu2 is already hot-plugged but CPU_POST_DEAD has not
> called because __cpu_down was scheduled out at cpu_hotplug_done(while
> unlocking mutex)
>
> now store_scaling_governor calls cpufreq_set_policy
> CPUFREQ_GOV_START iterates through all cpus in policy->cpus(which is
> not the correct one now, because cpu2 is already hot-plugged out(DEAD)
> but not updated in policy->cpus).
>
> One suggestion is to use CPU_DEAD instead of CPU_POST_DEAD in cpufreq.c
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 644b54e..5fdaf06 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -2325,7 +2325,7 @@ static int cpufreq_cpu_callback(struct
> notifier_block *nfb,
>                 __cpufreq_remove_dev_prepare(dev, NULL);
>                 break;
>
> -       case CPU_POST_DEAD:
> +       case CPU_DEAD:
>                 __cpufreq_remove_dev_finish(dev, NULL);
>                 break;
>
> Or add a mutex to serialize the context.
> Appreciate your valuable comments.

What kernel version are you using? Some time back, this patch addressed this
problem:

commit 4f750c930822b92df74327a4d1364eff87701360
Author: Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx>
Date:   Sat Sep 7 01:23:43 2013 +0530

    cpufreq: Synchronize the cpufreq store_*() routines with CPU hotplug

    The functions that are used to write to cpufreq sysfs files (such as
    store_scaling_max_freq()) are not hotplug safe. They can race with CPU
    hotplug tasks and lead to problems such as trying to acquire an already
    destroyed timer-mutex etc.

    Eg:

        __cpufreq_remove_dev()
         __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
           policy->governor->governor(policy, CPUFREQ_GOV_STOP);
            cpufreq_governor_dbs()
             case CPUFREQ_GOV_STOP:
              mutex_destroy(&cpu_cdbs->timer_mutex)
              cpu_cdbs->cur_policy = NULL;
          <PREEMPT>
        store()
         __cpufreq_set_policy()
          __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
            policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
             case CPUFREQ_GOV_LIMITS:
              mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
               if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL

    So use get_online_cpus()/put_online_cpus() in the store_*() functions, to
    synchronize with CPU hotplug. However, there is an additional point to note
    here: some parts of the CPU teardown in the cpufreq subsystem are done in
    the CPU_POST_DEAD stage, with cpu_hotplug.lock *released*. So, using the
    get/put_online_cpus() functions alone is insufficient; we should also
    ensure that we don't race with those latter steps in the hotplug sequence.
    We can easily achieve this by checking if the CPU is online before
    proceeding with the store, since the CPU would have been marked offline by
    the time the CPU_POST_DEAD notifiers are executed.
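
To make the window concrete, here is a rough userspace sketch of the ordering
described above. It is not the kernel code: the struct, the model_*() helpers,
the bitmask representation and the -EINVAL return value are all assumptions
made for illustration. It only models that a CPU is marked offline first, that
the policy's CPU mask is updated later (at the CPU_POST_DEAD step), and that a
store can refuse to proceed when the CPU it targets is no longer online.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical stand-ins for kernel state, for illustration only. */
struct model {
	unsigned int online_mask;	/* plays the role of cpu_online_mask */
	unsigned int policy_cpus;	/* plays the role of policy->cpus */
	const char *governor;		/* name of the active governor */
};

static bool model_cpu_online(const struct model *m, unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	return (m->online_mask >> cpu) & 1u;
}

/* The CPU is marked offline; the policy mask is not touched yet. */
static void model_cpu_dead(struct model *m, unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	m->online_mask &= ~(1u << cpu);
}

/* The later teardown step: the CPU is finally dropped from the policy. */
static void model_cpu_post_dead(struct model *m, unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	m->policy_cpus &= ~(1u << cpu);
}

/* CPUs still listed in the policy although they are already offline. */
static unsigned int model_stale_cpus(const struct model *m)
{
	return m->policy_cpus & ~m->online_mask;
}

/*
 * Models a store handler that checks whether the target CPU is online
 * before changing anything. The error code is an assumption.
 */
static int model_store_governor(struct model *m, unsigned int cpu,
				const char *gov)
{
	if (!model_cpu_online(m, cpu))
		return -EINVAL;
	m->governor = gov;
	return 0;
}
```

In this model, calling model_cpu_dead(&m, 2) without model_cpu_post_dead(&m, 2)
leaves bit 2 set in model_stale_cpus(), which corresponds to the window in the
report. Note that the check in model_store_governor() only looks at the CPU
passed to it, so a store aimed at a CPU that is still online goes ahead even
while another CPU in the same policy is stale.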