On 09/11/2013 11:33 PM, Srivatsa S. Bhat wrote: > On 09/11/2013 09:35 PM, Stephen Warren wrote: >> On 09/11/2013 04:21 AM, Srivatsa S. Bhat wrote: >>> On 09/11/2013 04:04 AM, Rafael J. Wysocki wrote: >>>> On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote: >>>>> On 09/09/2013 05:14 PM, Rafael J. Wysocki wrote: >>>>>> On Monday, September 09, 2013 03:29:06 PM Stephen Warren wrote: >>>>>>> On 09/09/2013 02:24 PM, Rafael J. Wysocki wrote: >>>>>>>> On Monday, September 09, 2013 02:01:32 PM Stephen Warren wrote: >>>>>>>>> On 09/09/2013 02:01 PM, Rafael J. Wysocki wrote: >>>>>>>>>> On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote: >>>>>>>>>>> Viresh, >>>>>>>>>>> >>>>>>>>>>> I'm seeing the crash below when suspending my system for the second time. >> ... >>> Stephen, I went through the code and I think I found out what is going wrong. >>> Can you please try the following patch? >> >> Unfortunately, I still see the exact same failure/backtrace with this >> patch applied. >> > > Oh, is it? Can you please give me the map of the related cpus on your > system? (ie., cat /sys/devices/system/cpu/cpu*/cpufreq/related_cpus for > each CPU.) > > I must be missing something... > OK, I took a second look at the code, and I suspect that applying the second patch might help. So can you try by applying both the patches please[1][2]? Basically here is my hunch: say CPUs 2 and 3 are part of a policy and 3 is the policy->cpu. During suspend, CPU 2 will be taken offline first, and we hit this code: 1199 if (cpu != policy->cpu && !frozen) { 1200 sysfs_remove_link(&dev->kobj, "cpufreq"); 1201 } else if (cpus > 1) { 1202 1203 new_cpu = cpufreq_nominate_new_policy_cpu(policy, cpu, frozen); 1204 if (new_cpu >= 0) { 1205 WARN_ON(lock_policy_rwsem_write(cpu)); 1206 update_policy_cpu(policy, new_cpu); 1207 unlock_policy_rwsem_write(cpu); 1208 1209 if (!frozen) { 1210 pr_debug("%s: policy Kobject moved to cpu: %d " 1211 "from: %d\n",__func__, new_cpu, cpu); 1212 } 1213 } 1214 } At this point, the first 'if' condition fails because frozen == true. So it enters the else part. But, policy->cpu is actually 3, not 2, and hence we invoke nominate_...() unnecessarily. That function returns 3 since that's the only CPU remaining in the mask, and so we call update_policy_cpu() with new_cpu = 3, and old_cpu was also 3! And that is the perfect recipe for disaster, with the current implementation of update_policy_cpu(). And my second patch [2] tried to fix this exact problem, although I didn't realize we actually had a case where we hit this in the current code itself. So please try by applying both the patches and let me know how it goes. Thanks a lot for your testing efforts! [1]. http://marc.info/?l=linux-kernel&m=137889516210816&w=2 [2]. http://marc.info/?l=linux-kernel&m=137889800511940&w=2 Regards, Srivatsa S. Bhat -- To unsubscribe from this list: send the line "unsubscribe cpufreq" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html