[Adding Boris and Thomas to the CC.]

On Tuesday, March 19, 2013 02:20:06 PM Viresh Kumar wrote:
> Hi Guys,
>
> We are talking here about a bug reported by Duncan here. His cpu/cpu*/cpufreq
> directory are getting corrupted with 3.9-rc3 and was working well with 3.8
>
> https://bugzilla.kernel.org/show_bug.cgi?id=55411
>
> On his AMD bulldozer tri-cluster/6-core system he doesn't see affected and
> related cpus set correctly after off-lining 1-5 and bringing them back with:
>
> for i in 1 2 3 4 5; do echo 0 > /sys/devices/system/cpu/cpu$i/online ; done
> for i in 1 2 3 4 5; do echo 1 > /sys/devices/system/cpu/cpu$i/online ; done
>
> Before running above two, cpufreq-info gave:
> https://bugzilla.kernel.org/attachment.cgi?id=95701
>
> And after running above it gave:
> https://bugzilla.kernel.org/attachment.cgi?id=95711
>
> Clearly it got corrupted. Somehow cpu 3 showed up in related cpus field of
> cpu 5.
>
> I suspect following patches behind this:
>
> commit fcf8058296edbc3de43adf095824fc32b067b9f8
> Author: Viresh Kumar <viresh.kumar@xxxxxxxxxx>
> Date: Tue Jan 29 14:39:08 2013 +0000
>
>     cpufreq: Simplify cpufreq_add_dev()
>
>     Currently cpufreq_add_dev() firsts allocates policy, calls
>     driver->init() and then checks if this CPU is already managed or not.
>     And if it is already managed, its policy is freed.
>
>     We can save all this if we somehow know that CPU is managed or not in
>     advance. policy->related_cpus contains the list of all valid sibling
>     CPUs of policy->cpu. We can check this to see if the current CPU is
>     already managed.
>
>     From now on, platforms don't really need to set related_cpus from
>     their init() routines, as the same work is done by core too.
>
>     If a platform driver needs to set the related_cpus mask with some
>     additional CPUs, other than CPUs present in policy->cpus, they are
>     free to do it, though, as we don't override anything.
>
>     [rjw: Changelog]
>     Signed-off-by: Viresh Kumar <viresh.kumar@xxxxxxxxxx>
>     Tested-by: Shawn Guo <shawn.guo@xxxxxxxxxx>
>     Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>
>
> AND
>
> commit 643ae6e81dd65b333a13259852405fc9f764ac76
> Author: Viresh Kumar <viresh.kumar@xxxxxxxxxx>
> Date: Sat Jan 12 05:14:38 2013 +0000
>
>     cpufreq: Manage only online cpus
>
>     cpufreq core doesn't manage offline cpus and if driver->init() has
>     returned mask including offline cpus, it may result in unwanted
>     behavior by cpufreq core or governors.
>
>     We need to get only online cpus in this mask. There are two places
>     to fix this mask, cpufreq core and cpufreq driver. It makes sense to
>     do this at common place and hence is done in core.
>
>     Signed-off-by: Viresh Kumar <viresh.kumar@xxxxxxxxxx>
>     Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>
>
> And this is the latest piece of documentation available:
>
> SMP systems normally have same clock source for a group of cpus. For these the
> .init() would be called only once for the first online cpu. Here the .init()
> routine must initialize policy->cpus with mask of all possible cpus (Online +
> Offline) that share the clock. Then the core would copy this mask onto
> policy->related_cpus and will reset policy->cpus to carry only online cpus.
>
>
> I saw acpi-cpufreq drivers driver->init() code and found it is not yet
> aligned to this theory and probably that is causing these failures.
>
> I don't have enough knowledge about this driver and how is it used for all x86
> systems and so want somebody else (who has some prior experience with it)
> to check how policy->cpus and policy->related_cpus must be set from
> driver->init().

OK, so what exactly do you need to know?

This has to be addressed before final 3.9 this way or another - and the sooner
the better.

Thanks,
Rafael

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.