On 08/01/2013 11:34 PM, Rafael J. Wysocki wrote: > On Thursday, August 01, 2013 08:54:59 PM Srivatsa S. Bhat wrote: >> On 08/01/2013 08:14 PM, Viresh Kumar wrote: >>> On 1 August 2013 13:41, Srivatsa S. Bhat >>> <srivatsa.bhat@xxxxxxxxxxxxxxxxxx> wrote: >>>> On 08/01/2013 05:38 AM, Rafael J. Wysocki wrote: >>>>> From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx> >>>>> >>>>> The cpufreq core is a little inconsistent in the way it uses the >>>>> driver module refcount. >>>>> >>>>> Namely, if __cpufreq_add_dev() is called for a CPU without siblings >>>>> or generally a CPU for which a new policy object has to be created, >>>>> it grabs a reference to the driver module to start with, but drops >>>>> that reference before returning. As a result, the driver module >>>>> refcount is then equal to 0 after __cpufreq_add_dev() has returned. >>>>> >>>>> On the other hand, if the given CPU is a sibling of some other >>>>> CPU already having a policy, cpufreq_add_policy_cpu() is called >>>>> to link the new CPU to the existing policy. In that case, >>>>> cpufreq_cpu_get() is called to obtain that policy and grabs a >>>>> reference to the driver module, but that reference is not >>>>> released and the module refcount will be different from 0 after >>>>> __cpufreq_add_dev() returns (unless there is an error). That >>>>> prevents the driver module from being unloaded until >>>>> __cpufreq_remove_dev() is called for all the CPUs that >>>>> cpufreq_add_policy_cpu() was called for previously. >>>>> >>>>> To remove that inconsistency make cpufreq_add_policy_cpu() execute >>>>> cpufreq_cpu_put() for the given policy before returning, which >>>>> decrements the driver module refcount so that it will be 0 after >>>>> __cpufreq_add_dev() returns, >>>> >>>> Removing the inconsistency is a good thing, but I think we should >>>> make it consistent the other way around - make a CPU-online increment >>>> the driver module refcount and decrement it only on CPU-offline. >>> >>> I took time to review to this mail as I was looking at the problem >>> yesterday. I am sorry to say, but I have completely different views as >>> compared to You and Rafael both :) >>> >>> First of all, Rafael's patch is incomplete as it hasn't fixed the issue >>> completely. When we have multiple CPUs per policy and >>> cpufreq_add_dev() is called for the first one, it call cpufreq_get_cpu() >>> for all cpus of this policy(), so count is == x (no. of CPUs in this policy) >>> + 1 (This is the initial value of .owner). >>> >>> And so we still have module count getting incremented for other cpus :) >>> >> >> Good catch! > > Sorry, I don't see how this happens. > > __cpufreq_add_dev() only directly calls cpufreq_cpu_get() at the beginning to > check if the policy is there and then immediately calls cpufreq_cpu_put() in > that case (for that policy). > > Next, cpufreq_add_policy_cpu() calls cpufreq_cpu_get(), but that's what my > patch changes. > > I don't see where else cpufreq_cpu_get() is called by __cpufreq_add_dev() > whether directly or indirectly. > __cpufreq_add_dev()->cpufreq_add_dev_interface()->cpufreq_add_dev_symlink(). The last one does: 815 for_each_cpu(j, policy->cpus) { 816 struct cpufreq_policy *managed_policy; 817 struct device *cpu_dev; 818 819 if (j == cpu) 820 continue; 821 822 pr_debug("CPU %u already managed, adding link\n", j); 823 managed_policy = cpufreq_cpu_get(cpu); 824 cpu_dev = get_cpu_device(j); 825 ret = sysfs_create_link(&cpu_dev->kobj, &policy->kobj, 826 "cpufreq"); ... } > Moreover, if I'm completely wrong and it is called there in an invisible > hush-hush way, it has to be explained why the module usage count as printed by > lsmod for acpi-cpufreq is 0 (in current linux-next). > Perhaps because none of your policies have more than one CPU associated with it? I think related_cpus should be able to tell us that.. But yes, it is a little hidden and moreover, we don't take the refcount if there is only one CPU in the mask. Which is a little inconsistent, IMHO. >>> Now few lines about My point of view to this whole thing. I believe we >>> should get rid of .owner field from struct cpufreq_driver completely. It >>> doesn't make sense to me in doing all this management at all. Surprised? >>> Shocked? Laughing at me? :) >>> >>> Well I may be wrong but this is what I think: >>> - It looks stupid to me that I can't do this from userspace in one go: >>> $ insmod cpufreq_driver.ko >>> $ rmmod cpufreq_driver.ko >>> >>> What the hell changed in between that isn't visible to user? It looked >>> completely stupid in that way.. >>> >>> Something like this sure makes sense: >>> $ insmod ondemand-governor.ko >>> $ change governor to ondemand for few CPUs >>> $ rmmod ondemand-governor.ko >>> >>> as we have deliberately add few users of governor. And so without second >>> step, rmmod should really work smoothly. And it does. >>> >>> Now, why shouldn't there be a problem with this approach? I will write >>> that inline to the problems you just described. >>> >>>> The reason is, one should not be able to unload the back-end cpufreq >>>> driver module when some CPUs are still being managed. Nasty things >>>> will result if we allow that. For example, if we unload the module, >>>> and then try to do a CPU offline, then the cpufreq hotplug notifier >>>> won't even be called (because cpufreq_unregister_driver also >>>> unregisters the hotplug notifier). And that might be troublesome. >>> >>> So what? Its simply equivalent to we have booted our system, haven't >>> inserted cpufreq module and taken out a cpu. > > I'd put that differently. > > With the current code as is it may cause problems to happen, but there are > two ways to change that in general: > > (1) Disallow the removal of the cpufreq driver while there are any users, but > for that we only need the driver to be refcounted *once* when a new policy > is created (and not as many times as there are CPUs using that policy). > Then, the reference can be dropped while removing the policy object. > > (2) Allow the removal of the cpufreq driver, but harden the code against > that. [Maybe it doesn't have to be hardened any more as is, I haven't > checked that.] > > I agree with Viresh that (1) is kind of weird from the usability perspective, > because if we did that, it wouldn't be practically possible to remove cpufreq > driver modules after loading them (cpuidle currently has that problem). > Yes, I think we can go with Viresh's approach and use plain locking to synchronize things. Returning -EBUSY isn't really beneficial, since the critical sections are small and finite - its not like the user has to wait a long time to rmmod the module if we use locking. >>>> Even worse, if we unload a cpufreq driver module and load a new >>>> one and *then* try to offline the CPU, then the cpufreq_driver->exit() >>>> function that we call during CPU offline will end up calling the >>>> corresponding function of an entirely different driver! So the >>>> ->init() and ->exit() calls won't match. >>> >>> That's not true. When we unload the module, it must call >>> cpufreq_unregister_driver() which should call cpufreq_remove_cpu() >>> for all cpus and so exit() is already called for last module. >>> >> >> Sorry, I missed this one. >> >>> If we get something new now, it should simply work. >>> >> >> Yeah, I now see your point. It won't create any problems by >> unloading the module and loading a new one. >> >>> What do you think gentlemen? >>> >> >> Well, I now agree that we don't have to keep the module refcount >> non-zero as long as CPUs are being managed (that was just my >> misunderstanding, sorry for the noise). However, I think the _get() >> and _put() used in the existing code is for synchronization: that >> is, to avoid races between trying to unload the cpufreq driver >> module and running a hotplug notifier (and calling the driver module's >> ->init() or ->exit() function). >> >> With that being the case, I think we can retain the module refcounts >> and use them only for synchronization. And naturally the refcount >> should drop to zero after the critical section; no point keeping >> it incremented until the CPU is taken offline. >> >> Or, am I confused again? > > No, I don't think so. > > In fact, the point of my patch was to make the module refcount stay 0 > beyond critical sections, but it looks like I overlooked something. What is > that? > Its the cpufreq_cpu_get() hidden away in cpufreq_add_dev_symlink(). With that taken care of, everything should be OK. Then we can change the synchronization part to avoid using refcounts. Regards, Srivatsa S. Bhat -- To unsubscribe from this list: send the line "unsubscribe cpufreq" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html