Re: [RFC][PATCH] cpufreq: Do not hold driver module references for additional policy CPUs

Viresh Kumar <viresh.kumar@xxxxxxxxxx> · Thu, 1 Aug 2013 20:14:31 +0530

On 1 August 2013 13:41, Srivatsa S. Bhat
<srivatsa.bhat@xxxxxxxxxxxxxxxxxx> wrote:
> On 08/01/2013 05:38 AM, Rafael J. Wysocki wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>>
>> The cpufreq core is a little inconsistent in the way it uses the
>> driver module refcount.
>>
>> Namely, if __cpufreq_add_dev() is called for a CPU without siblings
>> or generally a CPU for which a new policy object has to be created,
>> it grabs a reference to the driver module to start with, but drops
>> that reference before returning.  As a result, the driver module
>> refcount is then equal to 0 after __cpufreq_add_dev() has returned.
>>
>> On the other hand, if the given CPU is a sibling of some other
>> CPU already having a policy, cpufreq_add_policy_cpu() is called
>> to link the new CPU to the existing policy.  In that case,
>> cpufreq_cpu_get() is called to obtain that policy and grabs a
>> reference to the driver module, but that reference is not
>> released and the module refcount will be different from 0 after
>> __cpufreq_add_dev() returns (unless there is an error).  That
>> prevents the driver module from being unloaded until
>> __cpufreq_remove_dev() is called for all the CPUs that
>> cpufreq_add_policy_cpu() was called for previously.
>>
>> To remove that inconsistency make cpufreq_add_policy_cpu() execute
>> cpufreq_cpu_put() for the given policy before returning, which
>> decrements the driver module refcount so that it will be 0 after
>> __cpufreq_add_dev() returns,
>
> Removing the inconsistency is a good thing, but I think we should
> make it consistent the other way around - make a CPU-online increment
> the driver module refcount and decrement it only on CPU-offline.

I took time to review to this mail as I was looking at the problem
yesterday. I am sorry to say, but I have completely different views as
compared to You and Rafael both :)

First of all, Rafael's patch is incomplete as it hasn't fixed the issue
completely. When we have multiple CPUs per policy and
cpufreq_add_dev() is called for the first one, it call cpufreq_get_cpu()
for all cpus of this policy(), so count is == x (no. of CPUs in this policy)
+ 1 (This is the initial value of .owner).

And so we still have module count getting incremented for other cpus :)

Now few lines about My point of view to this whole thing. I believe we
should get rid of .owner field from struct cpufreq_driver completely. It
doesn't make sense to me in doing all this management at all. Surprised?
Shocked? Laughing at me? :)

Well I may be wrong but this is what I think:
- It looks stupid to me that I can't do this from userspace in one go:
  $ insmod cpufreq_driver.ko
  $ rmmod cpufreq_driver.ko

What the hell changed in between that isn't visible to user? It looked
completely stupid in that way..

Something like this sure makes sense:
$ insmod ondemand-governor.ko
$ change governor to ondemand for few CPUs
$ rmmod ondemand-governor.ko

as we have deliberately add few users of governor. And so without second
step, rmmod should really work smoothly. And it does.

Now, why shouldn't there be a problem with this approach? I will write
that inline to the problems you just described.

> The reason is, one should not be able to unload the back-end cpufreq
> driver module when some CPUs are still being managed. Nasty things
> will result if we allow that. For example, if we unload the module,
> and then try to do a CPU offline, then the cpufreq hotplug notifier
> won't even be called (because cpufreq_unregister_driver also
> unregisters the hotplug notifier). And that might be troublesome.

So what? Its simply equivalent to we have booted our system, haven't
inserted cpufreq module and taken out a cpu.

> Even worse, if we unload a cpufreq driver module and load a new
> one and *then* try to offline the CPU, then the cpufreq_driver->exit()
> function that we call during CPU offline will end up calling the
> corresponding function of an entirely different driver! So the
> ->init() and ->exit() calls won't match.

That's not true. When we unload the module, it must call
cpufreq_unregister_driver() which should call cpufreq_remove_cpu()
for all cpus and so exit() is already called for last module.

If we get something new now, it should simply work.

What do you think gentlemen?

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe cpufreq" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html