Re: stable 3-10-3: strange output of "lsmod | grep ^acpi_cpufreq"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Monday, July 29, 2013 08:57:20 PM Srivatsa S. Bhat wrote:
> On 07/29/2013 05:34 PM, Rafael J. Wysocki wrote:
> > On Monday, July 29, 2013 05:14:25 PM Srivatsa S. Bhat wrote:
> >> On 07/29/2013 05:19 PM, Rafael J. Wysocki wrote:
> >>> On Monday, July 29, 2013 03:11:45 PM Srivatsa S. Bhat wrote:
> >>>> On 07/29/2013 04:50 AM, Rafael J. Wysocki wrote:
> >>>>> On Monday, July 29, 2013 12:43:59 AM Rafael J. Wysocki wrote:
> >>>>>> On Monday, July 29, 2013 12:11:18 AM Rafael J. Wysocki wrote:
> >>>>>>> On Sunday, July 28, 2013 12:21:22 PM Toralf Förster wrote:
> >>>>>>>> On 07/28/2013 01:39 AM, Rafael J. Wysocki wrote:
> >>>>>>>>> On Saturday, July 27, 2013 07:40:34 PM Toralf Förster wrote:
> >>>>>>>>>> it gives at a ThinkPad T420:
> >>>>>>>>>>
> >>>>>>>>>> tfoerste@n22 ~/tmp $ lsmod | grep ^acpi_cpufreq
> >>>>>>>>>> acpi_cpufreq           12902  2147483647
> >>>>>>>>>
> >>>>>>>>> That is -1, which indicates some module refcount woes.
> >>>>>>>>
> >>>>>>>> yes, ofc.
> >>>>>>>>
> >>>>>>>> The issue apears after 1 s2ram/resume cycle, before s2ram the refcount is 1.
> >>>>>>>>
> >>>>>>>>> I definitely can't see that with the mainline on my machines.
> >>>>>>>>
> >>>>>>>> It is in mainline too.
> >>>>>>>
> >>>>>>> Does the appended patch help?
> >>>>>>
> >>>>>> Actually, something as simple as this also should help:
> >>>>>>
> >>>>>> ---
> >>>>>> From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> >>>>>> Subject: cpufreq: Fix cpufreq driver module refcount balance after suspend/resume
> >>>>>>
> >>>>>> Since cpufreq_cpu_put() called by __cpufreq_remove_dev() drops the
> >>>>>> driver module refcount, __cpufreq_remove_dev() causes that refcount
> >>>>>> to become negative after a suspend/resume cycle, for example.
> >>>>>>
> >>>>>> To prevent this from happening make __cpufreq_remove_dev() put
> >>>>>> the policy kobject only instead of calling cpufreq_cpu_put().
> >>>>>
> >>>>> Having a deeper look at it, though, I see that in fact the whole
> >>>>> cpufreq_cpu_put() is needed if the CPU is not the last one for the given
> >>>>> policy and is not needed at all otherwise (as described in the changelog
> >>>>> of the patch below).
> >>>>>
> >>>>> Srivatsa, does this make sense to you?
> >>>>>
> >>>>
> >>>> Code-wise, your patch looks good to me. But one thing in the existing code
> >>>> struck me as a little strange.
> >>>>
> >>>> I'm assuming that the module_get() is used in the cpufreq core to ensure that
> >>>> until the cpufreq core is managing (atleast one) CPU(s), the cpufreq backend
> >>>> driver module (eg: acpi-cpufreq) can't be removed.
> >>>
> >>> Quite frankly, I'm not sure about that.  If that were the case,
> >>> cpufreq_add_dev() would not call module_put() which it does.
> >>>
> >>> That may be a bug, I agree, but that's not for the present release cycle.  For
> >>> now, we need to ensure that the reference counts are *balanced* .
> >>>
> >>
> >> Sure, in that case, I agree that your patch is the right fix at this point,
> >> since it resolves the immediate problem that we have with the refcounts.
> >>
> >> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx>
> > 
> > Great, thanks!
> > 
> > Does your patchset avoiding the creation/removal of sysfs directories over
> > suspend/resume need to be modified to take this patch into account?
> > 
> 
> There will be some trivial conflicts, but looking a bit closer at my patchset,
> I believe it has a subtle refcounting bug, relating to patches 6 and 7.
> 
> Patch 6: http://marc.info/?l=linux-kernel&m=137358141628042&w=2
> Patch 7: http://marc.info/?l=linux-pm&m=137358124527993&w=2
> 
> 
> Patch 6 modifies cpufreq_add_policy_cpu() to return 0 if frozen == true.
> But by that time, it would have already done a cpufreq_cpu_get().
> 
> In patch 7, under __cpufreq_remove_dev(), we avoid dropping refcounts
> for *any* CPU when frozen == true.
> 
> In the general case, this is fine (like myself and Viresh had discussed
> earlier in that thread), because __cpufreq_add_dev() doesn't increment the
> refcount in the frozen case.
> 
> But we overlooked a rather subtle scenario: say, CPU 'X' went through
> cpufreq_add_dev in the resume path, and restored its policy structure.
> Since this was fresh, no refcount was taken. And then CPU 'Y' came along,
> but it was related to CPU 'X', so it will take this path:
> 
> 1018         /* Check if this cpu was hot-unplugged earlier and has siblings */
> 1019         read_lock_irqsave(&cpufreq_driver_lock, flags);
> 1020         for_each_online_cpu(sibling) {
> 1021                 struct cpufreq_policy *cp = per_cpu(cpufreq_cpu_data, sibling);
> 1022                 if (cp && cpumask_test_cpu(cpu, cp->related_cpus)) {
> 1023                         read_unlock_irqrestore(&cpufreq_driver_lock, flags);
> 1024                         return cpufreq_add_policy_cpu(cpu, sibling, dev,
> 1025                                                       frozen);
> 1026                 }
> 1027         }
> 
> This time, cp won't be NULL for CPU 'X', because its policy was restored
> in the previous call. So we end up calling cpufreq_add_policy_cpu() for
> CPU 'Y'. And that function takes a new refcount, as mentioned above.
> 
> So we need to have this extra hunk in cpufreq_add_policy_cpu() to ensure
> we don't take any additional refcounts during suspend/resume:
> 
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index c7f59e8..9681c01 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -932,8 +932,11 @@ static int cpufreq_add_policy_cpu(unsigned int cpu, unsigned int sibling,
>  	}
>  
>  	/* Don't touch sysfs links during light-weight init */
> -	if (frozen)
> +	if (frozen) {
> +		/* Drop the extra refcount that we took above */
> +		cpufreq_cpu_put(policy);
>  		return 0;
> +	}
>  
>  	ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq");
>  	if (ret)
> 
> 
> I can respin my entire patchset with this fix included, on top of your
> patch. Please let me know if you'd prefer some other method for resolving
> this.

Please respin the patchset with the fix folded in.

My patch is in the bleeding-edge branch of linux-pm.git, so please rebase the
new version on that if that's not a problem.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe cpufreq" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Devel]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Forum]     [Linux SCSI]

  Powered by Linux