Re: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()


 



On 10/05/2012 08:54 AM, Yasuaki Ishimatsu wrote:
> 2012/10/04 15:16, Srivatsa S. Bhat wrote:
>> On 10/04/2012 02:43 AM, Andrew Morton wrote:
>>> On Wed, 03 Oct 2012 18:23:09 +0530
>>> "Srivatsa S. Bhat" <srivatsa.bhat@xxxxxxxxxxxxxxxxxx> wrote:
>>>
>>>> The synchronization between CPU hotplug readers and writers is achieved
>>>> by means of refcounting, safeguarded by the cpu_hotplug.lock.
>>>>
>>>> get_online_cpus() increments the refcount, whereas put_online_cpus()
>>>> decrements it. If we ever hit an imbalance between the two, we end up
>>>> compromising the guarantees of the hotplug synchronization. For example,
>>>> an extra call to put_online_cpus() can end up allowing a hotplug reader
>>>> to execute concurrently with a hotplug writer. So, add a BUG_ON() in
>>>> put_online_cpus() to detect such cases where the refcount can go
>>>> negative.
>>>>
>>>> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx>
>>>> ---
>>>>
>>>>   kernel/cpu.c |    1 +
>>>>   1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>>>> index f560598..00d29bc 100644
>>>> --- a/kernel/cpu.c
>>>> +++ b/kernel/cpu.c
>>>> @@ -80,6 +80,7 @@ void put_online_cpus(void)
>>>>       if (cpu_hotplug.active_writer == current)
>>>>           return;
>>>>       mutex_lock(&cpu_hotplug.lock);
>>>> +    BUG_ON(cpu_hotplug.refcount == 0);
>>>>       if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
>>>>           wake_up_process(cpu_hotplug.active_writer);
>>>>       mutex_unlock(&cpu_hotplug.lock);
>>>
>>> I think calling BUG() here is a bit harsh.  We should only do that if
>>> there's a risk to proceeding: a risk of data loss, a reduced ability to
>>> analyse the underlying bug, etc.
>>>
>>> But a cpu-hotplug locking imbalance is a really really really minor
>>> problem!  So how about we emit a warning then try to fix things up?
>>
>> That would be better indeed, thanks!
>>
>>> This should increase the chance that the machine will keep running and
>>> so will increase the chance that a user will be able to report the bug
>>> to us.
>>>
>>
>> Yep, sounds good.
>>
>>>
>>> --- a/kernel/cpu.c~cpu-hotplug-debug-detect-imbalance-between-get_online_cpus-and-put_online_cpus-fix
>>> +++ a/kernel/cpu.c
>>> @@ -80,9 +80,12 @@ void put_online_cpus(void)
>>>       if (cpu_hotplug.active_writer == current)
>>>           return;
>>>       mutex_lock(&cpu_hotplug.lock);
>>> -    BUG_ON(cpu_hotplug.refcount == 0);
>>> -    if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
>>> -        wake_up_process(cpu_hotplug.active_writer);
>>> +    if (!--cpu_hotplug.refcount) {
>>
>> This won't catch it. We enter this 'if' block only when
>> cpu_hotplug.refcount has been decremented to exactly zero, so we miss
>> the case where it goes negative (which is what we intended to detect).
>>
>>> +        if (WARN_ON(cpu_hotplug.refcount == -1))
>>> +            cpu_hotplug.refcount++;    /* try to fix things up */
>>> +        if (unlikely(cpu_hotplug.active_writer))
>>> +            wake_up_process(cpu_hotplug.active_writer);
>>> +    }
>>>       mutex_unlock(&cpu_hotplug.lock);
>>>
>>>   }
>>
>> So how about something like below:
>>
>> ------------------------------------------------------>
>>
>> From: Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx>
>> Subject: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()
>>
>> The synchronization between CPU hotplug readers and writers is achieved
>> by means of refcounting, safeguarded by the cpu_hotplug.lock.
>>
>> get_online_cpus() increments the refcount, whereas put_online_cpus()
>> decrements it. If we ever hit an imbalance between the two, we end up
>> compromising the guarantees of the hotplug synchronization. For example,
>> an extra call to put_online_cpus() can end up allowing a hotplug reader
>> to execute concurrently with a hotplug writer. So, add a WARN_ON() in
>> put_online_cpus() to detect such cases where the refcount can go
>> negative, and also attempt to fix it up, so that we can continue to run.
>>
>> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx>
>> ---
> 
> Looks good to me.
> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@xxxxxxxxxxxxxx>
> 

Thanks for your review, Yasuaki!

Regards,
Srivatsa S. Bhat

>>
>>   kernel/cpu.c |    4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>> index f560598..42bd331 100644
>> --- a/kernel/cpu.c
>> +++ b/kernel/cpu.c
>> @@ -80,6 +80,10 @@ void put_online_cpus(void)
>>       if (cpu_hotplug.active_writer == current)
>>           return;
>>       mutex_lock(&cpu_hotplug.lock);
>> +
>> +    if (WARN_ON(!cpu_hotplug.refcount))
>> +        cpu_hotplug.refcount++; /* try to fix things up */
>> +
>>       if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
>>           wake_up_process(cpu_hotplug.active_writer);
>>       mutex_unlock(&cpu_hotplug.lock);
>>
>>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

