Re: [PATCH 1/2] acpi : cpu hot-remove returns error number when cpu_down() fails

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Srivatsa,

2012/07/09 20:25, Srivatsa S. Bhat wrote:
> On 07/09/2012 08:01 AM, Yasuaki Ishimatsu wrote:
>> Hi Srivatsa,
>>
>> Thank you for your reviewing.
>>
>> 2012/07/06 18:51, Srivatsa S. Bhat wrote:
>>> On 07/06/2012 08:46 AM, Yasuaki Ishimatsu wrote:
>>>> Even if cpu_down() fails, acpi_processor_remove() continues to remove the cpu.
>>>
>>> Ouch!
>>>
>>>> But in this case, it should return error number since some process may run on
>>>> the cpu. If the cpu has a running process and the cpu is turned the power off,
>>>> the system cannot work well.
>>>>
>>>> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@xxxxxxxxxxxxxx>
>>>>
>>>> ---
>>>>    drivers/acpi/processor_driver.c |   18 ++++++++++++------
>>>>    1 file changed, 12 insertions(+), 6 deletions(-)
>>>>
>>>> Index: linux-3.5-rc4/drivers/acpi/processor_driver.c
>>>> ===================================================================
>>>> --- linux-3.5-rc4.orig/drivers/acpi/processor_driver.c	2012-06-25 04:53:04.000000000 +0900
>>>> +++ linux-3.5-rc4/drivers/acpi/processor_driver.c	2012-07-05 21:02:58.711285382 +0900
>>>> @@ -610,7 +610,7 @@ err_free_pr:
>>>>    static int acpi_processor_remove(struct acpi_device *device, int type)
>>>>    {
>>>>    	struct acpi_processor *pr = NULL;
>>>> -
>>>> +	int ret;
>>>>
>>>>    	if (!device || !acpi_driver_data(device))
>>>>    		return -EINVAL;
>>>> @@ -621,8 +621,9 @@ static int acpi_processor_remove(struct
>>>>    		goto free;
>>>>
>>>>    	if (type == ACPI_BUS_REMOVAL_EJECT) {
>>>> -		if (acpi_processor_handle_eject(pr))
>>>> -			return -EINVAL;
>>>> +		ret = acpi_processor_handle_eject(pr);
>>>> +		if (ret)
>>>> +			return ret;
>>>>    	}
>>>>
>>>>    	acpi_processor_power_exit(pr, device);
>>>> @@ -841,12 +842,17 @@ static acpi_status acpi_processor_hotadd
>>>>
>>>>    static int acpi_processor_handle_eject(struct acpi_processor *pr)
>>>>    {
>>>> -	if (cpu_online(pr->id))
>>>> -		cpu_down(pr->id);
>>>> +	int ret;
>>>> +
>>>> +	if (cpu_online(pr->id)) {
>>>> +		ret = cpu_down(pr->id);
>>>> +		if (ret)
>>>> +			return ret;
>>>> +	}
>>>>
>>>
>>> Strictly speaking, this is not thorough enough. What prevents someone
>>> from onlining that same cpu again, at this point?
>>> So, IMHO, you need to wrap the contents of this function inside a
>>> get_online_cpus()/put_online_cpus() block, to prevent anyone else
>>> from messing with CPU hotplug at the same time.
>>
>> If I understand your comment by mistake, please let me know.
>> If the contents is wrapped a inside get_online_cpus()/put_online_cpus() block
>> as below, cpu_down() will stop since cpu_down() calls cpu_hotplug_begin() and
>> cpu_hotplug_begin() waits for cpu_hotplug.refcount to become 0.
>>
> 
> You are right. Sorry, I overlooked that.
> 
>> +	get_online_cpus()
>> +	if (cpu_online(pr->id)) {
>> +		ret = cpu_down(pr->id);
>> +		if (ret)
>> +			return ret;
>> +	}
>> +	put_online_cpus()
>>
>> I think following patch can prevent it correctly. How about the patch?
>>
>> ---
>>   drivers/acpi/processor_driver.c |   12 ++++++++++++
>>   kernel/cpu.c                    |    8 +++++---
>>   2 files changed, 17 insertions(+), 3 deletions(-)
>>
>> Index: linux-3.5-rc4/drivers/acpi/processor_driver.c
>> ===================================================================
>> --- linux-3.5-rc4.orig/drivers/acpi/processor_driver.c	2012-07-09 09:59:01.280211202 +0900
>> +++ linux-3.5-rc4/drivers/acpi/processor_driver.c	2012-07-09 11:05:34.559859236 +0900
>> @@ -844,14 +844,26 @@ static int acpi_processor_handle_eject(s
>>   {
>>   	int ret;
>>
>> +retry:
>>   	if (cpu_online(pr->id)) {
>>   		ret = cpu_down(pr->id);
>>   		if (ret)
>>   			return ret;
>>   	}
>>
>> +	get_online_cpus();
>> +	/*
>> +	 * Someone might online the cpu again at this point. So we check that
>> +	 * cpu has been onlined or not. If cpu is online, we try to offline
>> +	 * the cpu again.
>> +	 */
>> +	if (cpu_online(pr->id)) {
> 
> How about this:
> 	if (unlikely(cpu_online(pr->id)) {
> since the probability of this happening is quite small...

Thanks. I'll update it.

>> +		put_online_cpus();
>> +		goto retry;
>> +	}
>>   	arch_unregister_cpu(pr->id);
>>   	acpi_unmap_lsapic(pr->id);
>> +	put_online_cpus();
>>   	return ret;
>>   }
> 
> This retry logic doesn't look elegant, but I don't see any better method :-(
> 
>>   #else
>> Index: linux-3.5-rc4/kernel/cpu.c
>> ===================================================================
>> --- linux-3.5-rc4.orig/kernel/cpu.c	2012-07-09 09:59:01.280211202 +0900
>> +++ linux-3.5-rc4/kernel/cpu.c	2012-07-09 09:59:02.903190965 +0900
>> @@ -343,11 +343,13 @@ static int __cpuinit _cpu_up(unsigned in
>>   	unsigned long mod = tasks_frozen ? CPU_TASKS_FROZEN : 0;
>>   	struct task_struct *idle;
>>
>> -	if (cpu_online(cpu) || !cpu_present(cpu))
>> -		return -EINVAL;
>> -
>>   	cpu_hotplug_begin();
>>
>> +	if (cpu_online(cpu) || !cpu_present(cpu)) {
>> +		ret = -EINVAL;
>> +		goto out;
>> +	}
>> +
> 
> Firstly, why is this change needed?

I cared the race of hot-remove cpu and _cpu_up(). If I do not change it,
there is the following race.

hot-remove cpu                         |  _cpu_up()
------------------------------------- ------------------------------------
call acpi_processor_handle_eject()     |
     call cpu_down()                   |
     call get_online_cpus()            |
                                       | call cpu_hotplug_begin() and stop here
     call arch_unregister_cpu()        |
     call acpi_unmap_lsapic()          |
     call put_online_cpus()            |
                                       | start and continue _cpu_up()
     return acpi_processor_remove()    |
continue hot-remove the cpu            |

So _cpu_up() can continue to itself. And hot-remove cpu can also continue
itself. If I change it, I think the race disappears as below:

hot-remove cpu                         | _cpu_up()
-----------------------------------------------------------------------
call acpi_processor_handle_eject()     |
     call cpu_down()                   |
     call get_online_cpus()            |
                                       | call cpu_hotplug_begin() and stop here
     call arch_unregister_cpu()        |
     call acpi_unmap_lsapic()          |
          cpu's cpu_present is set     |
	  to false by set_cpu_present()|
     call put_online_cpus()            |
                                       | start _cpu_up()
				       | check cpu_present() and return -EINVAL
     return acpi_processor_remove()    |
continue hot-remove the cpu            |

Thus I think the change is necessary.

Thanks,
Yasuaki Ishimatsu

> Secondly, if the change is indeed an improvement, then why is it
> in _this_ patch? IMHO, in that case it should be part of a separate patch.
> 
> Coming back to my first point, I don't see why this hunk is needed. We
> already take the cpu_add_remove_lock (cpu_maps_update_begin/end) before
> checking the status of the cpu (online or present). And all hotplug
> operations (cpu_up/cpu_down/disable|enable_nonboot_cpus) go through that
> lock. Isn't that enough? Or am I missing something?
> 
>>   	idle = idle_thread_get(cpu);
>>   	if (IS_ERR(idle)) {
>>   		ret = PTR_ERR(idle);
>>
>   
> Regards,
> Srivatsa S. Bhat
> 



--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux IBM ACPI]     [Linux Power Management]     [Linux Kernel]     [Linux Laptop]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux