On Tue, 2012-07-10 at 13:27 +0530, Srivatsa S. Bhat wrote: > On 07/10/2012 10:27 AM, Yasuaki Ishimatsu wrote: > > Hi Toshi, > > > > 2012/07/10 6:15, Toshi Kani wrote: > >> On Mon, 2012-07-09 at 16:55 +0530, Srivatsa S. Bhat wrote: > >>> On 07/09/2012 08:01 AM, Yasuaki Ishimatsu wrote: > >>>> Hi Srivatsa, > >>>> > >>>> Thank you for your reviewing. > >>>> > >>>> 2012/07/06 18:51, Srivatsa S. Bhat wrote: > >>>>> On 07/06/2012 08:46 AM, Yasuaki Ishimatsu wrote: > >>>>>> Even if cpu_down() fails, acpi_processor_remove() continues to > >>>>>> remove the cpu. > >>>>> > >>>>> Ouch! > >>>>> > >>>>>> But in this case, it should return error number since some process > >>>>>> may run on > >>>>>> the cpu. If the cpu has a running process and the cpu is turned > >>>>>> the power off, > >>>>>> the system cannot work well. > >>>>>> > >>>>>> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@xxxxxxxxxxxxxx> > >>>>>> > >>>>>> --- > >>>>>> drivers/acpi/processor_driver.c | 18 ++++++++++++------ > >>>>>> 1 file changed, 12 insertions(+), 6 deletions(-) > >>>>>> > >>>>>> Index: linux-3.5-rc4/drivers/acpi/processor_driver.c > >>>>>> =================================================================== > >>>>>> --- linux-3.5-rc4.orig/drivers/acpi/processor_driver.c > >>>>>> 2012-06-25 04:53:04.000000000 +0900 > >>>>>> +++ linux-3.5-rc4/drivers/acpi/processor_driver.c 2012-07-05 > >>>>>> 21:02:58.711285382 +0900 > >>>>>> @@ -610,7 +610,7 @@ err_free_pr: > >>>>>> static int acpi_processor_remove(struct acpi_device *device, > >>>>>> int type) > >>>>>> { > >>>>>> struct acpi_processor *pr = NULL; > >>>>>> - > >>>>>> + int ret; > >>>>>> > >>>>>> if (!device || !acpi_driver_data(device)) > >>>>>> return -EINVAL; > >>>>>> @@ -621,8 +621,9 @@ static int acpi_processor_remove(struct > >>>>>> goto free; > >>>>>> > >>>>>> if (type == ACPI_BUS_REMOVAL_EJECT) { > >>>>>> - if (acpi_processor_handle_eject(pr)) > >>>>>> - return -EINVAL; > >>>>>> + ret = acpi_processor_handle_eject(pr); > >>>>>> + if (ret) > >>>>>> + return ret; > >>>>>> } > >>>>>> > >>>>>> acpi_processor_power_exit(pr, device); > >>>>>> @@ -841,12 +842,17 @@ static acpi_status acpi_processor_hotadd > >>>>>> > >>>>>> static int acpi_processor_handle_eject(struct acpi_processor *pr) > >>>>>> { > >>>>>> - if (cpu_online(pr->id)) > >>>>>> - cpu_down(pr->id); > >>>>>> + int ret; > >>>>>> + > >>>>>> + if (cpu_online(pr->id)) { > >>>>>> + ret = cpu_down(pr->id); > >>>>>> + if (ret) > >>>>>> + return ret; > >>>>>> + } > >>>>>> > >>>>> > >>>>> Strictly speaking, this is not thorough enough. What prevents someone > >>>>> from onlining that same cpu again, at this point? > >>>>> So, IMHO, you need to wrap the contents of this function inside a > >>>>> get_online_cpus()/put_online_cpus() block, to prevent anyone else > >>>>> from messing with CPU hotplug at the same time. > >>>> > >>>> If I understand your comment by mistake, please let me know. > >>>> If the contents is wrapped a inside > >>>> get_online_cpus()/put_online_cpus() block > >>>> as below, cpu_down() will stop since cpu_down() calls > >>>> cpu_hotplug_begin() and > >>>> cpu_hotplug_begin() waits for cpu_hotplug.refcount to become 0. > >>>> > >>> > >>> You are right. Sorry, I overlooked that. > >>> > >>>> + get_online_cpus() > >>>> + if (cpu_online(pr->id)) { > >>>> + ret = cpu_down(pr->id); > >>>> + if (ret) > >>>> + return ret; > >>>> + } > >>>> + put_online_cpus() > >>>> > >>>> I think following patch can prevent it correctly. How about the patch? > >>>> > >>>> --- > >>>> drivers/acpi/processor_driver.c | 12 ++++++++++++ > >>>> kernel/cpu.c | 8 +++++--- > >>>> 2 files changed, 17 insertions(+), 3 deletions(-) > >>>> > >>>> Index: linux-3.5-rc4/drivers/acpi/processor_driver.c > >>>> =================================================================== > >>>> --- linux-3.5-rc4.orig/drivers/acpi/processor_driver.c 2012-07-09 > >>>> 09:59:01.280211202 +0900 > >>>> +++ linux-3.5-rc4/drivers/acpi/processor_driver.c 2012-07-09 > >>>> 11:05:34.559859236 +0900 > >>>> @@ -844,14 +844,26 @@ static int acpi_processor_handle_eject(s > >>>> { > >>>> int ret; > >>>> > >>>> +retry: > >>>> if (cpu_online(pr->id)) { > >>>> ret = cpu_down(pr->id); > >>>> if (ret) > >>>> return ret; > >>>> } > >>>> > >>>> + get_online_cpus(); > >>>> + /* > >>>> + * Someone might online the cpu again at this point. So we > >>>> check that > >>>> + * cpu has been onlined or not. If cpu is online, we try to > >>>> offline > >>>> + * the cpu again. > >>>> + */ > >>>> + if (cpu_online(pr->id)) { > >>> > >>> How about this: > >>> if (unlikely(cpu_online(pr->id)) { > >>> since the probability of this happening is quite small... > >>> > >>>> + put_online_cpus(); > >>>> + goto retry; > >>>> + } > >>>> arch_unregister_cpu(pr->id); > >>>> acpi_unmap_lsapic(pr->id); > >>>> + put_online_cpus(); > >>>> return ret; > >>>> } > >>> > >>> This retry logic doesn't look elegant, but I don't see any better > >>> method :-( > >> > >> Another possible option is to fail the request instead of retrying it. > > > > Good idea!! I'll update it. Sounds good. Thanks Yasuaki for updating it. > >> It would be quite challenging to allow on-lining and off-lining > >> operations to run concurrently. In fact, even if we close this window, > >> there is yet another window right after the new put_online_cpus() call. > > > > I think if we close the window, another window does not open. > > acpi_unmap_lsapic() sets cpu_present mask to false before new > > put_online_cpus() > > is called. So even if _cpu_up() is called, the function returns -EINAVL by > > following added code. > > > > @@ -343,11 +343,13 @@ static int __cpuinit _cpu_up(unsigned in > > unsigned long mod = tasks_frozen ? CPU_TASKS_FROZEN : 0; > > struct task_struct *idle; > > > > - if (cpu_online(cpu) || !cpu_present(cpu)) > > - return -EINVAL; > > - > > cpu_hotplug_begin(); > > > > + if (cpu_online(cpu) || !cpu_present(cpu)) { > > + ret = -EINVAL; > > + goto out; > > + } > > + > > > > Right. Yasuaki's patch will ensure that there are no more race conditions > because it does the cpu_present() check after taking the cpu_hotplug.lock. > So I think it is safe and still doable from the kernel's perspective. > > But the question is, "should we do it?" I think Toshi's suggestion of failing > the hot-remove request (if we find that the cpu has been onlined again by some > other task) sounds like a good idea for another reason: cpu hotplug is not > initiated by the OS by itself; its requested by the user; so if something is > onlining the cpu back again, the user better take a second look and decide > what exactly he wants to do with that cpu - whether keep it online or > hot-remove it out. > > Trying to online as well as hot-remove the same cpu simultaneously sounds like > a crazy thing to do, and returning -EBUSY or -EAGAIN in the hot-remove case > (ie., failing that request) would give a warning to the user and a chance to > reflect upon what exactly he wants to do with the cpu. > > So, IMHO, we should protect against the race condition (between cpu_up and > hot-remove) but choose to fail the hot-remove request, and add a comment saying > why we chose to fail the request, even though we could have gone ahead and > completed it. Agreed. Thanks for the nice summary, Srivatsa! Thanks, -Toshi > Regards, > Srivatsa S. Bhat > -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html