Hi, The following introduction is still valid for patches [1-3/4] and patch [4/4] reworks the ACPI processor driver to use the new code. Some details below. On Monday, April 29, 2013 02:23:59 PM Rafael J. Wysocki wrote: > > It has been argued for a number of times that in some cases, if a device cannot > be gracefully removed from the system, it shouldn't be removed from it at all, > because that may lead to a kernel crash. In particular, that will happen if a > memory module holding kernel memory is removed, but also removing the last CPU > in the system may not be a good idea. [And I can imagine a few other cases > like that.] > > The kernel currently only supports "forced" hot-remove which cannot be stopped > once started, so users have no choice but to try to hot-remove stuff and see > whether or not that crashes the kernel which is kind of unpleasant. That seems > to be based on the "the user knows better" argument according to which users > triggering device hot-removal should really know what they are doing, so the > kernel doesn't have to worry about that. However, for instance, this pretty > much isn't the case for memory modules, because the users have no way to see > whether or not any kernel memory has been allocated from a given module. > > There have been a few attempts to address this issue, but none of them has > gained broader acceptance. The following 3 patches are the heart of a new > proposal which is based on the idea to introduce device_offline() and > device_online() operations along the lines of the existing CPU offline/online > mechanism (or, rather, to extend the CPU offline/online so that analogous > operations are available for other devices). The way it is supposed to work is > that device_offline() will fail if the given device cannot be gracefully > removed from the system (in the kernel's view). Once it succeeds, though, the > device won't be used any more until either it is removed, or device_online() is > run for it. That will allow the ACPI device hot-remove code, for one example, > to avoid triggering a non-reversible removal procedure for devices that cannot > be removed gracefully. > > Patch [1/3] introduces device_offline() and device_online() as outlined above. > The .offline() and .online() callbacks are only added at the bus type level for > now, because that should be sufficient to cover the memory and CPU use cases. That's [1/4] now and the changes from the previous version are: - strtobool() is used in store_online(). - device_offline_lock has been renamed to device_hotplug_lock (and the functions operating it accordingly) following the Toshi's advice. > Patch [2/3] modifies the CPU hotplug support code to use device_offline() and > device_online() to support the sysfs 'online' attribute for CPUs. That is [2/4] now and it takes cpu_hotplug_driver_lock() around cpu_up() and cpu_down(). > Patch [3/3] changes the ACPI device hot-remove code to use device_offline() > for checking if graceful removal of devices is possible. The way it does that > is to walk the list of "physical" companion devices for each struct acpi_device > involved in the operation and call device_offline() for each of them. If any > of the device_offline() calls fails (and the hot-removal is not "forced", which > is an option), the removal procedure (which is not reversible) is simply not > carried out. That's current [3/4]. It's a bit simpler, because I decided that it would be better to have a global 'force_remove' attribute (the semantics of the per-profile 'force_remove' wasn't clear and it didn't really add any value over a global one). I also added lock/unlock_device_hotplug() around acpi_bus_scan() in acpi_scan_bus_device_check() to allow scan handlers to update dev->offline for "physical" companion devices safely (the processor's one added by the next patch actually does that). > Of some concern is that device_offline() (and possibly device_online()) is > called under physical_node_lock of the corresponding struct acpi_device, which > introduces ordering dependency between that lock and device locks for the > "physical" devices, but I didn't see any cleaner way to do that (I guess it > is avoidable at the expense of added complexity, but for now it's just better > to make the code as clean as possible IMO). Patch [4/4] reworks the ACPI processor driver to use the common hotplug code. It basically splits the driver into two parts as described in the changelog, where the first part is essentially a scan handler and the second part is a driver, but it doesn't bind to struct acpi_device any more. Instead, it binds to processor devices under /sys/devices/system/cpu/ (the driver itself has a sysfs directory under /sys/bus/cpu/drivers/ which IMHO makes more sense than having it under /sys/bus/acpi/drivers/). The patch at https://patchwork.kernel.org/patch/2506371/ is a prerequisite for this series, but I'm going to push it for v3.10-rc2 if no one screams bloody murder. Thanks, Rafael -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html