On 3/28/2022 6:19 PM, Oliver O'Halloran wrote:
> On Tue, Mar 29, 2022 at 10:35 AM Tanjore Suresh <tansuresh@xxxxxxxxxx> wrote:
>> This changes the bus driver interface with additional entry points
>> to enable devices to implement asynchronous shutdown. The existing
>> synchronous shutdown interface is unmodified and retained for
>> backward compatibility.
>>
>> This changes the common device shutdown code to enable devices to
>> participate in an asynchronous shutdown implementation.
>
> nice to see someone looking at improving the shutdown path
>
>> Signed-off-by: Tanjore Suresh <tansuresh@xxxxxxxxxx>
>> ---
>>  drivers/base/core.c        | 39 +++++++++++++++++++++++++++++++++++++-
>>  include/linux/device/bus.h | 10 ++++++++++
>>  2 files changed, 48 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>> index 3d6430eb0c6a..359e7067e8b8 100644
>> --- a/drivers/base/core.c
>> +++ b/drivers/base/core.c
>> @@ -4479,6 +4479,7 @@ EXPORT_SYMBOL_GPL(device_change_owner);
> *snip*
>
> This all seems a bit dangerous and I'm wondering what systems you've
> tested these changes with. I had a look at implementing something
> similar a few years ago, and one case that always concerned me was
> embedded systems where the PCIe root complex also has a driver bound.
> Say you've got the following PCIe topology:
>
> 00:00.0 - root port
>  01:00.0 - nvme drive
>
> With the current implementation of device_shutdown() we can guarantee
> that the child device (the nvme) is shut down before we start trying
> to shut down the parent device (the root complex), so there's no
> possibility of deadlocks or other dependency headaches. With this
> implementation of async shutdown we lose that guarantee and I'm not
> sure what the consequences are. Personally I was never able to
> convince myself it was safe, but maybe you're braver than I am :)
>
> That all said, there's probably only a few kinds of device that will
> really want to implement async shutdown support, so maybe you can
> restrict it to leaf devices and flip the ordering around to something
> like:
>
> for_each_device(dev) {
>         if (can_async(dev) && has_no_children(dev))
>                 start_async_shutdown(dev)
> }
> wait_for_all_async_shutdowns_to_finish()
>
> // tear down the remaining system devices synchronously
> for_each_device(dev)
>         do_sync_shutdown(dev)

It seems like it might be helpful to split the async shutdowns into
refcounted hierarchies and proceed with the next level up when all the
refs are in.

Ex:

00:00.0 - RP
  01:00.0 - NVMe A
  02:00.0 - Bridge USP
    03:00.0 - Bridge DSP
      04:00.0 - NVMe B
    03:00.1 - Bridge DSP
      05:00.0 - NVMe C

NVMe A could start shutting down at the beginning of the hierarchy
traversal. Then async shutdown of bus 3 wouldn't start until all
children of bus 3 are shut down.

You could probably do this by having the async_shutdown_list in the
pci_bus.
>> diff --git a/include/linux/device/bus.h b/include/linux/device/bus.h
>> index a039ab809753..e261819601e9 100644
>> --- a/include/linux/device/bus.h
>> +++ b/include/linux/device/bus.h
>> @@ -93,6 +101,8 @@ struct bus_type {
>>         void (*sync_state)(struct device *dev);
>>         void (*remove)(struct device *dev);
>>         void (*shutdown)(struct device *dev);
>> +       void (*shutdown_pre)(struct device *dev);
>> +       void (*shutdown_post)(struct device *dev);
>
> Call them shutdown_async_start() / shutdown_async_end() or something,
> IMO. These names are not at all helpful and it's easy to mix up
> their role with the class-based shutdown_pre / _post.