On 4/18/23 07:36, James Bottomley wrote:
> On Mon, 2023-04-17 at 16:06 -0700, Bart Van Assche wrote:
> > System shutdown happens as follows (see e.g. the systemd source file
> > src/shutdown/shutdown.c):
> > * sync() is called.
> > * reboot(RB_AUTOBOOT/RB_HALT_SYSTEM/RB_POWER_OFF) is called.
> > * If the reboot() system call returns, log an error message.
> >
> > The reboot() system call causes the kernel to call kernel_restart(),
> > kernel_halt() or kernel_power_off(). Each of these functions calls
> > device_shutdown(). device_shutdown() calls sd_shutdown(). After
> > sd_shutdown() has been called the .shutdown() callback of the LLD
> > will be called. Hence, I/O submitted after sd_shutdown() will hang or
> > may even cause a kernel crash.
> >
> > Let sd_shutdown() fail future I/O such that LLD .shutdown() callbacks
> > can be simplified.
>
> What is the actual reason for this? What is it you think might be
> submitting I/O after the system gets into this state? Current
> sd_shutdown is constructed on the premise that it's the last thing that
> ever happens to the device before reboot/power off which is why it
> flushes the cache if necessary and stops the device if required, but
> for most standard devices neither is required because we don't expect
> Linux to go down with pending items in the block queue and for a write
> through disk cache anything that's completed on the block queue is
> safely durable on the device.

Hi James,

.shutdown() callbacks should quiesce I/O but the sd_shutdown() function
doesn't do this. I see this as a bug.

Regarding your question, I think that sd_check_events() can be called
while sd_shutdown() is in progress or after sd_shutdown() has finished.
sd_check_events() may submit a TEST UNIT READY command.

In pci_device_shutdown() one can see that the PCI core clears the bus
master bit for PCI devices during shutdown. In other words, it is not
safe to submit I/O or to process completions during invocation of
shutdown callbacks. I think that also shows that this patch fixes a bug.

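To make the intended behavior concrete, here is a minimal userspace
sketch of "fail future I/O once shutdown has run". It is only a model
under stated assumptions, not the patch and not kernel code: struct
model_disk, model_shutdown() and model_submit() are hypothetical names,
and returning -ENODEV is an assumption chosen for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model of a disk; NOT the kernel's struct scsi_disk. */
struct model_disk {
	bool shut_down;      /* set once the modelled shutdown callback has run */
	unsigned int issued; /* commands that actually reached the "device" */
};

/* Model of a shutdown callback that also makes all later submissions fail. */
static void model_shutdown(struct model_disk *d)
{
	/* A real driver would flush the cache and stop the device here. */
	d->shut_down = true;
}

/*
 * Model of submitting one command, e.g. a TEST UNIT READY issued by
 * event polling. The error code is an assumption made for this sketch.
 */
static int model_submit(struct model_disk *d)
{
	if (d->shut_down)
		return -ENODEV; /* fail fast instead of touching the device */
	d->issued++;
	return 0;
}
```

In this model a submission made before model_shutdown() succeeds and is
counted, while one made afterwards returns an error without reaching
the device, which is the property the LLD .shutdown() callbacks could
then rely on.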
Bart.