On Wed, 2023-04-19 at 15:36 +0200, Tomas Henzl wrote: > On 4/19/23 04:34, James Bottomley wrote: > > On Tue, 2023-04-18 at 11:37 -0700, Bart Van Assche wrote: > > > On 4/18/23 07:36, James Bottomley wrote: > > > > On Mon, 2023-04-17 at 16:06 -0700, Bart Van Assche wrote: > > > > > System shutdown happens as follows (see e.g. the systemd > > > > > source > > > > > file src/shutdown/shutdown.c): > > > > > * sync() is called. > > > > > * reboot(RB_AUTOBOOT/RB_HALT_SYSTEM/RB_POWER_OFF) is called. > > > > > * If the reboot() system call returns, log an error message. > > > > > > > > > > The reboot() system call causes the kernel to call > > > > > kernel_restart(), kernel_halt() or kernel_power_off(). Each > > > > > of these functions calls device_shutdown(). device_shutdown() > > > > > calls sd_shutdown(). After sd_shutdown() has been called the > > > > > .shutdown() callback of the LLD will be called. Hence, I/O > > > > > submitted after sd_shutdown() will hang or may even cause a > > > > > kernel crash. > > > > > > > > > > Let sd_shutdown() fail future I/O such that LLD .shutdown() > > > > > callbacks can be simplified. > > > > > > > > What is the actual reason for this? What is it you think might > > > > be submitting I/O after the system gets into this state? > > > > Current sd_shutdown is constructed on the premise that it's the > > > > last thing that ever happens to the device before reboot/power > > > > off which is why it flushes the cache if necessary and stops > > > > the device if required, but for most standard devices neither > > > > is required because we don't expect Linux to go down with > > > > pending items in the block queue and for a write through disk > > > > cache anything that's completed on the block queue is safely > > > > durable on the device. > > > > > > Hi James, > > > > > > .shutdown() callbacks should quiesce I/O but the sd_shutdown() > > > function doesn't do this. I see this as a bug. > > > > Why? They're only called on reboot or shutdown. In orderly cases, > > the queues should have been stopped long ago, so there should be no > > I/O to quiesce, and in the disorderly case, you obviously didn't > > care about the data anyway, so the job of the shutdown routine is > > to salvage as much as it can, which is why we flush the cache and > > stop the disk if necessary. > > A kernel panic has been reported caused by I/O arriving after > driver's shutdown (the driver is fixed now so just scsi exception > handling is logged). All drivers should have a protection against > late commands queued, with a block after sd.shutdown that could be > dropped and so well written drivers could enjoy a bit easier/faster > code and and those not well written wouldn't be waiting for issues > yet to come. > > What actually is the downside of blocking further I/O in sd shutdown? Well, it's a layering problem: if all queues should be stopped on shutdown, then this should be done from block (for all queues including those outside SCSI). device_shutdown() goes in reverse devices_kset->list order, so it looks like it would do the PCI device then the SCSI device then the ULD then block, so we can use the queues in SCSI for emergency actions (like flush or stop) before block goes down. Although this isn't guaranteed; there are things, like device_link_add, which reorder this kset, so we'd need to make sure the above assumption is correct. James