Re: What's the best way to call sd_shutdown() on all SCSI disks on shutdown?

James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> · Mon, 30 Jul 2018 13:35:36 -0700

On Mon, 2018-07-30 at 15:17 -0400, Theodore Y. Ts'o wrote:
> I've been looking at what's the best way to make sure everything gets
> cleanly flushed out to disk on a powerdown.  Right now in
> __orderly_poweroff(), we call emergency_sync() which kicks a
> workqueue to flush all file systems and block devices --- and then we
> immediately power down the system, before the scheduler even has a
> chance to schedule the workqueue thread.  Hopefully userspace has
> unmounted all file systems, which will have implicitly issued a cache
> flush command, but if we have a userspace program writing to a block
> device directly, currently there's nothing to make sure things will
> get flushed out to the device.
> 
> Beyond that, though, I'm interested in figuring out how to make sure
> that all SCSI devices will receive (and acknowledge) a SHUTDOWN command
> so that the disks can be spun down and heads retracted to a safe
> landing zone before we power down the system.

The basic way to do this is to shut down the scsi bus, see below.

> It appears the best way to do this is to call sd_shutdown(), since we
> don't seem to have a high-level "shutdown" concept recognized in the
> block layer (the way we currently have, say, support for "discard").
> 
> So the question is, what's the best way to architect something like
> this.  I could implement a hacky iterator loop in the SCSI
> subsystem, and call it directly from __orderly_poweroff in
> kernel/reboot.c.  But I'm pretty sure that would never get accepted
> upstream, and so it would remain a Google data center hack.
> 
> What do people think would be the best way of implementing something
> that would be upstream acceptable?

The sd_shutdown function is fully plumbed in to the current sysfs model
with every scsi device being on a dummy scsi bus. So if you detach the
device from the scsi bus, the remove function (which calls sd_shutdown)
gets called as part of the detach.  At the moment, the way that happens
is either by specific detach of the device or via the module_exit
function of SCSI, so if you can get that called before the system shuts
down, everything should just work.  To be honest, I thought this already
happened today: the separate device_shutdown() call in
kernel_shutdown_prepare() should (eventually) call our sd_shutdown
method.  Can you investigate why that isn't working for you?  Is it
being called too late?
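
To make the expected path concrete, here is a rough userspace model of
the dispatch described above.  It is only a sketch, not kernel code:
every model_* name is made up for illustration, and it assumes that the
shutdown walk visits devices in reverse registration order and prefers
a bus-level shutdown hook over the driver-level one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_DEVICES 8

struct model_device;

/* Hypothetical stand-ins for a driver and a bus, each with a shutdown hook. */
struct model_driver {
    const char *name;
    void (*shutdown)(struct model_device *dev);
};

struct model_bus {
    const char *name;
    void (*shutdown)(struct model_device *dev);
};

struct model_device {
    const char *name;
    const struct model_bus *bus;
    const struct model_driver *driver;
    int shut_down;              /* set by the shutdown hook */
};

/* Devices in registration order; shutdown walks this list backwards. */
static struct model_device *registered[MAX_DEVICES];
static size_t nr_registered;

static void model_device_add(struct model_device *dev)
{
    assert(nr_registered < MAX_DEVICES);
    registered[nr_registered++] = dev;
}

/*
 * Walk all devices, newest first, and call the bus shutdown hook if
 * there is one, otherwise the driver shutdown hook.  Devices with
 * neither are skipped.  Returns the number of hooks that ran.
 */
static int model_device_shutdown(void)
{
    int called = 0;

    while (nr_registered > 0) {
        struct model_device *dev = registered[--nr_registered];

        if (dev->bus && dev->bus->shutdown) {
            dev->bus->shutdown(dev);
            called++;
        } else if (dev->driver && dev->driver->shutdown) {
            dev->driver->shutdown(dev);
            called++;
        }
    }
    return called;
}

/* Stand-in for sd_shutdown: this is where the cache flush and stop would go. */
static void model_sd_shutdown(struct model_device *dev)
{
    printf("%s: flush cache, stop unit\n", dev->name);
    dev->shut_down = 1;
}
```

In this model a disk bound to the "sd" driver gets its hook called, and
a device with no bound driver is silently skipped.  If the real hook
does run but the power is cut before the disk acknowledges, that would
point at ordering or timing rather than at missing plumbing.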

Alternatively, if you can find a way to get sysfs to trigger a shutdown
on all its busses at some point then we'll get swept up in that. 
Finally, you could keep a list of busses needing to be shut down for
storage safety and we could add scsi to that.
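
The last option could look something like the following.  Again, this
is a userspace sketch under stated assumptions, not a proposed kernel
interface: storage_safety_register(), storage_safety_shutdown() and
struct safe_bus are hypothetical names, and the SCSI callback simply
pretends it quiesced two disks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_SAFE_BUSES 4

/* Hypothetical entry for a bus that must be quiesced before power off. */
struct safe_bus {
    const char *name;
    int (*shutdown_all)(void);  /* returns number of devices quiesced */
};

static const struct safe_bus *safe_buses[MAX_SAFE_BUSES];
static size_t nr_safe_buses;

/* Add a bus to the list; returns 0 on success, -1 if the list is full. */
static int storage_safety_register(const struct safe_bus *bus)
{
    assert(bus && bus->shutdown_all);
    if (nr_safe_buses == MAX_SAFE_BUSES)
        return -1;
    safe_buses[nr_safe_buses++] = bus;
    return 0;
}

/*
 * Called once on the power-off path: shut down every registered bus
 * synchronously and return the total number of devices quiesced.
 */
static int storage_safety_shutdown(void)
{
    int total = 0;

    for (size_t i = 0; i < nr_safe_buses; i++) {
        printf("shutting down bus %s\n", safe_buses[i]->name);
        total += safe_buses[i]->shutdown_all();
    }
    return total;
}

/* Stand-in for SCSI's callback; pretends two disks were shut down. */
static int model_scsi_shutdown_all(void)
{
    return 2;
}
```

The point of the explicit list is that the power-off path waits for
each callback to return, instead of kicking a workqueue and hoping it
gets scheduled in time.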

James