On Mon, 2018-07-30 at 15:17 -0400, Theodore Y. Ts'o wrote:
> I've been looking at what's the best way to make sure everything gets
> cleanly flushed out to disk on a powerdown.  Right now in
> __orderly_poweroff(), we call emergency_sync() which kicks a
> workqueue to flush all file systems and block devices --- and then we
> immediately power down the system, before the scheduler even has a
> chance to schedule the workqueue thread.  Hopefully userspace has
> unmounted all file systems, which will have implicitly issued a cache
> flush command, but if we have a userspace program writing to a block
> device directly, currently there's nothing to make sure things will
> get flushed out to the device.
> 
> Beyond that, though, I'm interested in figuring out how to make sure
> that all SCSI devices will receive (and acknowledge) SHUTDOWN command
> so that the disks can be spun down and heads retracted to a safe
> landing zone before we power down the system.

The basic way to do this is to shut down the scsi bus, see below.

> It appears the best way to do this is to call sd_shutdown(), since we
> don't seem to have a high-level "shutdown" concept recognized in the
> block layer (the way we currently have, say, support for "discard").
> 
> So the question is, what's the best way to architect something like
> this.  I could implement a hacky iterator loop in the SCSI
> subsystem, and call it directly from __orderly_poweroff in
> kernel/reboot.c.  But I'm pretty sure that would never get accepted
> upstream, and so it would remain a Google data center hack.
> 
> What do people think would be the best way of implementing something
> that would be upstream acceptable?

The sd_shutdown function is fully plumbed in to the current sysfs model
with every scsi device being on a dummy scsi bus.  So if you detach the
device from the scsi bus, the remove function (which calls sd_shutdown)
gets called as part of the detach.
At the moment, the way that happens is either by specific detach of the
device or via the module_exit function of SCSI, so if you can get that
called before the system shuts down everything should just work.

To be honest, I really thought this did actually happen anyway today.
The separate device_shutdown() call in kernel_shutdown_prepare() should
reach our sd_shutdown method (eventually).  Can you investigate why that
isn't working for you ... is it being called too late?

Alternatively, if you can find a way to get sysfs to trigger a shutdown
on all its busses at some point then we'll get swept up in that.

Finally, you could keep a list of busses needing to be shut down for
storage safety and we could add scsi to that.

James
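
For illustration only, the last suggestion (a list of busses that must
be shut down for storage safety) might look roughly like the userspace
model below.  This is a sketch under stated assumptions, not kernel
code: struct safe_bus, register_safe_bus(), shutdown_safe_buses() and
model_scsi_shutdown_all() are all hypothetical names that do not exist
in the kernel, and the callback only prints what a per-device shutdown
such as sd_shutdown is expected to do.

```c
/* Userspace model only: none of these names exist in the kernel. */
#include <stddef.h>
#include <stdio.h>

#define MAX_SAFE_BUSES 8

/* Hypothetical stand-in for a bus that must be quiesced before power-off. */
struct safe_bus {
	const char *name;
	/* Shuts down every device on the bus; returns how many were handled. */
	int (*shutdown_all)(struct safe_bus *bus);
	int ndevices;	/* devices currently attached */
	int quiesced;	/* set once shutdown_all has run */
};

static struct safe_bus *safe_buses[MAX_SAFE_BUSES];
static size_t nr_safe_buses;

/* Add a bus to the list; returns 0 on success, -1 if the list is full. */
int register_safe_bus(struct safe_bus *bus)
{
	if (nr_safe_buses >= MAX_SAFE_BUSES)
		return -1;
	safe_buses[nr_safe_buses++] = bus;
	return 0;
}

/*
 * Walk the list in reverse registration order and shut each bus down
 * synchronously, so the caller only proceeds to power-off after every
 * callback has returned.  Returns the total number of devices handled.
 */
int shutdown_safe_buses(void)
{
	int total = 0;

	for (size_t i = nr_safe_buses; i-- > 0; ) {
		struct safe_bus *bus = safe_buses[i];

		if (bus->quiesced)
			continue;	/* already shut down, do not repeat */
		total += bus->shutdown_all(bus);
		bus->quiesced = 1;
	}
	return total;
}

/* Model of what scsi would plug in: one shutdown per attached device. */
int model_scsi_shutdown_all(struct safe_bus *bus)
{
	for (int i = 0; i < bus->ndevices; i++)
		printf("%s: device %d: flush cache, spin down\n",
		       bus->name, i);
	return bus->ndevices;
}
```

The point of the model is that the walk is synchronous, unlike the
emergency_sync() workqueue kick described above, and that a second call
does nothing, so it would be harmless if the normal device_shutdown()
path had already run.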