On Sat, Feb 24, 2018 at 4:27 AM, Luis R. Rodriguez <mcgrof@xxxxxxxxxx> wrote:
> On Mon, Feb 05, 2018 at 09:28:37AM +0100, Rafael J. Wysocki wrote:
>> On Sun, Feb 4, 2018 at 11:41 PM, Bart Van Assche <Bart.VanAssche@xxxxxxx> wrote:
>> > On Wed, 2018-01-31 at 11:10 -0800, Darrick J. Wong wrote:
>> >> For a brief moment I pondered whether it would make sense to make
>> >> filesystems part of the device model so that the suspend code could work
>> >> out fs <-> bdev dependencies and know in which order to freeze
>> >> filesystems and quiesce devices, but every time I go digging into how
>> >> all those macros work I get confused and my eyes glaze over, so I don't
>> >> know if this is at all a good idea or just confused ramblings.
>> >
>> > If we have to go this way: shouldn't we introduce a new abstraction
>> > ("storage stack element" or similar) rather than making filesystems part of
>> > the device model?
>>
>> That would be my approach.
>>
>> Trying to "suspend" filesystems at the same time as I/O devices (and
>> all of that asynchronously) may be problematic for ordering reasons
>> and similar.
>
> Oh look, another ordering issue. And this is why I was not a fan of the
> device link API even though that is what we got merged. Moving on...
>
>> Moreover, during hibernation devices are suspended for two times (and
>> resumed in between, of course) whereas filesystems only need to be
>> "suspended" once.
>
> From your point of view yes, but actually internally the VFS layer or
> filesystems themselves may end up re-using this mechanism later for
> other things like -- snapshotting. And if some folks have it the way
> they want it, we may need a dependency map between filesystems anyway
> for filesystem specific reasons.

That's orthogonal to what I said.

A dependency map between filesystems and other components of the block
layer (like md, dm etc) will be necessary going forward (if all of the
suspending and resuming of them is expected to be reliable anyway), but
that doesn't change hibernation-related requirements one whit.

Filesystems need to be suspended (or frozen or whatever terminology
ends up being used for that) *before* creating a hibernation image and
they *cannot* be resumed (unfrozen etc) after that until the system is
off or the kernel decides that the hibernation has failed and rolls
back. Whatever data/metadata are there in persistent storage before
the image is created, changing them after that point is potentially
critically harmful, so (in the hibernation case) all of the in-flight
I/O that may end up being written to persistent storage needs to be
flushed before creating the image.

However, *devices* are resumed after creating the image so that the
image itself can be written to persistent storage and are suspended
after that again before putting the system to sleep (for wakeup to
work, among other things). That's why suspend/resume of filesystems
cannot be tied to suspend/resume of devices.

Note that this isn't the case for system suspend/resume
(suspend-to-RAM or suspend-to-idle).

>> With that in mind, I would add a mechanism allowing filesystems (and
>> possibly other components of the storage stack) to register a set of
>> callbacks for suspend and resume and then invoking those callbacks in
>> a specific order.
>
> That's what I had done in my series, the issue here is order. Order in my
> series is simple but should work for starters, later however I suspect we'll
> need something more robust to help.

Quite likely.

Thanks,
Rafael
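
For illustration only, the ordering constraints discussed in this message
can be sketched as a small user-space model. Everything here is an
assumption made for the sketch: StorageStack, register(), freeze_all(),
thaw_all() and hibernation_steps() are hypothetical names, not kernel
interfaces, and the code only models the sequence (storage stack
elements frozen once, top-down, before the image is created; devices
suspended twice with a resume in between).

```python
# Hypothetical user-space model; none of these names are kernel interfaces.
from graphlib import TopologicalSorter  # Python 3.9+


class StorageStack:
    """Registry of storage stack elements with freeze/thaw callbacks."""

    def __init__(self):
        self._sits_on = {}    # element -> elements directly below it
        self._callbacks = {}  # element -> (freeze, thaw)

    def register(self, name, freeze, thaw, sits_on=()):
        self._sits_on[name] = tuple(sits_on)
        self._callbacks[name] = (freeze, thaw)

    def freeze_order(self):
        # Freeze an element before anything it sits on, so its in-flight
        # I/O can still be flushed to the layers below.
        sorter = TopologicalSorter()
        for name, below in self._sits_on.items():
            sorter.add(name)
            for lower in below:
                sorter.add(lower, name)  # `name` must come before `lower`
        return [n for n in sorter.static_order() if n in self._callbacks]

    def freeze_all(self):
        for name in self.freeze_order():
            self._callbacks[name][0]()

    def thaw_all(self):
        # Thaw in the reverse of the freeze order.
        for name in reversed(self.freeze_order()):
            self._callbacks[name][1]()


def hibernation_steps(stack):
    """Return the hibernation sequence from the message as a list of steps."""
    steps = [f"freeze {name}" for name in stack.freeze_order()]
    steps += [
        "suspend devices",  # first device suspend
        "create image",
        "resume devices",   # so the image can be written out
        "write image",
        "suspend devices",  # second device suspend (wakeup setup etc.)
        "power off",        # storage stack stays frozen until here
    ]
    return steps
```

In this model a filesystem registered with sits_on=["dm-0"] is frozen
before dm-0 and thawed after it, and thaw_all() would only be called on
the failure/rollback path, never between "create image" and "power off".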