On Fri, Dec 01, 2017 at 02:05:44PM -0500, Jeff Layton wrote:
> On Thu, 2017-11-30 at 17:41 +0100, Jiri Kosina wrote:
> > On Fri, 1 Dec 2017, Yu Chen wrote:
> >
> > > BTW, is nfs able to be included in this set? I also encountered a
> > > freeze() failure due to nfs access during that stage recently.
> >
> > The freezer usage in NFS is magnitudes more complicated, so it makes sense
> > to first go after the lower hanging fruit to figure out the viability of
> > the whole approach in practice.
> >
>
> Agreed that we should do this in stages. It doesn't help that freezer
> handling in the client is a bit of a mess at this point...
>
> At a high level for NFS, I think we need to have freeze_fs make the RPC
> engine "park" newly issued RPCs for that fs' client onto a
> rpc_wait_queue. Any RPC that has already been sent however, we need to
> wait for a reply.
>
> Once everything is quiesced we can return and call it frozen.
> unfreeze_fs can then just have the engine stop parking RPCs and wake up
> the waitq.

That seems pretty reasonable. Freezing is expected to take a bit of
time to run - local filesystems can do a fair bit of IO draining
queues, inflight operations and bringing the journal into a
consistent state on disk before declaring the filesystem is frozen.

> That should be enough to make suspend and resume work more reliably. If,
> however, you're interested in making the cgroup freezer also work, then
> we may need to do a bit more work to ensure that we don't end up with
> frozen tasks squatting on VFS locks.

None of the existing freezing code gives those guarantees. In fact,
freezing a filesystem pretty much guarantees the opposite - that
tasks *will freeze when holding VFS locks* - and so the cgroup
freezer is broken by design if it requires tasks to be frozen
without holding any VFS/filesystem lock context.

So I wouldn't really worry about the cgroup freezer....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
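
For illustration only, the park-and-drain scheme Jeff describes above could be modelled roughly as follows. This is a toy user-space sketch in Python, not kernel code and not how the NFS client is implemented. RpcGate and all of its method names are hypothetical, made up here to show the ordering: start parking new RPCs, wait for replies to the ones already sent, and only then call the filesystem frozen.

```python
from collections import deque


class RpcGate:
    """Toy model of a per-filesystem RPC gate (hypothetical, not kernel code)."""

    def __init__(self):
        self.parking = False    # True from the start of a freeze until unfreeze
        self.parked = deque()   # stands in for the rpc_wait_queue
        self.in_flight = set()  # RPCs already sent, reply still outstanding

    def submit(self, rpc):
        """Issue an RPC: park it if a freeze is active, otherwise send it."""
        if self.parking:
            self.parked.append(rpc)
            return "parked"
        self.in_flight.add(rpc)
        return "sent"

    def reply(self, rpc):
        """A reply arrived for an RPC that was already on the wire."""
        self.in_flight.discard(rpc)

    def freeze_begin(self):
        """Start parking new RPCs; already-sent ones must still drain."""
        self.parking = True

    def is_frozen(self):
        """Frozen only once parking is on and nothing is left in flight."""
        return self.parking and not self.in_flight

    def unfreeze(self):
        """Stop parking and release everything on the queue, in order."""
        self.parking = False
        woken = []
        while self.parked:
            rpc = self.parked.popleft()
            self.in_flight.add(rpc)
            woken.append(rpc)
        return woken


if __name__ == "__main__":
    gate = RpcGate()
    gate.submit("READ")          # sent before the freeze starts
    gate.freeze_begin()
    gate.submit("WRITE")         # parked, not sent
    print(gate.is_frozen())      # READ still has no reply
    gate.reply("READ")
    print(gate.is_frozen())      # quiesced
    print(gate.unfreeze())       # parked RPCs are released
```

The model deliberately ignores locking, timeouts and RPCs whose reply never arrives, which is where most of the real difficulty would be.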