On Mon, 4 Mar 2013 22:08:34 +0000
"Myklebust, Trond" <Trond.Myklebust@xxxxxxxxxx> wrote:

> On Mon, 2013-03-04 at 21:53 +0100, Oleg Nesterov wrote:
> > On 03/04, Mandeep Singh Baines wrote:
> > >
> > > The problem is that freezer_count() calls try_to_freeze(). In this
> > > case, try_to_freeze() is not really adding any value.
> >
> > Well, I tend to agree.
> >
> > If a task calls __refrigerator() holding a lock which another freezable
> > task can wait for, this is not freezer-friendly.
> >
> > freezable_schedule/freezer_do_not_count/etc not only means "I won't be
> > active if freezing()", it should also mean "I won't block suspend/etc".
>
> If suspend for some reason requires a re-entrant mount, then yes, I can
> see how it might be a problem that we're holding a mount-related lock.
> The question is why is that necessary?
>
> > OTOH, I understand that probably it is not trivial to change this code
> > to make it freezer-friendly. But at least I disagree with "push your
> > problems onto others".
>
> That code can't be made freezer-friendly if it isn't allowed to hold
> basic filesystem-related locks across RPC calls. A number of those RPC
> calls do things that need to be protected by VFS or MM-level locks.
> i.e.: lookups, file creation/deletion, page fault in/out, ...
>
> IOW: the problem would need to be solved differently, possibly by adding
> a new FIFREEZE-like call to allow the filesystem to quiesce itself
> _before_ NetworkManager pulls the rug out from underneath it. There
> would still be plenty of corner cases to keep people entertained (e.g.
> the server goes down before the quiesce call is invoked) but at least
> the top 90% of cases would be solved.
>

Ok, I think I'm starting to get it. It doesn't necessarily need a
reentrant mount or anything like that. Consider this case (which is not
even that unlikely):

Suppose there are two tasks calling unlink() on files in the same NFS
directory.
The first task takes the i_mutex on the parent directory and asks the
server to remove the file. The second task calls unlink() just afterward
and blocks on the parent's i_mutex.

Now, a suspend event comes in and freezes the first task while it's
waiting on the response. It still holds the parent's i_mutex. The freezer
now gets to the second task and can't freeze it because the sleep on
that mutex isn't freezable.

So, not a deadlock per se in this case, but it does prevent the freezer
from running to completion. I don't see any way to solve it, though,
without making all mutexes freezable. Note that I don't think this is
really limited to NFS either. A lot of other filesystems will have
similar problems: CIFS, some FUSE variants, etc.

--
Jeff Layton <jlayton@xxxxxxxxxx>
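
For illustration only, the two-task unlink() scenario above can be
sketched as a tiny user-space model. This is not kernel code: the names
Sleep, Task, freeze_pass, try_to_freeze_all and unlink_scenario are all
hypothetical, and the model assumes the only thing that matters is
whether each task's current sleep is freezable. The second scenario
shows what would change if the mutex sleep were freezable, which is the
(impractical) fix mentioned above.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Sleep(Enum):
    """How a task is currently sleeping (hypothetical model states)."""
    FREEZABLE = auto()  # e.g. waiting for an RPC reply in a freezable sleep
    MUTEX = auto()      # blocked on a mutex; this sleep is not freezable


@dataclass
class Task:
    name: str
    sleep: Sleep
    frozen: bool = False


def freeze_pass(tasks):
    """One freezer pass: freeze what can be frozen, return the rest."""
    for t in tasks:
        if not t.frozen and t.sleep is Sleep.FREEZABLE:
            t.frozen = True
    return [t for t in tasks if not t.frozen]


def try_to_freeze_all(tasks, passes=3):
    """Retry a few passes; return names of tasks that never froze."""
    stuck = list(tasks)
    for _ in range(passes):
        stuck = freeze_pass(tasks)
        if not stuck:
            break
    return [t.name for t in stuck]


def unlink_scenario(mutex_sleep_freezable=False):
    """Task 1 holds the parent's i_mutex and waits on the server.
    Task 2 waits for that same i_mutex. Once task 1 is frozen it never
    releases the mutex, so task 2's state cannot change between passes.
    """
    waiter = Sleep.FREEZABLE if mutex_sleep_freezable else Sleep.MUTEX
    return [
        Task("unlink-1", Sleep.FREEZABLE),  # holds i_mutex, in RPC wait
        Task("unlink-2", waiter),           # blocked on i_mutex
    ]


if __name__ == "__main__":
    print(try_to_freeze_all(unlink_scenario()))
    print(try_to_freeze_all(unlink_scenario(mutex_sleep_freezable=True)))
```

In this model the first call reports unlink-2 as the task the freezer
could not freeze, and the second call reports nothing left over. The
retry count of 3 is arbitrary; it only shows that retrying does not
help while the mutex holder stays frozen.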