On Tue, 16 Sep 2014 21:43:07 -0400 Trond Myklebust
<trond.myklebust@xxxxxxxxxxxxxxx> wrote:

> On Wed, Aug 13, 2014 at 12:08 AM, NeilBrown <neilb@xxxxxxx> wrote:
> >
> >
> > If the server restarts at an awkward time it is possible for write
> > requests to block waiting for the state manager to run.
> > If the state manager isn't already running a new thread will
> > need to be started which requires a GFP_KERNEL allocation
> > (for do_fork).
> >
> > If memory is short, that GFP_KERNEL allocation could block on the
> > writes going out via NFS, resulting in a deadlock.
> >
> > The easiest solution is to keep the manager thread running
> > always.
>
> I'm still trying to figure out what to do about this patch. There are
> 2 concerns:
>
> 1) If we're so low on memory that we can't even start a state manager
> thread, then how do we guarantee that the recovery can be completed?
> We rely on that state manager thread being able to allocate memory to
> perform the lease, session, open and lock recoveries.

All the allocations performed by the state manager are (I assume) GFP_NOFS.
Creating a new thread requires GFP_KERNEL allocations, particularly in
dup_task_struct, which is called by kthreadd, which is well out of reach for
NFS to try to change the GFP flags.

Having said that, it occurs to me that my other dead-lock avoidance patch
might fix this problem as well.

The one case where I have seen a problem with starting the state manager, the
machine in question had several memory-shortage issues. I think we finally
decided that a problem with too_many_isolated handling was the main cause.
So I cannot get very much reliable information from the stack trace there.

I presume that in the current kernel, thread creation could deadlock against
nfs_release_page() (it was actually stuck in a congestion_wait()).
So I can't be certain, but I think this proactive thread creation won't be
needed once nfs_release_page() doesn't block indefinitely.

So you can drop this patch. Thanks.

Though on the topic of patches that you don't know what to do with ....
Could you have a look at

   http://permalink.gmane.org/gmane.linux.nfs/56154

It appears that it slipped under your radar, and it fell off mine until just
recently.

NeilBrown