Re: [PATCH] NFS: state manager thread must stay running.

On Tue, 16 Sep 2014 21:43:07 -0400 Trond Myklebust
<trond.myklebust@xxxxxxxxxxxxxxx> wrote:

> On Wed, Aug 13, 2014 at 12:08 AM, NeilBrown <neilb@xxxxxxx> wrote:
> >
> >
> > If the server restarts at an awkward time it is possible for write
> > requests to block waiting for the state manager to run.
> > If the state manager isn't already running a new thread will
> > need to be started which requires a GFP_KERNEL allocation
> > (for do_fork).
> >
> > If memory is short, that GFP_KERNEL allocation could block on the
> > writes going out via NFS, resulting in a deadlock.
> >
> > The easiest solution is to keep the manager thread running
> > always.
> 
> I'm still trying to figure out what to do about this patch. There are
> 2 concerns:
> 
> 1) If we're so low on memory that we can't even start a state manager
> thread, then how do we guarantee that the recovery can be completed?
> We rely on that state manager thread being able to allocate memory to
> perform the lease, session, open and lock recoveries.

All the allocations performed by the state manager are (I assume) GFP_NOFS.
Creating a new thread requires GFP_KERNEL allocations, particularly in
dup_task_struct, which is called by kthreadd, where NFS has no practical way
to change the GFP flags.

Having said that, it occurs to me that my other dead-lock avoidance patch
might fix this problem as well.

In the one case where I have seen a problem starting the state manager, the
machine in question had several memory-shortage issues.  I think we finally
decided that a problem with too_many_isolated handling was the main cause,
so I cannot get much reliable information from the stack trace there.  I
presume that in the current kernel, thread creation could deadlock against
nfs_release_page() (it was actually stuck in a congestion_wait()).

So I can't be certain, but I think this proactive thread creation won't be
needed once nfs_release_page() doesn't block indefinitely.

So you can drop this patch.

Thanks.

Though on the topic of patches that you don't know what to do with ...
Could you have a look at
   http://permalink.gmane.org/gmane.linux.nfs/56154
It appears that it slipped under your radar, and it fell off mine until just
recently.

NeilBrown


