Re: [PATCH 2/2] NFSv4: Fix a state manager thread deadlock regression

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, 23 Sep 2023, Trond Myklebust wrote:
> On Fri, 2023-09-22 at 13:22 -0400, Olga Kornievskaia wrote:
> > On Wed, Sep 20, 2023 at 8:27 PM Trond Myklebust
> > <trondmy@xxxxxxxxxxxxxxx> wrote:
> > > 
> > > On Wed, 2023-09-20 at 15:38 -0400, Anna Schumaker wrote:
> > > > Hi Trond,
> > > > 
> > > > On Sun, Sep 17, 2023 at 7:12 PM <trondmy@xxxxxxxxxx> wrote:
> > > > > 
> > > > > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > > > > 
> > > > > Commit 4dc73c679114 reintroduces the deadlock that was fixed by
> > > > > commit
> > > > > aeabb3c96186 ("NFSv4: Fix a NFSv4 state manager deadlock")
> > > > > because
> > > > > it
> > > > > prevents the setup of new threads to handle reboot recovery,
> > > > > while
> > > > > the
> > > > > older recovery thread is stuck returning delegations.
> > > > 
> > > > I'm seeing a possible deadlock with xfstests generic/472 on NFS
> > > > v4.x
> > > > after applying this patch. The test itself checks for various
> > > > swapfile
> > > > edge cases, so it seems likely something is going on there.
> > > > 
> > > > Let me know if you need more info
> > > > Anna
> > > > 
> > > 
> > > Did you turn off delegations on your server? If you don't, then
> > > swap
> > > will deadlock itself under various scenarios.
> > 
> > Is there documentation somewhere that says that delegations must be
> > turned off on the server if NFS over swap is enabled?
> 
> I think the question is more generally "Is there documentation for NFS
> swap?"

The main difference between using NFS for swap and for regular file IO
is in the handling of writes, and particularly in the style of memory
allocation that is safe while handling a write request (or anything
which might block some write request, etc).

For buffered IO, memory allocations must be GFP_NOIO or PF_MEMALLOC_NOIO.
For swap-out, memory allocations must be GFP_MEMALLOC or PG_MEMALLOC.

That is the primary difference - all other differences are minor.  This
difference might justify documentation suggesting that 
/proc/sys/vm/min_free_kbytes could usefully be increased, but I don't
see that more is needed.

The NOIO/MEMALLOC distinction is properly plumbed through nfs, sunrpc,
and networking and all "just works".  The problem area is that
kthread_create() doesn't take a gfpflags_t argument, so it uses
GFP_KERNEL allocations to create the new thread.

This means that when a write cannot proceed without state management,
and state management requests that a threads be started, there is a risk
of memory allocation deadlock.
I believe the risk is there even for buffered IO, but I'm not 100%
certain and in practice I don't think a deadlock has ever been reported.
With swap-out it is fairly easy to trigger a deadlock if there is heavy
swap-out traffic when state management is needed.

The common pattern in the kernel when a thread might be needed to
support writeout is to keep the thread running permanently (rather than
to add a gfpflags_t to kthread_create), so that is what I added to the
nfsv4 state manager.

However the state manager thread has a second use - returning
delegations.  This sometimes needs to run concurrently with state
management, so one thread is not enough.

What is that context for delegation return?  Does it ever block writes? 
If it doesn't, would it make sense to use a work queue for returning
delegations - maybe system_wq?

I think it might be best to have the nfsv4 state manager thread always
be running whether swap is enabled or not.  As I say I think there is at
least a theoretical risk of a deadlock even without swap, and having a
small test matrix is usually a good thing.

Thanks,
NeilBrown



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux