Re: [PATCH 2/2] NFSv4: Fix a state manager thread deadlock regression

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 26 Sep 2023, Trond Myklebust wrote:
> On Tue, 2023-09-26 at 08:28 +1000, NeilBrown wrote:
> > On Sat, 23 Sep 2023, Trond Myklebust wrote:
> > > On Fri, 2023-09-22 at 13:22 -0400, Olga Kornievskaia wrote:
> > > > On Wed, Sep 20, 2023 at 8:27 PM Trond Myklebust
> > > > <trondmy@xxxxxxxxxxxxxxx> wrote:
> > > > > 
> > > > > On Wed, 2023-09-20 at 15:38 -0400, Anna Schumaker wrote:
> > > > > > Hi Trond,
> > > > > > 
> > > > > > On Sun, Sep 17, 2023 at 7:12 PM <trondmy@xxxxxxxxxx> wrote:
> > > > > > > 
> > > > > > > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > > > > > > 
> > > > > > > Commit 4dc73c679114 reintroduces the deadlock that was
> > > > > > > fixed by
> > > > > > > commit
> > > > > > > aeabb3c96186 ("NFSv4: Fix a NFSv4 state manager deadlock")
> > > > > > > because
> > > > > > > it
> > > > > > > prevents the setup of new threads to handle reboot
> > > > > > > recovery,
> > > > > > > while
> > > > > > > the
> > > > > > > older recovery thread is stuck returning delegations.
> > > > > > 
> > > > > > I'm seeing a possible deadlock with xfstests generic/472 on
> > > > > > NFS
> > > > > > v4.x
> > > > > > after applying this patch. The test itself checks for various
> > > > > > swapfile
> > > > > > edge cases, so it seems likely something is going on there.
> > > > > > 
> > > > > > Let me know if you need more info
> > > > > > Anna
> > > > > > 
> > > > > 
> > > > > Did you turn off delegations on your server? If you don't, then
> > > > > swap
> > > > > will deadlock itself under various scenarios.
> > > > 
> > > > Is there documentation somewhere that says that delegations must
> > > > be
> > > > turned off on the server if NFS over swap is enabled?
> > > 
> > > I think the question is more generally "Is there documentation for
> > > NFS
> > > swap?"
> > 
> > The main difference between using NFS for swap and for regular file
> > IO
> > is in the handling of writes, and particularly in the style of memory
> > allocation that is safe while handling a write request (or anything
> > which might block some write request, etc).
> > 
> > For buffered IO, memory allocations must be GFP_NOIO or
> > PF_MEMALLOC_NOIO.
> > For swap-out, memory allocations must be GFP_MEMALLOC or PG_MEMALLOC.
> > 
> > That is the primary difference - all other differences are minor. 
> > This
> > difference might justify documentation suggesting that 
> > /proc/sys/vm/min_free_kbytes could usefully be increased, but I don't
> > see that more is needed.
> > 
> > The NOIO/MEMALLOC distinction is properly plumbed through nfs,
> > sunrpc,
> > and networking and all "just works".  The problem area is that
> > kthread_create() doesn't take a gfpflags_t argument, so it uses
> > GFP_KERNEL allocations to create the new thread.
> > 
> > This means that when a write cannot proceed without state management,
> > and state management requests that a threads be started, there is a
> > risk
> > of memory allocation deadlock.
> > I believe the risk is there even for buffered IO, but I'm not 100%
> > certain and in practice I don't think a deadlock has ever been
> > reported.
> > With swap-out it is fairly easy to trigger a deadlock if there is
> > heavy
> > swap-out traffic when state management is needed.
> > 
> > The common pattern in the kernel when a thread might be needed to
> > support writeout is to keep the thread running permanently (rather
> > than
> > to add a gfpflags_t to kthread_create), so that is what I added to
> > the
> > nfsv4 state manager.
> > 
> > However the state manager thread has a second use - returning
> > delegations.  This sometimes needs to run concurrently with state
> > management, so one thread is not enough.
> > 
> > What is that context for delegation return?  Does it ever block
> > writes? 
> > If it doesn't, would it make sense to use a work queue for returning
> > delegations - maybe system_wq?
> 
> These are potentially long-lived processes because there may be lock
> recovery involved, and because of the conflict with state recovery, so
> it does not make sense to put them on a workqueue.
> 

Makes sense - thanks.
Are writes blocked while the delegation returns proceeds?  If not, would
it be reasonable to start a separate kthread on-demand when a return is
requested?

Thanks,
NeilBrown



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux