On Tue, 26 Sep 2023, Trond Myklebust wrote: > On Tue, 2023-09-26 at 08:28 +1000, NeilBrown wrote: > > On Sat, 23 Sep 2023, Trond Myklebust wrote: > > > On Fri, 2023-09-22 at 13:22 -0400, Olga Kornievskaia wrote: > > > > On Wed, Sep 20, 2023 at 8:27 PM Trond Myklebust > > > > <trondmy@xxxxxxxxxxxxxxx> wrote: > > > > > > > > > > On Wed, 2023-09-20 at 15:38 -0400, Anna Schumaker wrote: > > > > > > Hi Trond, > > > > > > > > > > > > On Sun, Sep 17, 2023 at 7:12 PM <trondmy@xxxxxxxxxx> wrote: > > > > > > > > > > > > > > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> > > > > > > > > > > > > > > Commit 4dc73c679114 reintroduces the deadlock that was > > > > > > > fixed by > > > > > > > commit > > > > > > > aeabb3c96186 ("NFSv4: Fix a NFSv4 state manager deadlock") > > > > > > > because > > > > > > > it > > > > > > > prevents the setup of new threads to handle reboot > > > > > > > recovery, > > > > > > > while > > > > > > > the > > > > > > > older recovery thread is stuck returning delegations. > > > > > > > > > > > > I'm seeing a possible deadlock with xfstests generic/472 on > > > > > > NFS > > > > > > v4.x > > > > > > after applying this patch. The test itself checks for various > > > > > > swapfile > > > > > > edge cases, so it seems likely something is going on there. > > > > > > > > > > > > Let me know if you need more info > > > > > > Anna > > > > > > > > > > > > > > > > Did you turn off delegations on your server? If you don't, then > > > > > swap > > > > > will deadlock itself under various scenarios. > > > > > > > > Is there documentation somewhere that says that delegations must > > > > be > > > > turned off on the server if NFS over swap is enabled? > > > > > > I think the question is more generally "Is there documentation for > > > NFS > > > swap?" > > > > The main difference between using NFS for swap and for regular file > > IO > > is in the handling of writes, and particularly in the style of memory > > allocation that is safe while handling a write request (or anything > > which might block some write request, etc). > > > > For buffered IO, memory allocations must be GFP_NOIO or > > PF_MEMALLOC_NOIO. > > For swap-out, memory allocations must be GFP_MEMALLOC or PG_MEMALLOC. > > > > That is the primary difference - all other differences are minor. > > This > > difference might justify documentation suggesting that > > /proc/sys/vm/min_free_kbytes could usefully be increased, but I don't > > see that more is needed. > > > > The NOIO/MEMALLOC distinction is properly plumbed through nfs, > > sunrpc, > > and networking and all "just works". The problem area is that > > kthread_create() doesn't take a gfpflags_t argument, so it uses > > GFP_KERNEL allocations to create the new thread. > > > > This means that when a write cannot proceed without state management, > > and state management requests that a threads be started, there is a > > risk > > of memory allocation deadlock. > > I believe the risk is there even for buffered IO, but I'm not 100% > > certain and in practice I don't think a deadlock has ever been > > reported. > > With swap-out it is fairly easy to trigger a deadlock if there is > > heavy > > swap-out traffic when state management is needed. > > > > The common pattern in the kernel when a thread might be needed to > > support writeout is to keep the thread running permanently (rather > > than > > to add a gfpflags_t to kthread_create), so that is what I added to > > the > > nfsv4 state manager. > > > > However the state manager thread has a second use - returning > > delegations. This sometimes needs to run concurrently with state > > management, so one thread is not enough. > > > > What is that context for delegation return? Does it ever block > > writes? > > If it doesn't, would it make sense to use a work queue for returning > > delegations - maybe system_wq? > > These are potentially long-lived processes because there may be lock > recovery involved, and because of the conflict with state recovery, so > it does not make sense to put them on a workqueue. > Makes sense - thanks. Are writes blocked while the delegation returns proceeds? If not, would it be reasonable to start a separate kthread on-demand when a return is requested? Thanks, NeilBrown