Re: sporadic hangs on generic/186

On Thu, 07 Apr 2022, Dave Chinner wrote:
> On Wed, Apr 06, 2022 at 03:54:24PM -0400, J. Bruce Fields wrote:
> > In the last couple days I've started getting hangs on xfstests
> > generic/186 on upstream.  I also notice the test completes after 10+
> > hours (usually it takes about 5 minutes).  Sometimes this is accompanied
> > by "nfs: RPC call returned error 12" on the client.
> 
> #define ENOMEM          12      /* Out of memory */
> 
> So either the client or the server is running out of memory
> somewhere?

Probably the client.  A bunch of recent changes add __GFP_NORETRY to
memory allocations from PF_WQ_WORKERs, because allocations there can
otherwise result in deadlocks when swapping over NFS.
This means that kmalloc requests that previously never failed (because
GFP_KERNEL never fails for kernel threads, I think) can now fail.  This
has tickled one bug that I know of.  There are likely to be more.

The RPC code should simply retry these allocations after a short delay. 
HZ/4 is the number that is used in a couple of places.  Possibly there
are more places that need to handle -ENOMEM with rpc_delay().
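
To make the retry idea concrete, here is a rough user-space sketch of
"allocation failed, wait a bit, try again".  It is not the sunrpc code:
alloc_with_retry(), flaky_alloc() and sleep_ms() are hypothetical names
made up for illustration, and a caller-supplied delay in milliseconds
stands in for the HZ/4 delay that the kernel would apply via
rpc_delay().  The retry count is bounded here only so the sketch
terminates; that bound is an assumption, not something the RPC code is
known to do.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical allocator type: may return NULL, much like a kmalloc()
 * call made with __GFP_NORETRY can. */
typedef void *(*alloc_fn)(size_t size, void *ctx);

/* Sleep for roughly ms milliseconds (user-space stand-in for a delay). */
static void sleep_ms(long ms)
{
    struct timespec ts;

    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/*
 * Try the allocation up to max_tries times, sleeping delay_ms between
 * failed attempts.  Returns 0 and stores the buffer in *out on success,
 * or -ENOMEM with *out set to NULL once every attempt has failed.
 */
static int alloc_with_retry(alloc_fn alloc, void *ctx, size_t size,
                            int max_tries, long delay_ms, void **out)
{
    assert(alloc != NULL && out != NULL);

    for (int attempt = 0; attempt < max_tries; attempt++) {
        void *p = alloc(size, ctx);

        if (p != NULL) {
            *out = p;
            return 0;
        }
        /* No point sleeping after the final failed attempt. */
        if (attempt + 1 < max_tries)
            sleep_ms(delay_ms);
    }
    *out = NULL;
    return -ENOMEM;
}

/*
 * Demo allocator: ctx points at an int holding the number of failures
 * still to simulate; once that reaches zero it falls through to malloc().
 */
static void *flaky_alloc(size_t size, void *ctx)
{
    int *failures_left = ctx;

    if (*failures_left > 0) {
        (*failures_left)--;
        return NULL;
    }
    return malloc(size);
}
```

With flaky_alloc() set to fail twice, a call allowing five tries
succeeds on the third attempt; if the failures outlast the allowed
tries, the caller sees -ENOMEM, which is the case the RPC paths
mentioned above would need to handle by delaying and retrying rather
than passing the error up.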

NeilBrown


