Re: sporadic hangs on generic/186

On Thu, 07 Apr 2022, Dave Chinner wrote:
> On Wed, Apr 06, 2022 at 03:54:24PM -0400, J. Bruce Fields wrote:
> > In the last couple days I've started getting hangs on xfstests
> > generic/186 on upstream.  I also notice the test completes after 10+
> > hours (usually it takes about 5 minutes).  Sometimes this is accompanied
> > by "nfs: RPC call returned error 12" on the client.
> 
> #define ENOMEM          12      /* Out of memory */
> 
> So either the client or the server is running out of memory
> somewhere?

Probably the client.  There have been a bunch of recent changes that add
__GFP_NORETRY to memory allocations from PF_WQ_WORKERs, because such
allocations can otherwise deadlock when swapping over NFS.
This means that kmalloc requests that previously never failed (because
GFP_KERNEL never fails for kernel threads, I think) can now fail.  This
has tickled one bug that I know of.  There are likely to be more.
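To make the behaviour change concrete, here is a small userspace simulation (not kernel code).  SIM_GFP_KERNEL, SIM_GFP_NORETRY, sim_alloc() and the pressure counter are hypothetical names invented for illustration.  The "keep looping until it succeeds" default only models the hedged recollection above that GFP_KERNEL does not fail for kernel threads; it is not how the real page allocator is implemented.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for GFP_KERNEL and GFP_KERNEL|__GFP_NORETRY.
 * These are NOT the kernel's definitions. */
enum sim_gfp { SIM_GFP_KERNEL = 0, SIM_GFP_NORETRY = 1 };

/* Simulated memory pressure: the next `pressure` attempts fail. */
static int pressure;

static void *sim_alloc(size_t size, enum sim_gfp flags)
{
    assert(size > 0);
    for (;;) {
        if (pressure == 0)
            return malloc(size);
        pressure--;                  /* reclaim makes a little progress */
        if (flags & SIM_GFP_NORETRY)
            return NULL;             /* give up: caller must handle ENOMEM */
        /* default: retry, so the caller never sees NULL */
    }
}
```

With pressure set to 3, a SIM_GFP_KERNEL call spins until it succeeds, while a SIM_GFP_NORETRY call returns NULL after one attempt.  Any caller that was written assuming the first behaviour now has an unhandled failure path, which is the class of bug described above.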

The RPC code should simply retry these allocations after a short delay;
HZ/4 is the value used in a couple of places.  There may well be more
places that need to handle -ENOMEM with rpc_delay().
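A minimal userspace sketch of the "retry after a short delay" pattern, assuming an allocator that can return NULL.  try_alloc(), sim_failures, sleep_ms() and alloc_with_retry() are hypothetical names.  sleep_ms() only stands in for rpc_delay(task, HZ/4): HZ/4 jiffies is a quarter of a second whatever HZ is, so the equivalent delay here is 250 ms.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical allocator: fails while sim_failures > 0, mimicking a
 * __GFP_NORETRY allocation under memory pressure. Not a kernel API. */
static int sim_failures;

static void *try_alloc(size_t size)
{
    assert(size > 0);
    if (sim_failures > 0) {
        sim_failures--;
        return NULL;                 /* the -ENOMEM case */
    }
    return malloc(size);
}

/* Sleep for ms milliseconds, resuming if interrupted by a signal.
 * Stands in for rpc_delay(task, HZ/4), i.e. 250 ms. */
static void sleep_ms(long ms)
{
    struct timespec req = { .tv_sec = ms / 1000,
                            .tv_nsec = (ms % 1000) * 1000000L };
    struct timespec rem;

    while (nanosleep(&req, &rem) == -1 && errno == EINTR)
        req = rem;
}

/* Retry after a short delay instead of passing -ENOMEM up to the caller.
 * max_tries only bounds this demo; *tries reports the attempts made.
 * Returns NULL only if every attempt failed. */
static void *alloc_with_retry(size_t size, long delay_ms, int max_tries,
                              int *tries)
{
    for (*tries = 1; ; (*tries)++) {
        void *p = try_alloc(size);
        if (p != NULL || *tries >= max_tries)
            return p;
        sleep_ms(delay_ms);
    }
}
```

A call like alloc_with_retry(len, 250, 10, &tries) rides out a few transient failures.  One design difference worth noting: as I understand it, rpc_delay() does not block a thread the way nanosleep() does; it puts the rpc_task on a delay queue and the state machine re-runs the failed step when the timer fires, so the workqueue thread is free in the meantime.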

NeilBrown
