Re: xfs_inode not reclaimed/memory leak on 5.2.16

Dave Chinner <david@xxxxxxxxxxxxx> · Mon, 30 Sep 2019 18:54:06 +1000

On Mon, Sep 30, 2019 at 09:28:27AM +0200, Florian Weimer wrote:
> Simply running “du -hc” on a large directory tree causes du to be
> killed because of kernel paging request failure in the XFS code.

dmesg output? If the system was still running, you might be able to
pull the trace from syslog. We can't do much without knowing what the
actual failure was.
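
If the log is still on disk, something like the following can pull out
the interesting part. This is only a rough sketch: the function name
extract_traces, the default marker strings and the number of context
lines are my assumptions, not a fixed format, so adjust them to
whatever your kernel actually printed.

```python
from typing import Iterable, List, Sequence


def extract_traces(
    lines: Iterable[str],
    markers: Sequence[str] = ("BUG:", "Call Trace:"),  # assumed markers
    context: int = 40,  # assumed number of lines to keep after a marker
) -> List[str]:
    """Return every line containing a marker plus `context` lines after it."""
    out: List[str] = []
    remaining = 0
    for line in lines:
        if any(marker in line for marker in markers):
            # Restart the window so overlapping traces are kept whole.
            remaining = context + 1
        if remaining > 0:
            out.append(line.rstrip("\n"))
            remaining -= 1
    return out
```

Feed it an open log file (where the kernel log lives depends on the
distro) or the saved output of dmesg, and post what comes back.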

FWIW, one of my regular test workloads is iterating a directory tree
with 50 million inodes in several different ways to stress reclaim
algorithms in ways that users do. I haven't seen issues with that
test for a while, so whatever you came across is not an obvious
problem.
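
To be clear about what "iterating" means here, a minimal sketch of one
such walk is below. It is not my actual test harness, and count_inodes
is a made-up name; it just stats every entry under a root, which is
roughly the access pattern du generates and enough to populate the
inode cache.

```python
import os


def count_inodes(root: str) -> int:
    """Walk `root` iteratively and lstat every entry, like du would."""
    count = 0
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    try:
                        # Forces the kernel to instantiate the inode.
                        entry.stat(follow_symlinks=False)
                        count += 1
                        if entry.is_dir(follow_symlinks=False):
                            stack.append(entry.path)
                    except OSError:
                        continue  # entry vanished or is unreadable
        except OSError:
            continue  # directory vanished or is unreadable
    return count
```

Run it over a large tree while watching the slab caches and you get
the same kind of pressure as the du run.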

> I ran slabtop, and it showed tons of xfs_inode objects.

Sure, because your workload is iterating inodes.
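
If slabtop is too unwieldy on a struggling machine, the same numbers
can be read from /proc/slabinfo. A minimal sketch, assuming the usual
version 2.1 layout (name, active_objs, num_objs, objsize, ...); the
names SlabStat, parse_slabinfo and cache_bytes are mine, and reading
the file typically requires root.

```python
from typing import Dict, NamedTuple


class SlabStat(NamedTuple):
    active_objs: int
    num_objs: int
    objsize: int


def parse_slabinfo(text: str) -> Dict[str, SlabStat]:
    """Parse /proc/slabinfo text into {cache name: SlabStat}."""
    stats: Dict[str, SlabStat] = {}
    for line in text.splitlines():
        # Skip the version banner and the column header.
        if not line or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        stats[fields[0]] = SlabStat(
            int(fields[1]), int(fields[2]), int(fields[3])
        )
    return stats


def cache_bytes(stat: SlabStat) -> int:
    """Approximate memory held by a cache: allocated objects * size."""
    return stat.num_objs * stat.objsize
```

Sampling parse_slabinfo(open("/proc/slabinfo").read())["xfs_inode"]
a few times during the run shows whether the cache keeps growing or
is being reclaimed.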

> The system was rather unhappy after that, so I wasn't able to capture
> much more information.
> 
> Is this a known issue on Linux 5.2?

Not that I know of.

> I don't see it with kernel
> 5.0.20.  Those are plain upstream kernels built for x86-64, with no
> unusual config options (that I know of).

We've had quite a few memory reclaim regressions in recent times
that have displayed similar symptoms - XFS is often just the
messenger because the inode cache is generating the memory pressure.
For example, the shrinker infrastructure was broken in 4.16, then
broken differently in 4.17 by the attempt to fix it, and we didn't hear
about either problem until about 4.18/4.19 when users started to trip
over them. I fixed
those problems in 5.0, but there's every chance that there have been
new regressions since then.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx