Re: Excessive xfs_inode allocations trigger OOM killer

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 21 Sep 2016 06:30:39 +1000

On Tue, Sep 20, 2016 at 09:48:42PM +0200, Florian Weimer wrote:
> I have an amd64 4.7.1 system (upstream kernel, pretty regular config)
> with a small file system:
> 
> Filesystem     1K-blocks      Used Available Use% Mounted on
> /dev/sda1      922595184 800396464 122198720  87% /
> Filesystem        Inodes    IUsed     IFree IUse% Mounted on
> /dev/sda1      504110496 15305094 488805402    4% /

So 15 million inodes. A lot for a small filesystem.

> The odd thing is that after a while, XFS consumes a lot of memory
> according to slabtop:
> 
>   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   
> 4121208 4121177  99%    0.88K 1030302        4   4121208K xfs_inode
> 986286 985229  99%    0.19K  46966       21    187864K dentry
> 723255 723076  99%    0.10K  18545       39     74180K buffer_head
> 270263 269251  99%    0.56K  38609        7    154436K radix_tree_node
> 140310  67409  48%    0.38K  14031       10     56124K mnt_cache

That's not odd at all. It means your workload is visiting millions
of inodes in your filesystem between serious memory pressure events.
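
As a rough sanity check, the figures quoted above can be turned into
totals. This is only arithmetic on the slabtop and df -i numbers; it
assumes slabtop's "K" means KiB, and the variable names are just for
illustration:

```python
# Figures copied from the xfs_inode row of the slabtop output and from
# the df -i output quoted above. "K" is assumed to mean KiB.
OBJS = 4_121_208            # total xfs_inode objects
ACTIVE = 4_121_177          # objects currently in use
SLABS = 1_030_302           # number of slabs
OBJ_PER_SLAB = 4            # objects per slab
CACHE_SIZE_KIB = 4_121_208  # CACHE SIZE column
IUSED = 15_305_094          # allocated inodes in the filesystem


def slab_summary():
    """Return (KiB per slab, cache size in GiB, fraction of inodes cached)."""
    # Cross-check: slabs * objects-per-slab should equal the object count.
    assert SLABS * OBJ_PER_SLAB == OBJS
    kib_per_slab = CACHE_SIZE_KIB / SLABS
    cache_gib = CACHE_SIZE_KIB / (1024 * 1024)
    cached_fraction = ACTIVE / IUSED
    return kib_per_slab, cache_gib, cached_fraction


kib_per_slab, cache_gib, cached_fraction = slab_summary()
print(f"{kib_per_slab:.0f} KiB per slab, {cache_gib:.2f} GiB of xfs_inode, "
      f"{cached_fraction:.0%} of allocated inodes cached")
```

So the cache holds a bit under 4 GiB of inodes, or roughly a quarter
of the inodes allocated in the filesystem.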

> (I have attached the /proc/meminfo contents in case it offers further
> clues.)
> 
> Confronted with large memory allocations (from “make -j12” and
> compiling GCC, so perhaps ~8 GiB of memory), the OOM killer kicks in
> and kills some random process.  I would have expected that some
> xfs_inodes are freed instead.

The oom killer is unreliable and often behaves very badly, and
that's typically not an XFS problem.

What is the full output of the oom killer invocations from dmesg?
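
If the log is long, something like the sketch below can pull out each
report. It assumes the dmesg output was saved to a text file and that
each report starts with the kernel's "invoked oom-killer" line; the
function name and the 60-line window are arbitrary choices, not
anything the kernel defines:

```python
import re

# The kernel logs "<task> invoked oom-killer: ..." at the start of each report.
OOM_START = re.compile(r"invoked oom-killer")


def oom_reports(log_text, context=60):
    """Return one list of log lines per OOM killer invocation.

    `context` is how many lines to keep from the start of each report;
    60 is a guess, so increase it if the reports look truncated.
    """
    lines = log_text.splitlines()
    reports = []
    for i, line in enumerate(lines):
        if OOM_START.search(line):
            reports.append(lines[i:i + context])
    return reports
```

For example, after "dmesg > dmesg.txt", calling
oom_reports(open("dmesg.txt").read()) gives one chunk per invocation.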

> I don't think this is an ordinary memory leak.

It's not a memory leak at all.

> The last time I saw
> something like the slabtop output above, I could do “sysctl
> vm.drop_caches = 3”, and the amount of memory allocated reported by
> slabtop was reduced considerably.  (I have not checked if the memory
> was actually returned to the system.)  I have not done this now so
> that I can gather further data for debugging.

How long did the sysctl take to run to free those inodes? A few
seconds, or minutes?
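
One way to measure it next time is sketched below. It assumes root
privileges and writes to /proc/sys/vm/drop_caches directly, which is
the file the sysctl sets; the function name is made up for
illustration:

```python
import time

DROP_CACHES = "/proc/sys/vm/drop_caches"


def timed_drop_caches(path=DROP_CACHES, value="3"):
    """Write `value` to the drop_caches file and return the elapsed seconds.

    Writing 3 asks the kernel to drop the page cache plus reclaimable
    slab objects such as dentries and inodes. Needs root on a real system.
    """
    start = time.monotonic()
    with open(path, "w") as f:
        f.write(value + "\n")
    return time.monotonic() - start
```

Calling timed_drop_caches() as root and comparing slabtop before and
after shows both how long the reclaim took and how much it freed.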

> I am not sure what
> triggers this huge allocation.  It could be related to my Gnus mail
> spool (which contains lots and lots of small files).

OK - does that regularly dirty lots of those small files? What sort
of storage are you using, and what fs config?

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

> Dirty:                28 kB
> Writeback:             0 kB

There's no dirty data, and dropping caches makes progress, so this
doesn't /sound/ like reclaim is getting stalled by dirty object
writeback. More info needed.
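
To watch those two counters while the problem builds up, a small
parser like the one below would do. It is a sketch that assumes the
usual "Name:   value kB" layout of /proc/meminfo; the function name is
hypothetical:

```python
def meminfo_kb(text, fields=("Dirty", "Writeback")):
    """Return {field: value} for the requested /proc/meminfo fields (in kB)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        name = name.strip()
        # Exact name match, so "WritebackTmp" is not confused with "Writeback".
        if name in fields and rest.split():
            values[name] = int(rest.split()[0])
    return values
```

For example, meminfo_kb(open("/proc/meminfo").read()) sampled every
few seconds shows whether dirty data piles up before the OOM kill.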

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs