Re: swap_cluster_info lockdep splat

Hugh Dickins <hughd@xxxxxxxxxx> · Thu, 16 Feb 2017 17:46:44 -0800 (PST)

On Thu, 16 Feb 2017, Tim Chen wrote:
> 
> > I do not understand your zest for putting wrappers around every little
> > thing, making it all harder to follow than it need be.  Here's the patch
> > I've been running with (but you have a leak somewhere, and I don't have
> > time to search out and fix it: please try sustained swapping and swapoff).
> > 
> 
> Hugh, trying to duplicate your test case.  So you were doing swapping,
> then swap off, swap on the swap device and restart swapping?

Repeated pair of make -j20 kernel builds in 700M RAM, 1.5G swap on SSD,
8 cpus; one of the builds in tmpfs, other in ext4 on loop on tmpfs file;
sizes tuned for plenty of swapping but no OOMing (it's an ancient 2.6.24
kernel I build: a modern one needs a lot more space, with a lot less in use).

How much of that is relevant I don't know: hopefully none of it, it's
hard to get the tunings right from scratch.  To answer your specific
question: yes, I'm not doing concurrent swapoffs in this test showing
the leak, just waiting for each of the pair of builds to complete,
then tearing down the trees, doing swapoff followed by swapon, and
starting a new pair of builds.
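
For anyone trying to reproduce this, one iteration of the loop described
above might be sketched roughly as follows.  This is not the script
actually used: the swap device, the tree paths, the use of "rm -rf" for
the teardown and the helper names are all hypothetical, and repopulating
the source trees before the next iteration is left out.

```python
import subprocess

# Hypothetical values, not taken from the mail: adjust to the test machine.
SWAP_DEV = "/dev/sdX2"
BUILD_DIRS = ["/mnt/tmpfs/linux", "/mnt/ext4loop/linux"]


def iteration_steps(swap_dev, build_dirs, jobs=20):
    """Return (builds, teardown) command lists for one iteration.

    builds   -- one "make -jN" per tree, meant to run concurrently
    teardown -- remove the trees, then swapoff followed by swapon
    """
    builds = [["make", "-C", d, f"-j{jobs}"] for d in build_dirs]
    teardown = [["rm", "-rf", d] for d in build_dirs]
    teardown += [["swapoff", swap_dev], ["swapon", swap_dev]]
    return builds, teardown


def run_iteration(swap_dev, build_dirs, jobs=20):
    """Run the pair of builds in parallel, wait for both, then tear down.

    Needs root for swapoff/swapon.  Returns the builds' exit codes so a
    caller can notice a build that died (for example on a failed fork).
    """
    builds, teardown = iteration_steps(swap_dev, build_dirs, jobs)
    procs = [subprocess.Popen(cmd) for cmd in builds]
    codes = [p.wait() for p in procs]
    for cmd in teardown:
        # check=True raises if swapoff (or any other step) fails.
        subprocess.run(cmd, check=True)
    return codes
```

Calling run_iteration(SWAP_DEV, BUILD_DIRS) repeatedly, with the trees
unpacked again in between, would approximate the sustained load; a
non-zero exit code or a CalledProcessError would mark the iteration on
which something failed.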

Sometimes it's the swapoff that fails with ENOMEM, more often it's a
fork during build that fails with ENOMEM: after 6 or 7 hours of load
(but timings show it getting slower leading up to that).  /proc/meminfo
did not give me an immediate clue: Slab didn't look surprising, but
I may not have studied it closely enough.
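
Since a single look at /proc/meminfo gave no clue, comparing snapshots
taken at the same point in successive iterations (say, right after
swapon) might make slow growth easier to spot.  A minimal sketch, with
hypothetical helper names; it only reports which fields grew and says
nothing about where a leak actually is.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}.

    Values are the leading integer on each line (kB for most fields,
    a plain count for fields such as HugePages_Total).
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def growth(before, after, min_delta=0):
    """Fields that grew by more than min_delta, largest growth first."""
    deltas = {k: after[k] - before[k] for k in after if k in before}
    grown = [(k, d) for k, d in deltas.items() if d > min_delta]
    return sorted(grown, key=lambda kd: kd[1], reverse=True)


def read_meminfo(path="/proc/meminfo"):
    """Take one snapshot from the running system."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

Taking read_meminfo() once per iteration and passing consecutive
snapshots to growth() would list the fields that keep increasing over
the hours leading up to the ENOMEM.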

I quilt-bisected it as far as the mm-swap series, good before, bad
after, but didn't manage to narrow it down further because of hitting
a presumably different issue inside the series, where swapoff ENOMEMed
much sooner (after 25 mins one time, during first iteration the next).
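
The bisection difficulty can be illustrated with a small sketch.  It is
not how quilt itself works, only a plain binary search over "number of
patches applied" in which a test run may come back with a failure that
is not the one being chased.  The function name and the three verdict
strings are hypothetical.

```python
def bisect_series(n_patches, classify):
    """Narrow down the first bad patch in a series of n_patches.

    classify(k) tests a tree with the first k patches applied and
    returns "good", "bad", or "other" (some different failure).
    Assumes k=0 is good and k=n_patches is bad.

    Returns (lo, hi): lo patches applied is known good, hi patches
    applied is known bad, so the culprit is among patches lo+1..hi.
    """
    lo, hi = 0, n_patches
    while hi - lo > 1:
        mid = (lo + hi) // 2
        verdict = classify(mid)
        if verdict == "good":
            lo = mid
        elif verdict == "bad":
            hi = mid
        else:
            # A different failure hides the answer at this point:
            # stop and report the range reached so far.
            break
    return lo, hi
```

When every midpoint gives a clean verdict the range shrinks to a single
patch; as soon as a midpoint fails in some other way the search stops
with a wider range, which is the situation described above.  A more
careful version would try neighbouring patch counts before giving up.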

Hugh