Re: [PATCH v2 0/9] support large folio swap-out and swap-in for shmem

On 2024/6/19 16:16, Hugh Dickins wrote:
On Wed, 19 Jun 2024, Baolin Wang wrote:
On 2024/6/19 04:05, Andrew Morton wrote:
On Tue, 18 Jun 2024 14:54:12 +0800 Baolin Wang
<baolin.wang@xxxxxxxxxxxxxxxxx> wrote:

Shmem will support large folio allocation [1] [2] to get better performance;
however, memory reclaim still splits the precious large folios when trying
to swap out shmem, which may lead to memory fragmentation and cannot
take advantage of large folios for shmem.

Moreover, the swap code already supports swapping out large folios without
splitting them, and the large folio swap-in [3] series is queued into the
mm-unstable branch. Hence this patch set also supports large folio swap-out
and swap-in for shmem.

I'll add this to mm-unstable for some exposure, but I wonder how much
testing it will have received by the time the next merge window opens?

Thanks Andrew. I am fine with this series going to 6.12 if you are concerned
about insufficient testing (and let's also wait for Hugh's comments). Since we
(Daniel and I) have some follow-up patches that will rely on this swap series,
I hope this series can be tested as extensively as possible to ensure its
stability in the mm branch.

Thanks for giving it the exposure, Andrew, but please drop it from
mm-unstable until the next cycle. I'd been about to write to say I
wouldn't be trying it until next cycle, when your mm-commits came in:
so I thought I ought at least to give mm-everything-2024-06-18 a try.

Baolin may have fixed stuff, but he (or the interaction with other mm
work) has broken stuff too: I couldn't get as far with it as with the
previous version. Just "cp -a" of a kernel source tree into a tmpfs
huge=always size=<bigenough> failed with lots of ENOSPCs, and when
"rm -rf"ed, produced lots of WARN_ON(inode->i_blocks) from shmem_evict_inode();
and on a second attempt, a VM_BUG_ON_FOLIO(!folio_contains) from
find_lock_entries().
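
(Roughly, that test amounts to the sketch below; the mount point, source
path and size value are placeholders, not the exact ones used:)

    # hypothetical reproduction of the steps described above
    mount -t tmpfs -o huge=always,size=<bigenough> tmpfs /mnt/test
    cp -a /path/to/linux-src /mnt/test/    # failed with lots of ENOSPCs
    rm -rf /mnt/test/linux-src             # WARN_ON(inode->i_blocks) from shmem_evict_inode()
    umount /mnt/test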

Thanks Hugh for giving it a try. I can also reproduce the WARN_ON(inode->i_blocks)
issue with today's mm-unstable branch (sadly, this issue did not occur in the older
mm-unstable branch), and I now have a fix in my local tree. But I will not send out
a V3 too quickly, and I agree we can drop this series from the mm-unstable branch
until the next cycle.

Or maybe that VM_BUG_ON_FOLIO() was unrelated, but a symptom of the bug

I did not encounter the VM_BUG_ON_FOLIO() issue, but let me try your testing case...

I'm trying to chase even when this series is reverted: some kind of page
double usage, manifesting as miscellaneous "Bad page"s and VM_BUG_ONs,
mostly from page reclaim or from exit_mmap(). I'm still getting a feel
for it, maybe it occurs soon enough for a reliable bisection, maybe not.

(While writing, a run with mm-unstable cut off at 2a9964cc5d27,
drop KSM_KMEM_CACHE(), instead of reverting just Baolin's latest,
has not yet hit any problem: too early to tell but promising.)

And before 2024-06-18, I was working on mm-everything-2024-06-15 minus
Chris Li's mTHP swap series: which worked fairly well, until it locked
up with __try_to_reclaim_swap()'s filemap_get_folio() spinning around
on a page with 0 refcount, while a page table lock is held which, one
by one, the other CPUs come to want for reclaim. On two machines.

None of these problems seen on Stephen's last next-2024-06-13.
I had wanted to see if mm-everything-2024-06-18 fixed that lockup,
but with the new problems I cannot tell (or it could all be the same
problem: but if so, odd that it manifests consistently differently).

There are way too many mTHP shmem and swap patch series floating
around at the moment, in mm and in fs, for me to cope with:
everyone, please slow down and test more.

Sure. I will continue to do more testing. Thanks.



