On Wed, 19 Jun 2024, Baolin Wang wrote:
> On 2024/6/19 04:05, Andrew Morton wrote:
> > On Tue, 18 Jun 2024 14:54:12 +0800 Baolin Wang
> > <baolin.wang@xxxxxxxxxxxxxxxxx> wrote:
> >
> >> Shmem will support large folio allocation [1] [2] to get better
> >> performance; however, memory reclaim still splits the precious large
> >> folios when trying to swap out shmem, which may lead to memory
> >> fragmentation and cannot take advantage of large folios for shmem.
> >>
> >> Moreover, the swap code already supports swapping out large folios
> >> without splitting, and the large folio swap-in [3] series is queued
> >> into the mm-unstable branch. Hence this patch set also supports
> >> large folio swap-out and swap-in for shmem.
> >
> > I'll add this to mm-unstable for some exposure, but I wonder how much
> > testing it will have received by the time the next merge window opens?
>
> Thanks Andrew. I am fine with this series going into 6.12 if you are
> concerned about insufficient testing (and let's also wait for Hugh's
> comments). Since we (Daniel and I) have some follow-up patches that
> will rely on this swap series, I hope this series can be tested as
> extensively as possible to ensure its stability in the mm branch.

Thanks for giving it the exposure, Andrew, but please drop it from
mm-unstable until the next cycle. I'd been about to write to say I
wouldn't be trying it until next cycle, when your mm-commits came in:
so I thought I ought at least to give mm-everything-2024-06-18 a try.

Baolin may have fixed stuff, but he (or the interaction with other mm
work) has broken stuff too: I couldn't get as far with it as with the
previous version. Just a "cp -a" of a kernel source tree into a tmpfs
huge=always size=<bigenough> failed with lots of ENOSPCs, and when
"rm -rf"ed gave lots of WARN_ON(inode->i_blocks) from
shmem_evict_inode(); and on a second attempt, a
VM_BUG_ON_FOLIO(!folio_contains) from find_lock_entries().

Or maybe that VM_BUG_ON_FOLIO() was unrelated, but a symptom of the
bug I'm trying to chase even when this series is reverted: some kind
of page double usage, manifesting as miscellaneous "Bad page"s and
VM_BUG_ONs, mostly from page reclaim or from exit_mmap(). I'm still
getting a feel for it; maybe it occurs soon enough for a reliable
bisection, maybe not. (While writing, a run with mm-unstable cut off
at 2a9964cc5d27, drop KSM_KMEM_CACHE(), instead of reverting just
Baolin's latest, has not yet hit any problem: too early to tell, but
promising.)

And before 2024-06-18, I was working on mm-everything-2024-06-15 minus
Chris Li's mTHP swap series: which worked fairly well, until it locked
up with __try_to_reclaim_swap()'s filemap_get_folio() spinning around
on a page with 0 refcount, while a page table lock is held which, one
by one, the other CPUs come to want for reclaim. On two machines.

None of these problems were seen on Stephen's last next-2024-06-13. I
had wanted to see whether mm-everything-2024-06-18 fixed that lockup,
but with the new problems I cannot tell (or it could all be the same
problem: but if so, it's odd that it manifests so consistently
differently).

There are way too many mTHP shmem and swap patch series floating
around at the moment, in mm and in fs, for me to cope with: everyone,
please slow down and test more.

Hugh

p.s. I think Andrew Bresticker's do_set_pmd() fix has soaked long
enough, and deserves promotion to a hotfix and to Linus soon.
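
p.p.s. For anyone wanting to retry the "cp -a" failure above, a rough
sketch of the sort of thing I ran (the mount point, the 4G size, and
the source tree path here are illustrative placeholders, not my exact
setup):

    # mount a tmpfs with huge=always, sized large enough to hold
    # a kernel source tree with some slack
    mount -t tmpfs -o huge=always,size=4G tmpfs /mnt/test

    # copy a kernel source tree in: this hit lots of ENOSPCs
    cp -a /usr/src/linux /mnt/test/

    # remove it again: this gave lots of WARN_ON(inode->i_blocks)
    # from shmem_evict_inode()
    rm -rf /mnt/test/linux

    umount /mnt/test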