Re: [PATCH v5 3/4] mm: support large folios swapin as a whole for zRAM-like swapfile

Chuanhua Han <chuanhuahan@xxxxxxxxx> · Mon, 29 Jul 2024 21:32:20 +0800



Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote on Mon, Jul 29, 2024 at 20:55:
>
> On Mon, Jul 29, 2024 at 02:36:38PM +0800, Chuanhua Han wrote:
> > Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote on Mon, Jul 29, 2024 at 11:51:
> > >
> > > On Fri, Jul 26, 2024 at 09:46:17PM +1200, Barry Song wrote:
> > > > -                     folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0,
> > > > -                                             vma, vmf->address, false);
> > > > +                     folio = alloc_swap_folio(vmf);
> > > >                       page = &folio->page;
> > >
> > > This is no longer correct.  You need to set 'page' to the precise page
> > > that is being faulted rather than the first page of the folio.  It was
> > > fine before because it always allocated a single-page folio, but now it
> > > must use folio_page() or folio_file_page() (whichever has the correct
> > > semantics for you).
> > >
> > > Also you need to fix your test suite to notice this bug.  I suggest
> > > doing that first so that you know whether you've got the calculation
> > > correct.
> >
> > This is not a problem now: we swap in large folios as a whole, so
> > the head page is used here instead of the page that is being faulted.
> > Please also look at the current code context. Swapping in a large
> > folio as a whole is not the same as the earlier code, which only
> > supported small-page swapin.
>
> You have completely failed to understand the problem.  Let's try it this
> way:
>
> We take a page fault at address 0x123456789000.
> If part of a 16KiB folio, that's page 1 of the folio at 0x123456788000.
> If you now map page 0 of the folio at 0x123456789000, you've
> given the user the wrong page!  That looks like data corruption.
The user does not get the wrong data, because we map the folio as a
whole: for a 16KiB folio, we map all 16KiB through the page table.
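
To make the address arithmetic in the example above concrete, here is a
minimal userspace sketch (not kernel code). It assumes 4KiB base pages and
a folio that is naturally aligned in the virtual address space, which is
not guaranteed in general (for instance after mremap()). The helper names
folio_start() and fault_page_index() are hypothetical and exist only for
illustration.

```c
#include <stdint.h>

#define PAGE_SHIFT 12                        /* assumption: 4KiB base pages */
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)

/*
 * Hypothetical helper: virtual address at which a naturally aligned
 * folio of nr_pages pages (nr_pages must be a power of two) begins,
 * given any address inside it.
 */
static uint64_t folio_start(uint64_t addr, unsigned int nr_pages)
{
	uint64_t folio_size = (uint64_t)nr_pages * PAGE_SIZE;

	return addr & ~(folio_size - 1);
}

/*
 * Hypothetical helper: index, within the folio, of the page that
 * backs the faulting address.
 */
static unsigned int fault_page_index(uint64_t addr, unsigned int nr_pages)
{
	return (unsigned int)((addr - folio_start(addr, nr_pages)) >> PAGE_SHIFT);
}
```

For the fault address 0x123456789000 and a 4-page (16KiB) folio,
folio_start() gives 0x123456788000 and fault_page_index() gives 1, which
matches the numbers in the quoted example.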
>
> The code in
>         if (folio_test_large(folio) && folio_test_swapcache(folio)) {
> as Barry pointed out will save you -- but what if those conditions fail?
> What if the mmap has been mremap()ed and the folio now crosses a PMD
> boundary?  mk_pte() will now be called on the wrong page.
These special cases are handled in our patch. For an mTHP large folio,
mk_pte() uses the head page to construct the PTE.


-- 
Thanks,
Chuanhua