On 18/01/2024 23:54, Barry Song wrote:
> On Thu, Jan 18, 2024 at 11:25 PM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
>>
>> On 18/01/2024 11:10, Barry Song wrote:
>>> On an embedded system like Android, more than half of anon memory is actually
>>> in swap devices such as zRAM. For example, while an app is switched to
>>> background, its most memory might be swapped-out.
>>>
>>> Now we have mTHP features, unfortunately, if we don't support large folios
>>> swap-in, once those large folios are swapped-out, we immediately lose the
>>> performance gain we can get through large folios and hardware optimization
>>> such as CONT-PTE.
>>>
>>> In theory, we don't need to rely on Ryan's swap out patchset[1]. That is to say,
>>> before swap-out, if some memory were normal pages, but when swapping in, we
>>> can also swap-in them as large folios.
>>
>> I think this could also violate MADV_NOHUGEPAGE; if the application has
>> requested that we do not create a THP, then we had better not; it could cause a
>> correctness issue in some circumstances. You would need to pay attention to this
>> vma flag if taking this approach.
>>
>>> But this might require I/O happen at
>>> some random places in swap devices. So we limit the large folios swap-in to
>>> those areas which were large folios before swapping-out, aka, swaps are also
>>> contiguous in hardware.
>>
>> In fact, even this may not be sufficient; it's possible that a contiguous set of
>> base pages (small folios) were allocated to a virtual mapping and all swapped
>> out together - they would likely end up contiguous in the swap file, but should
>> not be swapped back in as a single folio because of this (same reasoning applies
>> to cluster of smaller THPs that you mistake for a larger THP, etc).
>>
>> So you will need to check what THP sizes are enabled and check the VMA
>> suitability regardless; Perhaps you are already doing this - I haven't looked at
>> the code yet.
>
> we are actually re-using your alloc_anon_folio() by adding a parameter to make it
> support both do_anon_page and do_swap_page,
>
> -static struct folio *alloc_anon_folio(struct vm_fault *vmf)
> +static struct folio *alloc_anon_folio(struct vm_fault *vmf,
> +                                      bool (*pte_range_check)(pte_t *, int))
>  {
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>         struct vm_area_struct *vma = vmf->vma;
> @@ -4190,7 +4270,7 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
>         order = highest_order(orders);
>         while (orders) {
>                 addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
> -               if (pte_range_none(pte + pte_index(addr), 1 << order))
> +               if (pte_range_check(pte + pte_index(addr), 1 << order))
>                         break;
>                 order = next_order(&orders, order);
>         }
> @@ -4269,7 +4349,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>         if (unlikely(anon_vma_prepare(vma)))
>                 goto oom;
>         /* Returns NULL on OOM or ERR_PTR(-EAGAIN) if we must retry the fault */
> -       folio = alloc_anon_folio(vmf);
> +       folio = alloc_anon_folio(vmf, pte_range_none);
>         if (IS_ERR(folio))
>                 return 0;
>         if (!folio)
> --
>
> I assume this has checked everything?

Ahh yes, very good. In that case you can disregard what I said; it's already
covered.

I notice that this series appears as a reply to my series. I'm not sure what the
normal convention is, but I expect more people would see it if you posted it as
its own thread?

>
>>
>> I'll aim to review the code in the next couple of weeks.
>
> nice, thanks!
>
>>
>> Thanks,
>> Ryan
>>
>>> On the other hand, in OPPO's product, we've deployed
>>> anon large folios on millions of phones[2]. we enhanced zsmalloc and zRAM to
>>> compress and decompress large folios as a whole, which help improve compression
>>> ratio and decrease CPU consumption significantly. In zsmalloc and zRAM we can
>>> save large objects whose original size are 64KiB for example.
>>> So it is also a better choice for us to only swap-in large folios for those
>>> compressed large objects as a large folio can be decompressed all together.
>>>
>>> Note I am moving my previous "arm64: mm: swap: support THP_SWAP on hardware
>>> with MTE" to this series as it might help review.
>>>
>>> [1] [PATCH v3 0/4] Swap-out small-sized THP without splitting
>>> https://lore.kernel.org/linux-mm/20231025144546.577640-1-ryan.roberts@xxxxxxx/
>>> [2] OnePlusOSS / android_kernel_oneplus_sm8550
>>> https://github.com/OnePlusOSS/android_kernel_oneplus_sm8550/tree/oneplus/sm8550_u_14.0.0_oneplus11
>>>
>>> Barry Song (2):
>>>   arm64: mm: swap: support THP_SWAP on hardware with MTE
>>>   mm: rmap: weaken the WARN_ON in __folio_add_anon_rmap()
>>>
>>> Chuanhua Han (4):
>>>   mm: swap: introduce swap_nr_free() for batched swap_free()
>>>   mm: swap: make should_try_to_free_swap() support large-folio
>>>   mm: support large folios swapin as a whole
>>>   mm: madvise: don't split mTHP for MADV_PAGEOUT
>>>
>>>  arch/arm64/include/asm/pgtable.h |  21 ++----
>>>  arch/arm64/mm/mteswap.c          |  42 ++++++++++++
>>>  include/asm-generic/tlb.h        |  10 +++
>>>  include/linux/huge_mm.h          |  12 ----
>>>  include/linux/pgtable.h          |  62 ++++++++++++++++-
>>>  include/linux/swap.h             |   6 ++
>>>  mm/madvise.c                     |  48 ++++++++++++++
>>>  mm/memory.c                      | 110 ++++++++++++++++++++++++++-----
>>>  mm/page_io.c                     |   2 +-
>>>  mm/rmap.c                        |   5 +-
>>>  mm/swap_slots.c                  |   2 +-
>>>  mm/swapfile.c                    |  29 ++++++++
>>>  12 files changed, 301 insertions(+), 48 deletions(-)
>>>
>>
>
> Thanks
> Barry
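
For anyone following the thread without the tree to hand, the idea in the
quoted diff (one order-selection loop, with the PTE range predicate passed in
by the caller) can be modelled in user space roughly as below. This is only an
illustrative sketch, not the kernel code: enum pte_state, NR_PTES,
MAX_ORDER_MODEL, pick_order() and the two *_model predicates are hypothetical
names invented here, and the swap predicate is deliberately simplified (it
does not check that the swap entries are contiguous or aligned). The `orders`
bitmask stands in for the "which THP sizes are enabled and suitable for this
VMA" filtering that the real alloc_anon_folio() is assumed to have done before
the loop runs.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PTES         16  /* hypothetical: one small window of a page table */
#define MAX_ORDER_MODEL 4   /* hypothetical: largest order the model considers */

/* Toy PTE states; the kernel's pte_t carries far more information. */
enum pte_state { PTE_NONE, PTE_SWAP, PTE_PRESENT };

/* Same shape as the pte_range_check callback in the quoted diff. */
typedef bool (*range_check_fn)(const enum pte_state *pte, int nr);

static bool range_all(const enum pte_state *pte, int nr, enum pte_state want)
{
	for (int i = 0; i < nr; i++)
		if (pte[i] != want)
			return false;
	return true;
}

/* Anonymous-fault case: every PTE in the range must be empty. */
static bool pte_range_none_model(const enum pte_state *pte, int nr)
{
	return range_all(pte, nr, PTE_NONE);
}

/* Swap-in case (simplified): every PTE in the range must be a swap entry. */
static bool pte_range_swap_model(const enum pte_state *pte, int nr)
{
	return range_all(pte, nr, PTE_SWAP);
}

/*
 * Walk the enabled orders from largest to smallest and return the first one
 * whose naturally aligned range around fault_idx satisfies the predicate.
 * Returns 0 (a single base page) when no enabled order fits.
 */
static int pick_order(const enum pte_state *table, int fault_idx,
		      unsigned long orders, range_check_fn check)
{
	assert(fault_idx >= 0 && fault_idx < NR_PTES);

	for (int order = MAX_ORDER_MODEL; order > 0; order--) {
		int nr = 1 << order;
		int start = fault_idx & ~(nr - 1); /* like ALIGN_DOWN() */

		if (!(orders & (1UL << order)))
			continue; /* size not enabled for this mapping */
		if (start + nr > NR_PTES)
			continue; /* range would leave the modelled window */
		if (check(table + start, nr))
			return order;
	}
	return 0;
}
```

With entries 0-7 marked as swap entries and orders 2 and 3 enabled, a fault at
index 5 selects order 3 under the swap predicate and falls back to order 0
under the none predicate; clearing the `orders` mask forces order 0 in both
cases, which is how a MADV_NOHUGEPAGE-style restriction would show up in this
model.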