On Fri, Jul 7, 2023 at 11:34 PM Yin, Fengwei <fengwei.yin@xxxxxxxxx> wrote:
>
> On 7/8/2023 1:11 PM, Yu Zhao wrote:
> > On Fri, Jul 7, 2023 at 10:52 AM Yin Fengwei <fengwei.yin@xxxxxxxxx> wrote:
> >>
> >> If a large folio is in the range of a VM_LOCKED VMA, it should be
> >> mlocked to avoid being picked by page reclaim, which may split
> >> the large folio and then mlock each page again.
> >>
> >> Mlock this kind of large folio to prevent it from being picked by
> >> page reclaim.
> >>
> >> For a large folio which crosses the boundary of the VM_LOCKED VMA,
> >> we'd better not mlock it, so that if the system is under memory
> >> pressure, this kind of large folio will be split and the pages
> >> outside the VM_LOCKED VMA can be reclaimed.
> >>
> >> Signed-off-by: Yin Fengwei <fengwei.yin@xxxxxxxxx>
> >> ---
> >>  mm/internal.h | 11 ++++++++---
> >>  mm/rmap.c     |  3 ++-
> >>  2 files changed, 10 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/mm/internal.h b/mm/internal.h
> >> index 66117523d7d71..c7b8f0b008d81 100644
> >> --- a/mm/internal.h
> >> +++ b/mm/internal.h
> >> @@ -637,7 +637,8 @@ static inline void mlock_vma_folio(struct folio *folio,
> >>          * still be set while VM_SPECIAL bits are added: so ignore it then.
> >>          */
> >>         if (unlikely((vma->vm_flags & (VM_LOCKED|VM_SPECIAL)) == VM_LOCKED) &&
> >> -           (compound || !folio_test_large(folio)))
> >> +           (compound || !folio_test_large(folio) ||
> >> +           folio_in_range(folio, vma, vma->vm_start, vma->vm_end)))
> >>                 mlock_folio(folio);
> >>  }
> >>
> >> @@ -645,8 +646,12 @@ void munlock_folio(struct folio *folio);
> >>  static inline void munlock_vma_folio(struct folio *folio,
> >>                         struct vm_area_struct *vma, bool compound)
> >>  {
> >> -       if (unlikely(vma->vm_flags & VM_LOCKED) &&
> >> -           (compound || !folio_test_large(folio)))
> >> +       /*
> >> +        * To handle the case that a mlocked large folio is unmapped from
> >> +        * the VMA piece by piece, allow munlocking a large folio which is
> >> +        * partially mapped to the VMA.
> >> +        */
> >> +       if (unlikely(vma->vm_flags & VM_LOCKED))
> >>                 munlock_folio(folio);
> >>  }
> >>
> >> diff --git a/mm/rmap.c b/mm/rmap.c
> >> index 2668f5ea35342..7d6547d1bd096 100644
> >> --- a/mm/rmap.c
> >> +++ b/mm/rmap.c
> >> @@ -817,7 +817,8 @@ static bool folio_referenced_one(struct folio *folio,
> >>                 address = pvmw.address;
> >>
> >>                 if ((vma->vm_flags & VM_LOCKED) &&
> >> -                   (!folio_test_large(folio) || !pvmw.pte)) {
> >> +                   (!folio_test_large(folio) || !pvmw.pte ||
> >> +                   folio_in_range(folio, vma, vma->vm_start, vma->vm_end))) {
> >>                         /* Restore the mlock which got missed */
> >>                         mlock_vma_folio(folio, vma, !pvmw.pte);
> >>                         page_vma_mapped_walk_done(&pvmw);
> >
> > It needs to bail out if large but not within range so that the
> > references within the locked VMA can be ignored. Otherwise, a hot
> > locked portion can prevent a cold unlocked portion from getting
> > reclaimed.
>
> Good point. We can't bail out here, as returning here means the folio
> should not be reclaimed. My understanding is that we should skip the
> entries which are in the range of the VM_LOCKED VMA. Will address this
> in the coming version. Thanks.

Yes, that's what I mean. A wrapper would be cleaner:

    while () {
            ...
            if (vma->vm_flags & VM_LOCKED) {
                    if (cant_mlock())
                            goto next;
                    ...
                    return false;
            }
            ...
    next:
            pra->mapcount--;
    }
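
Concretely, the skip-instead-of-bail idea above could shape up in
folio_referenced_one() roughly as sketched below. This is only a sketch of
the control flow being discussed, not the actual follow-up patch: it reuses
folio_in_range() introduced earlier in this series and the mlock path from
the quoted hunk, and it omits the lru_gen_look_around() handling and the
rest of the function for brevity.

    while (page_vma_mapped_walk(&pvmw)) {
            address = pvmw.address;

            if (vma->vm_flags & VM_LOCKED) {
                    /*
                     * Small folios, pmd-mapped folios and large folios
                     * fully inside the locked VMA get mlocked, and the
                     * walk stops: the folio must not be reclaimed.
                     */
                    if (!folio_test_large(folio) || !pvmw.pte ||
                        folio_in_range(folio, vma, vma->vm_start,
                                       vma->vm_end)) {
                            /* Restore the mlock which got missed */
                            mlock_vma_folio(folio, vma, !pvmw.pte);
                            page_vma_mapped_walk_done(&pvmw);
                            pra->vm_flags |= VM_LOCKED;
                            return false; /* To break the loop */
                    }
                    /*
                     * A large folio crossing the VMA boundary: don't
                     * count references from the mlocked portion, so a
                     * hot locked part can't keep a cold unlocked part
                     * from being reclaimed, but keep walking the
                     * remaining mappings.
                     */
                    goto next;
            }

            if (pvmw.pte) {
                    if (ptep_clear_flush_young_notify(vma, address,
                                                      pvmw.pte))
                            referenced++;
            } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
                    if (pmdp_clear_flush_young_notify(vma, address,
                                                      pvmw.pmd))
                            referenced++;
            }
    next:
            pra->mapcount--;
    }

Whether the skipped (locked) entries should also have their young bits
cleared, or simply not be counted as references, is left open here; the
sketch follows the pseudocode above and just skips them.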