On Fri, Jul 7, 2023 at 11:01 PM Yin, Fengwei <fengwei.yin@xxxxxxxxx> wrote:
>
> On 7/8/2023 12:45 PM, Yu Zhao wrote:
> > On Fri, Jul 7, 2023 at 10:52 AM Yin Fengwei <fengwei.yin@xxxxxxxxx> wrote:
> >>
> >> Yu mentioned at [1] that mlock() can't be applied to large folios.
> >>
> >> I studied the related code, and here is my understanding:
> >> - For RLIMIT_MEMLOCK, there is no problem, because the RLIMIT_MEMLOCK
> >>   statistics are not tied to the underlying pages. Mlocking or
> >>   munlocking the underlying pages doesn't affect RLIMIT_MEMLOCK
> >>   accounting, which is always correct.
> >>
> >> - For keeping the pages in RAM, there is no problem either. During
> >>   try_to_unmap_one(), once the VMA is detected to have VM_LOCKED set
> >>   in vm_flags, the folio is kept regardless of whether it is mlocked.
> >>
> >> So mlock works functionally for large folios, but not optimally: page
> >> reclaim still has to scan these large folios and may split them.
> >>
> >> This series classifies large folios for mlock into two types:
> >>   - The large folio lies entirely within the VM_LOCKED VMA range
> >>   - The large folio crosses a VM_LOCKED VMA boundary
> >>
> >> For the first type, we mlock the large folio so page reclaim will skip
> >> it. For the second type, we don't mlock the large folio; page reclaim
> >> may pick and split it, so the pages outside the VM_LOCKED VMA range
> >> can be reclaimed/released.
> >
> > This is a sound design, which is also what I have in mind. I see the
> > rationale is being spelled out in this thread, and hopefully everyone
> > can be convinced.
> >
> >> patch1 introduces an API to check whether a large folio is within a
> >>   VMA range.
> >> patch2 makes page reclaim/mlock_vma_folio/munlock_vma_folio support
> >>   large folio mlock/munlock.
> >> patch3 makes the mlock/munlock syscalls support large folios.
> >
> > Could you tidy up the last patch a little bit? E.g., saying "mlock the
> > 4K folio" is obviously not the best idea.
> >
> > And if possible, make the loop look just like before, i.e.,
> >
> >   if (!can_mlock_entire_folio())
> >           continue;
> >   if (vma->vm_flags & VM_LOCKED)
> >           mlock_folio_range();
> >   else
> >           munlock_folio_range();
>
> This can leave a large folio mlocked even if user space calls munlock()
> on the range. Consider the following case:
>   1. mlock() a 64K range; the underlying 64K large folio is mlocked.
>   2. mprotect() the first 32K to a different protection, which
>      triggers a VMA split.
>   3. munlock() the 64K range. Since the 64K large folio isn't within
>      either of the two new VMA ranges, it will not be munlocked, and it
>      can only be reclaimed after it's unmapped from both VMAs instead of
>      after the range is munlocked.

I understand. I'm asking to factor the code, not to change the logic.