On 7/8/2023 1:06 PM, Yu Zhao wrote:
> On Fri, Jul 7, 2023 at 11:01 PM Yin, Fengwei <fengwei.yin@xxxxxxxxx> wrote:
>>
>> On 7/8/2023 12:45 PM, Yu Zhao wrote:
>>> On Fri, Jul 7, 2023 at 10:52 AM Yin Fengwei <fengwei.yin@xxxxxxxxx> wrote:
>>>>
>>>> Yu mentioned at [1] that mlock() can't be applied to large folios.
>>>>
>>>> I read the related code and here is my understanding:
>>>>
>>>> - For RLIMIT_MEMLOCK, there is no problem, because the RLIMIT_MEMLOCK
>>>>   accounting is not tied to the underlying pages. Whether the
>>>>   underlying pages are mlocked or munlocked does not affect the
>>>>   RLIMIT_MEMLOCK accounting, which is always correct.
>>>>
>>>> - For keeping the pages in RAM, there is no problem either. At least,
>>>>   during try_to_unmap_one(), once the VMA is found to have the
>>>>   VM_LOCKED bit set in vm_flags, the folio is kept whether it is
>>>>   mlocked or not.
>>>>
>>>> So mlock does work for large folios, but it is not optimal because
>>>> page reclaim still needs to scan these large folios and may split
>>>> them.
>>>>
>>>> This series classifies large folios for mlock into two types:
>>>> - The large folio lies entirely within a VM_LOCKED VMA range.
>>>> - The large folio crosses a VM_LOCKED VMA boundary.
>>>>
>>>> For the first type, we mlock the large folio so page reclaim will
>>>> skip it. For the second type, we don't mlock the large folio; it is
>>>> allowed to be picked by page reclaim and split, so the pages outside
>>>> the VM_LOCKED VMA range can be reclaimed/released.
>>>
>>> This is a sound design, which is also what I have in mind. I see the
>>> rationales are being spelled out in this thread, and hopefully
>>> everyone can be convinced.
>>>
>>>> patch1 introduces an API to check whether a large folio is within a
>>>> VMA range.
>>>> patch2 makes page reclaim/mlock_vma_folio/munlock_vma_folio support
>>>> large folio mlock/munlock.
>>>> patch3 makes the mlock/munlock syscalls support large folios.
>>>
>>> Could you tidy up the last patch a little bit? E.g., saying "mlock the
>>> 4K folio" is obviously not the best idea.
>>>
>>> And if possible, make the loop look just like before, i.e.,
>>>
>>>   if (!can_mlock_entire_folio())
>>>     continue;
>>>   if (vma->vm_flags & VM_LOCKED)
>>>     mlock_folio_range();
>>>   else
>>>     munlock_folio_range();
>>
>> This can leave the large folio mlocked even after user space calls
>> munlock() on the range. Consider the following case:
>> 1. mlock() a 64K range, and the underlying 64K large folio is mlocked.
>> 2. mprotect() the first 32K of the range to a different prot, which
>>    triggers a VMA split.
>> 3. munlock() the 64K range. As the 64K large folio is no longer fully
>>    within either of the two new VMAs, it will not be munlocked and can
>>    only be reclaimed after it is unmapped from both VMAs, instead of
>>    after the range is munlocked.
>
> I understand. I'm asking to factor the code, not to change the logic.

Oh. Sorry. I misunderstood the code piece you showed. Will address this
in the coming version. Thanks.

Regards
Yin, Fengwei
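
For reference, the check that patch 1 introduces (whether a large folio
lies entirely within a VMA range) can be pictured roughly as below. This
is only an illustrative sketch: the helper name and parameters are
assumptions, not the series' actual API, and it assumes 'addr' is the
virtual address where the folio's first page is mapped in this VMA; the
real check may also need to consider how the folio is actually mapped.

	#include <linux/mm.h>

	/*
	 * Sketch only: can the whole folio be treated as inside this VMA?
	 * 'addr' is assumed to map the folio's first page.
	 */
	static inline bool folio_entirely_in_vma(struct folio *folio,
						 struct vm_area_struct *vma,
						 unsigned long addr)
	{
		unsigned long start = addr;
		unsigned long end = addr + folio_nr_pages(folio) * PAGE_SIZE;

		/* Mlock the whole folio only if its whole span is inside the VMA. */
		return start >= vma->vm_start && end <= vma->vm_end;
	}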
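
The VMA-split scenario described in the thread can be reproduced from
user space along these lines (a minimal sketch, error handling omitted;
whether the 64K mapping is actually backed by a single large folio
depends on the kernel's allocation policy):

	#include <string.h>
	#include <sys/mman.h>

	int main(void)
	{
		size_t len = 64 * 1024;
		char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		memset(buf, 1, len);               /* fault the range in           */
		mlock(buf, len);                   /* 1. folio becomes mlocked     */
		mprotect(buf, len / 2, PROT_READ); /* 2. VMA is split at 32K       */
		munlock(buf, len);                 /* 3. range now spans both VMAs */
		return 0;
	}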