Re: [RFC PATCH 0/3] support large folio for mlock

Matthew Wilcox <willy@xxxxxxxxxxxxx> · Sat, 8 Jul 2023 05:02:09 +0100

On Sat, Jul 08, 2023 at 11:52:23AM +0800, Yin, Fengwei wrote:
> > Oh, I agree, there are always going to be circumstances where we realise
> > we've made a bad decision and can't (easily) undo it.  Unless we have a
> > per-page pincount, and I Would Rather Not Do That.  But we should _try_
> > to do that because it's the right model -- that's what I meant by "Tell
> > me why I'm wrong"; what scenarios do we have where a user temporarily
> > mlocks (or mprotects or ...) a range of memory, but wants that memory
> > to be aged in the LRU exactly the same way as the adjacent memory that
> > wasn't mprotected?
> From the mlock() man page:
>        mlock(), mlock2(), and mlockall() lock part or all of the calling
>        process's virtual address space into RAM, preventing that memory
>        from being paged to the swap area.
> 
> So my understanding is that it's OK for mlocked memory to be aged along
> with the adjacent memory that is not mlocked, as long as it is not
> paged out to swap.

Right, it doesn't break anything; it's just a similar problem to
internal fragmentation.  The pages of the folio which aren't mlocked
will also be locked in RAM and never paged out.
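
As a rough illustration of that cost, here is a toy userspace calculation
(not kernel code). It assumes 4 KiB base pages; the function name and the
sizes are made up for the example. It counts the pages of a folio that stay
resident only because another part of the same folio was mlocked.

```python
PAGE_SIZE = 4096  # assumed base page size


def extra_locked_pages(folio_start, folio_pages, lock_start, lock_len):
    """Pages of the folio that lie outside the mlocked range but stay in RAM
    anyway if the whole folio is treated as mlocked (hypothetical helper)."""
    folio_end = folio_start + folio_pages * PAGE_SIZE
    # mlock() covers every page touched by the range, so round outwards.
    first = lock_start // PAGE_SIZE * PAGE_SIZE
    last = -(-(lock_start + lock_len) // PAGE_SIZE) * PAGE_SIZE
    # Clamp the locked range to the folio.
    lo = max(first, folio_start)
    hi = min(last, folio_end)
    covered = max(0, hi - lo) // PAGE_SIZE
    if covered == 0:
        return 0  # the range misses this folio, so nothing is pinned
    return folio_pages - covered
```

With a 16-page (64 KiB) folio and an mlock() of two pages in the middle,
the other 14 pages are carried along, which is the internal-fragmentation
effect described above.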

> One question about an implementation detail:
>   If a large folio that crosses a VMA boundary cannot be split, how do
>   we deal with it? Retry in the syscall until the split succeeds?
>   Or return an error to user space (and which error should we choose)?

I would be tempted to allocate memory & copy to the new mlocked VMA.
The old folio will go on the deferred_list and be split later, or its
valid parts will be written to swap and then it can be freed.
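
The copy fallback can be sketched as a toy model. This is plain Python, not
kernel code: the Folio class, copy_to_mlocked() and the list standing in
for the deferred_list are all hypothetical, and marking the copied-out
pages as None is only an assumption about what "valid parts" means here.

```python
from dataclasses import dataclass


@dataclass
class Folio:
    pages: list            # one entry per page; None = no longer mapped
    mlocked: bool = False


def copy_to_mlocked(old, start, end, deferred_list):
    """Fallback when `old` straddles the VMA boundary and cannot be split now."""
    # Allocate fresh memory for the mlocked VMA and copy the data across.
    new = Folio(pages=list(old.pages[start:end]), mlocked=True)
    # The old folio no longer backs that range; only the rest stays valid.
    for i in range(start, end):
        old.pages[i] = None
    # Queue the old folio so it can be split (or its valid parts swapped
    # out and the folio freed) later.
    deferred_list.append(old)
    return new
```

The point of the sketch is that the syscall never has to retry or fail:
the mlocked VMA gets its own memory immediately, and cleaning up the old
folio is deferred.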