On 2/9/22 23:28, Hugh Dickins wrote:
> On Wed, 9 Feb 2022, Vlastimil Babka wrote:
>> On 2/6/22 22:30, Hugh Dickins wrote:
>
> Thanks for taking a look, Vlastimil. You make a good point here.
>
> I had satisfied myself that no stage of the series was going to introduce
> boot failures or BUGs; and if someone is bisecting some mlock/munlock
> misbehaviour, I would not worry about which commit of the series they
> alight on, but root cause it keeping all the patches in mind.
>
> But we certainly wouldn't want the series split up into separately
> submitted parts (that is, split anywhere between 01/13 and 07/13:
> splitting the rest apart wouldn't matter much); and it would be
> unfortunate if someone were bisecting some reclaim failure OOM problem
> elsewhere, and their test app happened to be using mlock, and their
> bisection landed part way between 01 and 07 here - the unimplemented
> munlock confusing the bisection for OOM.
>
> The worst of it would be, I think, landing between 05 and 07: where
> your mlockall could mlock a variety of shared libraries etc, moving
> all their pages to unevictable, and those pagecache pages not getting
> moved back to evictable when unmapped. I forget the current shrinker
> situation, whether inode pressure could evict that pagecache or not.
>
> Two mitigations come to mind, let me think on it some more (and hope
> nobody's bisecting OOMs meanwhile): one might be to shift 05 (the one
> which replaces clear_page_inode() on last unmap by clearance when
> freeing) later in the series - its position was always somewhat
> arbitrary, but that position is where it's been tested; another might
> be to put nothing at all on the unevictable list in between 01 and 07.
>
> Though taking this apart and putting it back together brings its own
> dangers. That second suggestion probably won't fly very well, with
> 06/13 using mlock_count only while on the unevictable. I'm not sure
> how much rethinking the bisection possibility deserves.

Right, if it's impractical to change this just because of potential (and
hopefully unlikely) bad bisection luck, then a note at the end of each
patch's changelog, mentioning what temporarily doesn't work, should be
enough.

>> Yet it differs from the existing "failure modes" where pages would be left
>> as "stranded" due to failure of being isolated, because they would at least
>> go through TestClearPageMlocked and counters update.
>>
>> >
>> >  /*
>> > @@ -413,75 +136,11 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec,
>> >   *
>> >   * Returns with VM_LOCKED cleared.  Callers must be prepared to
>> >   * deal with this.
>> > - *
>> > - * We don't save and restore VM_LOCKED here because pages are
>> > - * still on lru.  In unmap path, pages might be scanned by reclaim
>> > - * and re-mlocked by page_mlock/try_to_unmap before we unmap and
>> > - * free them.  This will result in freeing mlocked pages.
>> >   */
>> >  void munlock_vma_pages_range(struct vm_area_struct *vma,
>> >  			     unsigned long start, unsigned long end)
>> >  {
>> > -	vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
>>
>> Should we at least keep doing the flags clearing? I haven't checked whether
>> there are some VM_BUG_ONs that would trip on it not being cleared, but
>> wouldn't be entirely surprised.
>
> There are two flags in question here, VM_LOCKED and VM_LOCKONFAULT:
> I'm not sure which of them you're particularly concerned about.

Well, either of those, but as I said, I didn't dig for possible
consequences, as simply not removing the line above looked simpler and
matched the comment.
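
Concretely, what I had in mind was just keeping the maskout at this point
in the series, along the lines of the sketch below (only a sketch against
this stage of the series, with the rest of the new function body as in this
patch, i.e. the per-page munlock work still absent; the comment is mine and
purely illustrative):

	void munlock_vma_pages_range(struct vm_area_struct *vma,
				     unsigned long start, unsigned long end)
	{
		/*
		 * Keep clearing VM_LOCKED (and VM_LOCKONFAULT, via
		 * VM_LOCKED_CLEAR_MASK) even while this intermediate
		 * patch no longer does any per-page munlock work.
		 */
		vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
	}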
> As to VM_LOCKED: yes, you're right, at this stage of the series the
> munlock really ought to be clearing VM_LOCKED (even while it doesn't
> go on to do anything about the pages), as it claims in the comment above.
> I removed this line at a later stage (07/13), when changing it to
> mlock_vma_pages_range() serving both mlock and munlock according to
> whether VM_LOCKED is provided - and mistakenly folded back that deletion
> to this patch. End result the same, but better to restore that maskout
> in this patch, as you suggest.

Great, thanks. Restoring the maskout also covers VM_LOCKONFAULT in any
case.

> As to VM_LOCKONFAULT: I had checked the rest of mm/mlock.c, and the
> rest of the tree, and it only ever reached here along with VM_LOCKED;
> so when in 07/13 I switched over to "vma->vm_flags = newflags" (or
> WRITE_ONCE equivalent), I just didn't see the need to mask it out in
> the munlocking case; but could add a VM_BUG_ON that newflags never
> has it without VM_LOCKED, if you like.
>
> (You'll say VM_WARN_ON_ONCE, I'll say VM_BUG_ON because it never happens,
> then as soon as I put it in and run LTP or kselftests, I'll be ashamed
> to discover I've got it wrong, perhaps.)

Wasn't suggesting new VM_BUG_ONs, just worried whether the patch was
breaking any existing ones.

> Hugh
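
P.S. Just so we mean the same thing: the check you describe would
presumably amount to something like the line below, at the point in 07/13
where newflags gets applied (purely illustrative; the exact placement and
the WARN-vs-BUG choice are my assumptions, and as said above I'm not asking
for it):

	/*
	 * Hypothetical assertion only: newflags should never carry
	 * VM_LOCKONFAULT without VM_LOCKED by the time it is applied.
	 */
	VM_WARN_ON_ONCE((newflags & VM_LOCKONFAULT) && !(newflags & VM_LOCKED));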