On 2/9/22 23:28, Hugh Dickins wrote:
> On Wed, 9 Feb 2022, Vlastimil Babka wrote:
>> On 2/6/22 22:30, Hugh Dickins wrote:
>
> Thanks for taking a look, Vlastimil. You make a good point here.
>
> I had satisfied myself that no stage of the series was going to introduce
> boot failures or BUGs; and if someone is bisecting some mlock/munlock
> misbehaviour, I would not worry about which commit of the series they
> alight on, but root cause it keeping all the patches in mind.
>
> But we certainly wouldn't want the series split up into separately
> submitted parts (that is, split anywhere between 01/13 and 07/13:
> splitting the rest apart wouldn't matter much); and it would be
> unfortunate if someone were bisecting some reclaim failure OOM problem
> elsewhere, and their test app happened to be using mlock, and their
> bisection landed part way between 01 and 07 here - the unimplemented
> munlock confusing the bisection for OOM.
>
> The worst of it would be, I think, landing between 05 and 07: where
> your mlockall could mlock a variety of shared libraries etc, moving
> all their pages to unevictable, and those pagecache pages not getting
> moved back to evictable when unmapped. I forget the current shrinker
> situation, whether inode pressure could evict that pagecache or not.
>
> Two mitigations come to mind, let me think on it some more (and hope
> nobody's bisecting OOMs meanwhile): one might be to shift 05 (the one
> which replaces clear_page_inode() on last unmap by clearance when
> freeing) later in the series - its position was always somewhat
> arbitrary, but that position is where it's been tested; another might
> be to put nothing at all on the unevictable list in between 01 and 07.
>
> Though taking this apart and putting it back together brings its own
> dangers. That second suggestion probably won't fly very well, with
> 06/13 using mlock_count only while on the unevictable. I'm not sure
> how much rethinking the bisection possibility deserves.

Right, if it's impractical to change this just because of potential (and
hopefully unlikely) bad bisection luck, then a note at the end of each
patch's changelog, mentioning what temporarily doesn't work, should be
enough.

>> Yet it differs from the existing "failure modes" where pages would be left
>> as "stranded" due to failure of being isolated, because they would at least
>> go through TestClearPageMlocked and counters update.
>>
>> >
>> >  /*
>> > @@ -413,75 +136,11 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec,
>> >   *
>> >   * Returns with VM_LOCKED cleared.  Callers must be prepared to
>> >   * deal with this.
>> > - *
>> > - * We don't save and restore VM_LOCKED here because pages are
>> > - * still on lru.  In unmap path, pages might be scanned by reclaim
>> > - * and re-mlocked by page_mlock/try_to_unmap before we unmap and
>> > - * free them.  This will result in freeing mlocked pages.
>> >   */
>> >  void munlock_vma_pages_range(struct vm_area_struct *vma,
>> >  			     unsigned long start, unsigned long end)
>> >  {
>> > -	vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
>>
>> Should we at least keep doing the flags clearing? I haven't checked whether
>> there are some VM_BUG_ONs that would trip on it not being cleared, but
>> wouldn't be entirely surprised.
>
> There are two flags in question here, VM_LOCKED and VM_LOCKONFAULT:
> I'm not sure which of them you're particularly concerned about.

Well, either of those, but as I said, I didn't dig for possible
consequences, as simply not removing the line above looked simpler and
matched the comment.
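
Concretely, what I had in mind was just keeping the maskout at this point
in the series, along the lines of the sketch below (only a sketch against
this stage of the series, with the rest of the new function body as in this
patch, i.e. the per-page munlock work still absent; the comment is mine and
purely illustrative):

	void munlock_vma_pages_range(struct vm_area_struct *vma,
				     unsigned long start, unsigned long end)
	{
		/*
		 * Keep clearing VM_LOCKED (and VM_LOCKONFAULT, via
		 * VM_LOCKED_CLEAR_MASK) even while this intermediate
		 * patch no longer does any per-page munlock work.
		 */
		vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
	}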
> As to VM_LOCKED: yes, you're right, at this stage of the series the
> munlock really ought to be clearing VM_LOCKED (even while it doesn't
> go on to do anything about the pages), as it claims in the comment above.
> I removed this line at a later stage (07/13), when changing it to
> mlock_vma_pages_range() serving both mlock and munlock according to
> whether VM_LOCKED is provided - and mistakenly folded back that deletion
> to this patch. End result the same, but better to restore that maskout
> in this patch, as you suggest.

Great, thanks. Restoring the maskout also covers VM_LOCKONFAULT in any
case.

> As to VM_LOCKONFAULT: I had checked the rest of mm/mlock.c, and the
> rest of the tree, and it only ever reached here along with VM_LOCKED;
> so when in 07/13 I switched over to "vma->vm_flags = newflags" (or
> WRITE_ONCE equivalent), I just didn't see the need to mask it out in
> the munlocking case; but could add a VM_BUG_ON that newflags never
> has it without VM_LOCKED, if you like.
>
> (You'll say VM_WARN_ON_ONCE, I'll say VM_BUG_ON because it never happens,
> then as soon as I put it in and run LTP or kselftests, I'll be ashamed
> to discover I've got it wrong, perhaps.)

Wasn't suggesting new VM_BUG_ONs, just worried whether the patch was
breaking any existing ones.

> Hugh
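
P.S. Just so we mean the same thing: the check you describe would
presumably amount to something like the line below, at the point in 07/13
where newflags gets applied (purely illustrative; the exact placement and
the WARN-vs-BUG choice are my assumptions, and as said above I'm not asking
for it):

	/*
	 * Hypothetical assertion only: newflags should never carry
	 * VM_LOCKONFAULT without VM_LOCKED by the time it is applied.
	 */
	VM_WARN_ON_ONCE((newflags & VM_LOCKONFAULT) && !(newflags & VM_LOCKED));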