Re: [PATCH 1/5] mm: Do not reclaim private data from pinned page

Jan Kara <jack@xxxxxxx> · Tue, 14 Feb 2023 14:00:00 +0100

On Mon 13-02-23 10:01:35, David Hildenbrand wrote:
> On 09.02.23 13:31, Jan Kara wrote:
> > If the page is pinned, there's no point in trying to reclaim it.
> > Furthermore if the page is from the page cache we don't want to reclaim
> > fs-private data from the page because the pinning process may be writing
> > to the page at any time and reclaiming fs private info on a dirty page
> > can upset the filesystem (see link below).
> > 
> > Link: https://lore.kernel.org/linux-mm/20180103100430.GE4911@xxxxxxxxxxxxxx
> > Signed-off-by: Jan Kara <jack@xxxxxxx>
> > ---
> >   mm/vmscan.c | 10 ++++++++++
> >   1 file changed, 10 insertions(+)
> > 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index bf3eedf0209c..ab3911a8b116 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1901,6 +1901,16 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> >   			}
> >   		}
> > +		/*
> > +		 * Folio is unmapped now so it cannot be newly pinned anymore.
> > +		 * No point in trying to reclaim folio if it is pinned.
> > +		 * Furthermore we don't want to reclaim underlying fs metadata
> > +		 * if the folio is pinned and thus potentially modified by the
> > +		 * pinning process is that may upset the filesystem.
> > +		 */
> > +		if (folio_maybe_dma_pinned(folio))
> > +			goto activate_locked;
> > +
> >   		mapping = folio_mapping(folio);
> >   		if (folio_test_dirty(folio)) {
> >   			/*
> 
> At this point, we made sure that the folio is completely unmapped. However,
> we specify "TTU_BATCH_FLUSH", so rmap code might defer a TLB flush and
> consequently defer an IPI sync.
> 
> I remember that this check here is fine regarding GUP-fast: even if
> concurrent GUP-fast pins the page after our check here, it should observe
> the changed PTE and unpin it again.
>  
> Checking after unmapping makes sense: we reduce the likelyhood of false
> positives when a file-backed page is mapped many times (>= 1024). OTOH, we
> might unmap pinned pages because we cannot really detect it early.
> 
> For anon pages, we have an early (racy) check, which turned out "ok" in
> practice, because we don't frequently have that many anon pages that are
> shared by that many processes. I assume we don't want something similar for
> pagecache pages, because having a single page mapped by many processes can
> happen easily and would prevent reclaim.

Yeah, I think pagecache pages shared by many processes are more likely.
Furthermore I think pinned pagecache pages are rather rare so unmapping
them before checking seems fine to me. Obviously we can reconsider if
reality would prove me wrong ;).

> I once had a patch lying around that documented for the existing
> folio_maybe_dma_pinned() for anon pages exactly that (racy+false positives
> with many mappings).
> 
> Long story short, I assume this change is fine.

Thanks for the throughout verification :)

								Honza

-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR