On Mon 13-02-23 10:01:35, David Hildenbrand wrote: > On 09.02.23 13:31, Jan Kara wrote: > > If the page is pinned, there's no point in trying to reclaim it. > > Furthermore if the page is from the page cache we don't want to reclaim > > fs-private data from the page because the pinning process may be writing > > to the page at any time and reclaiming fs private info on a dirty page > > can upset the filesystem (see link below). > > > > Link: https://lore.kernel.org/linux-mm/20180103100430.GE4911@xxxxxxxxxxxxxx > > Signed-off-by: Jan Kara <jack@xxxxxxx> > > --- > > mm/vmscan.c | 10 ++++++++++ > > 1 file changed, 10 insertions(+) > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > index bf3eedf0209c..ab3911a8b116 100644 > > --- a/mm/vmscan.c > > +++ b/mm/vmscan.c > > @@ -1901,6 +1901,16 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, > > } > > } > > + /* > > + * Folio is unmapped now so it cannot be newly pinned anymore. > > + * No point in trying to reclaim folio if it is pinned. > > + * Furthermore we don't want to reclaim underlying fs metadata > > + * if the folio is pinned and thus potentially modified by the > > + * pinning process is that may upset the filesystem. > > + */ > > + if (folio_maybe_dma_pinned(folio)) > > + goto activate_locked; > > + > > mapping = folio_mapping(folio); > > if (folio_test_dirty(folio)) { > > /* > > At this point, we made sure that the folio is completely unmapped. However, > we specify "TTU_BATCH_FLUSH", so rmap code might defer a TLB flush and > consequently defer an IPI sync. > > I remember that this check here is fine regarding GUP-fast: even if > concurrent GUP-fast pins the page after our check here, it should observe > the changed PTE and unpin it again. > > Checking after unmapping makes sense: we reduce the likelyhood of false > positives when a file-backed page is mapped many times (>= 1024). OTOH, we > might unmap pinned pages because we cannot really detect it early. > > For anon pages, we have an early (racy) check, which turned out "ok" in > practice, because we don't frequently have that many anon pages that are > shared by that many processes. I assume we don't want something similar for > pagecache pages, because having a single page mapped by many processes can > happen easily and would prevent reclaim. Yeah, I think pagecache pages shared by many processes are more likely. Furthermore I think pinned pagecache pages are rather rare so unmapping them before checking seems fine to me. Obviously we can reconsider if reality would prove me wrong ;). > I once had a patch lying around that documented for the existing > folio_maybe_dma_pinned() for anon pages exactly that (racy+false positives > with many mappings). > > Long story short, I assume this change is fine. Thanks for the throughout verification :) Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR