Re: [PATCH] mm/gup: don't check page lru flag before draining it

On 2024/6/6 3:56 PM, David Hildenbrand wrote:
Some random thoughts about some folio_test_lru() users:

mm/khugepaged.c: skips pages if !folio_test_lru(), but would fail (or skip)
them either way if there is the unexpected reference from the LRU batch!

mm/compaction.c: skips pages if !folio_test_lru(), but would fail (or skip)
them either way if there is the unexpected reference from the LRU batch!
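A toy user-space model (not kernel code; struct sim_folio, sim_batch_add() and the other sim_* names are hypothetical) of why these checks fail either way, assuming the folio_add_lru() path: queuing a new folio in a batch takes a reference while PG_lru stays clear until the batch is drained, so a refcount check would fail even if the LRU flag check were dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model, not kernel code. */
struct sim_folio {
	int refcount;  /* references from mappings, GUP, batches, ... */
	bool lru;      /* models PG_lru */
};

#define SIM_BATCH_SIZE 15

struct sim_batch {
	struct sim_folio *folios[SIM_BATCH_SIZE];
	int nr;
};

/* Models queuing a new folio: the batch pins it, PG_lru stays clear. */
bool sim_batch_add(struct sim_batch *b, struct sim_folio *f)
{
	if (b->nr == SIM_BATCH_SIZE)
		return false;      /* a real caller would drain first */
	f->refcount++;         /* the "unexpected" reference */
	b->folios[b->nr++] = f;
	return true;
}

/* Models draining: set PG_lru and drop the batch's reference. */
void sim_batch_drain(struct sim_batch *b)
{
	for (int i = 0; i < b->nr; i++) {
		b->folios[i]->lru = true;
		b->folios[i]->refcount--;
	}
	b->nr = 0;
}

/* Models a khugepaged/compaction-style check: only expected refs allowed. */
bool sim_can_migrate(const struct sim_folio *f, int expected_refs)
{
	return f->lru && f->refcount == expected_refs;
}
```

While the folio sits in the batch, both conditions in sim_can_migrate() are false; after the drain, both hold again.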

mm/memory.c: would love to identify this case and do an lru_add_drain()
to free up that reference.

mm/huge_memory.c: splitting with the additional reference will fail
already. Maybe we'd want to drain the LRU batch.

Agree.


mm/madvise.c: skips pages if !folio_test_lru(). I wonder what happens if
we have the same page twice in an LRU batch with different target goals ...

IIUC, the LRU batch can ignore this folio since its LRU flag is cleared by
folio_isolate_lru(), and will then call folios_put() to drop the reference.


I think what's interesting to highlight in the current design is that a folio might end up in multiple LRU batches, and whatever the result will be is determined by the sequence of them getting flushed. Doesn't sound quite right but maybe there was a reason for it (which could just have been "simpler implementation").
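A minimal sketch of the two behaviours described above, under loudly stated assumptions: each batch applies a single operation when drained, an isolated folio (PG_lru clear) is skipped but still has its batch reference dropped, and a folio queued on two CPUs ends up in whatever state the last flushed batch leaves it. All sim_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: each batch applies one operation when drained. */
enum sim_op { SIM_ACTIVATE, SIM_DEACTIVATE };

struct sim_folio {
	int refcount;
	bool lru;     /* models PG_lru: cleared while isolated */
	bool active;  /* models PG_active */
};

#define SIM_OP_BATCH_SIZE 8

struct sim_op_batch {
	enum sim_op op;
	struct sim_folio *folios[SIM_OP_BATCH_SIZE];
	int nr;
};

/* Queuing pins the folio; nothing prevents queuing it on several CPUs. */
void sim_queue(struct sim_op_batch *b, struct sim_folio *f)
{
	assert(b->nr < SIM_OP_BATCH_SIZE);
	f->refcount++;
	b->folios[b->nr++] = f;
}

/* Skip isolated folios, but drop the batch reference either way. */
void sim_drain(struct sim_op_batch *b)
{
	for (int i = 0; i < b->nr; i++) {
		struct sim_folio *f = b->folios[i];
		if (f->lru)
			f->active = (b->op == SIM_ACTIVATE);
		f->refcount--;
	}
	b->nr = 0;
}
```

Queuing one folio for activation on CPU 0 and deactivation on CPU 1, then flushing CPU 1 before CPU 0, leaves it active; the reverse order leaves it inactive.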


Some other users (there are not that many that don't use it for sanity
checks, though) are likely a bit different.

There are also some PageLRU checks, but not that many.


mm/page_isolation.c: fails to set the pageblock migratetype to isolate if
!folio_test_lru(), so alloc_contig_range_noprof() can fail. The original
code could set the pageblock migratetype to isolate and then call
drain_all_pages() in alloc_contig_range_noprof() to drop the reference
held by the LRU batch.

mm/vmscan.c: calls lru_add_drain() before calling isolate_lru_folios(),
so there seems to be no impact.

lru_add_drain() will only drain the local CPU. So if the folio would be stuck on another CPU's LRU batch, right now we could isolate it. When processing that LRU batch while the folio is still isolated, it would currently simply skip the operation.

So right now we can call isolate_lru_folios() even if the folio is stuck on another CPU's LRU batch.

We cannot really reclaim the folio as long as it is in another CPU's LRU batch, though (unexpected reference).
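A small sketch of the per-CPU aspect, assuming a fixed number of CPUs and modelling only references: sim_drain_local() stands in for lru_add_drain() (local CPU only) and sim_drain_all() for lru_add_drain_all(). Isolation only looks at PG_lru, so it succeeds while CPU 1 still pins the folio, but the reclaim-style refcount check fails until every CPU has drained. The sim_* names and the "refcount == 1 means only the isolating caller holds it" rule are assumptions of the model.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_NR_CPUS 4
#define SIM_BATCH_SIZE 15

struct sim_folio {
	int refcount;  /* 1 == only the isolating caller's reference */
	bool lru;      /* models PG_lru */
};

struct sim_batch {
	struct sim_folio *folios[SIM_BATCH_SIZE];
	int nr;
};

/* Hypothetical per-CPU batches (zero-initialized globals). */
struct sim_batch sim_cpu_batch[SIM_NR_CPUS];

void sim_queue_on(int cpu, struct sim_folio *f)
{
	struct sim_batch *b = &sim_cpu_batch[cpu];
	assert(b->nr < SIM_BATCH_SIZE);
	f->refcount++;
	b->folios[b->nr++] = f;
}

/* Like lru_add_drain(): only this CPU's batch is flushed. */
void sim_drain_local(int cpu)
{
	struct sim_batch *b = &sim_cpu_batch[cpu];
	for (int i = 0; i < b->nr; i++)
		b->folios[i]->refcount--;  /* model tracks refs only */
	b->nr = 0;
}

/* Like lru_add_drain_all(): flush every CPU's batch. */
void sim_drain_all(void)
{
	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		sim_drain_local(cpu);
}

/* Isolation only needs PG_lru, so a remote batch reference doesn't stop it. */
bool sim_isolate(struct sim_folio *f)
{
	if (!f->lru)
		return false;
	f->lru = false;
	return true;
}

/* Reclaim needs the folio to hold no references beyond the isolating one. */
bool sim_can_reclaim(const struct sim_folio *f)
{
	return f->refcount == 1;
}
```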


BTW, we also need to look at the usage of folio_isolate_lru().

Yes.


There don't seem to be major obstacles, but there are many details to
analyze :)

Yes, we're only scratching the surface.

Having a way to identify "this folio is very likely in some CPU's LRU batch" could end up being quite valuable, because we likely don't want to blindly drain the LRU simply because there is some unexpected reference on a folio [as we would in this patch].


Can we add a PG_lru_batch flag to determine whether a page is in an LRU batch? If we can, this problem seems easier to solve.
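A hypothetical sketch of how such a flag could be used; PG_lru_batch does not exist, and here it is just a bool set when a folio is queued and cleared when its batch drains. sim_maybe_drain() only triggers a drain when the reference count is unexpectedly high and the flag is set, so an extra reference from, say, a GUP pin no longer causes a pointless drain. For simplicity one batch stands in for whichever CPU's batch holds the folio.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_BATCH_SIZE 15

/* Hypothetical: PG_lru_batch is modelled as a plain bool. */
struct sim_folio {
	int refcount;
	bool lru;        /* models PG_lru */
	bool lru_batch;  /* proposed PG_lru_batch: set while queued */
};

struct sim_batch {
	struct sim_folio *folios[SIM_BATCH_SIZE];
	int nr;
};

int sim_drain_calls;  /* counts how often a (costly) drain ran */

void sim_batch_add(struct sim_batch *b, struct sim_folio *f)
{
	assert(b->nr < SIM_BATCH_SIZE);
	f->refcount++;
	f->lru_batch = true;
	b->folios[b->nr++] = f;
}

void sim_batch_drain(struct sim_batch *b)
{
	for (int i = 0; i < b->nr; i++) {
		struct sim_folio *f = b->folios[i];
		f->lru = true;
		f->lru_batch = false;
		f->refcount--;
	}
	b->nr = 0;
	sim_drain_calls++;
}

/* Drain only if the extra reference likely comes from an LRU batch. */
void sim_maybe_drain(struct sim_batch *b, struct sim_folio *f, int expected_refs)
{
	if (f->refcount > expected_refs && f->lru_batch)
		sim_batch_drain(b);
}
```

One property of this model worth keeping in mind: since a folio can sit in several batches at once, a single bit is cleared by whichever batch drains first, even if another batch still holds a reference.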




