On 2020-06-24 12:21, Jason Gunthorpe wrote:
On Wed, Jun 24, 2020 at 08:14:17PM +0100, Chris Wilson wrote:
A general rule of thumb is that shrinkers should be fast and effective.
They are called from direct reclaim at the most inconvenient of times when
the caller is waiting for a page. If we attempt to reclaim a page being
pinned for active dma [pin_user_pages()], we will incur far greater
latency than a normal anonymous page mapped multiple times. Worse, the
page may be in use indefinitely by the HW and unable to be reclaimed
in a timely manner.
A pinned page can't be migrated, discarded or swapped by definition -
it would cause data corruption.
So, how do things even get here and/or work today at all? I think the
explanation is missing something important.
Well, those activities generally try to unmap the page, and
have to be prepared to deal with failure to unmap. From my reading,
it seemed very clear.
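
To make that pattern concrete, here is a rough userspace sketch, not kernel
code. struct toy_page, toy_try_to_unmap() and toy_reclaim() are hypothetical
names invented for illustration, and the model simply assumes that unmapping
fails for a pinned page. The point is only that the caller treats failure as
a normal outcome and moves on:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a page; this is NOT the kernel's struct page. */
struct toy_page {
	bool pinned;	/* models a page pinned for active DMA */
	bool mapped;	/* models the page still being mapped */
};

/*
 * Hypothetical helper: returns true if the page was unmapped.
 * Assumption for this sketch: a pinned page always fails to unmap.
 */
static bool toy_try_to_unmap(struct toy_page *page)
{
	if (page->pinned)
		return false;
	page->mapped = false;
	return true;
}

/*
 * Walk the candidate pages, count the ones that could be unmapped,
 * and skip the rest without retrying. Returns the number reclaimed.
 */
static size_t toy_reclaim(struct toy_page *pages, size_t n)
{
	size_t reclaimed = 0;

	assert(pages != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!toy_try_to_unmap(&pages[i]))
			continue;	/* failure is expected; leave the page alone */
		reclaimed++;
	}
	return reclaimed;
}
```

Given three mapped pages of which only the middle one is pinned,
toy_reclaim() returns 2 and the pinned page stays mapped.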
What's less clear is why the comment and the commit description
only talk about reclaim, when there are additional things that call
try_to_unmap(), including:
    migrate_vma_unmap()
    split_huge_page_to_list() --> unmap_page()
I do like this code change, though. And I *think* it's actually safe to
do this, as it stays away from writeback or other filesystem activity.
But let me double check that, in case I'm forgetting something.
thanks,
--
John Hubbard
NVIDIA