Re: [PATCH] mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED

Mel Gorman <mgorman@xxxxxxx> · Mon, 21 Sep 2020 23:34:30 +0100

On Mon, Sep 21, 2020 at 09:43:17AM +0800, Yafang Shao wrote:
> Our users reported random latency spikes while their RT process was
> running. We finally found that the spikes are caused by FADV_DONTNEED,
> which may call lru_add_drain_all() to drain the LRU caches on remote
> CPUs and then wait for the per-cpu work to complete. The wait time is
> unpredictable and may be tens of milliseconds.
>
> That behavior is unreasonable, because this process is bound to a
> specific CPU and the file is only accessed by that process; IOW, there
> should be no page cache pages on a per-cpu pagevec of a remote CPU.
> The unreasonable behavior is partially caused by a wrong comparison
> between the number of invalidated pages and the target number:
>
> 	if (count < (end_index - start_index + 1))
>
> The count above is how many pages were invalidated on the local CPU,
> and (end_index - start_index + 1) is how many pages should be
> invalidated. Using (end_index - start_index + 1) is incorrect, because
> these are virtual addresses, which may not be mapped to pages. We
> should use inode->i_data.nrpages as the target instead.
> 

How does that work if the invalidation is for a subset of the file?

-- 
Mel Gorman
SUSE Labs