[PATCH] mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED

Yafang Shao <laoar.shao@xxxxxxxxx> · Mon, 21 Sep 2020 09:43:17 +0800

Our users reported random latency spikes while their RT process is
running. We eventually found that the latency spikes are caused by
FADV_DONTNEED, which may call lru_add_drain_all() to drain the LRU cache
on remote CPUs and then wait for the per-cpu work to complete. The wait
time is unpredictable and may be tens of milliseconds.
That behavior is unreasonable, because the process is bound to a
specific CPU and the file is accessed only by that process; IOW, there
should be no pagecache pages on a per-cpu pagevec of a remote CPU. The
unreasonable behavior is partially caused by the wrong comparison of the
number of invalidated pages with the target number. For example,
	if (count < (end_index - start_index + 1))
The count above is how many pages were invalidated on the local CPU, and
(end_index - start_index + 1) is how many pages should be invalidated.
Using (end_index - start_index + 1) is incorrect, because start_index and
end_index are page offsets into the file, and not every offset in that
range is necessarily backed by a pagecache page. We'd better use
inode->i_data.nrpages as the target.
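
To illustrate the difference between the two comparisons, here is a
minimal userspace sketch. It is not kernel code: old_check_drains(),
new_check_drains() and sparse_range_example() are hypothetical helpers,
and the nrpages argument is only a stand-in for whatever value
inode->i_data.nrpages holds when the check runs. The numbers (a range of
1024 page offsets with 16 cached pages) are assumed for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Old check: the target is the number of page offsets in the range,
 * whether or not each offset is backed by a pagecache page.
 */
bool old_check_drains(unsigned long count, unsigned long start_index,
		      unsigned long end_index)
{
	return count < (end_index - start_index + 1);
}

/*
 * New check from the diff: the target is the mapping's page count.
 * nrpages is supplied by the caller in this model.
 */
bool new_check_drains(unsigned long count, unsigned long nrpages)
{
	return count < nrpages;
}

/*
 * Assumed scenario: the range spans page offsets 0..1023, but the
 * mapping held only 16 pages and all 16 were invalidated locally.
 */
void sparse_range_example(void)
{
	unsigned long count = 16;

	/* 16 < 1024: the old check requests a remote drain anyway. */
	assert(old_check_drains(count, 0, 1023));

	/* 16 < 16 is false: the new check skips the remote drain. */
	assert(!new_check_drains(count, 16));
}
```

In this model the old comparison asks for lru_add_drain_all() whenever
the range is larger than the number of pages actually cached, even if
nothing was left behind on a remote CPU.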

Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
---
 mm/fadvise.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/fadvise.c b/mm/fadvise.c
index 0e66f2aaeea3..ec25c91194a3 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -163,7 +163,7 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 			 * a per-cpu pagevec for a remote CPU. Drain all
 			 * pagevecs and try again.
 			 */
-			if (count < (end_index - start_index + 1)) {
+			if (count < inode->i_data.nrpages) {
 				lru_add_drain_all();
 				invalidate_mapping_pages(mapping, start_index,
 						end_index);
-- 
2.17.1