Re: [PATCH v2] mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED

On Sun, Sep 27, 2020 at 12:22:16PM +0800, Yafang Shao wrote:
> On Fri, Sep 25, 2020 at 10:40 PM Mel Gorman <mgorman@xxxxxxx> wrote:
> >
> > On Wed, Sep 23, 2020 at 09:33:18PM +0800, Yafang Shao wrote:
> > > Our users reported random latency spikes while their RT process is
> > > running. We eventually found that the spikes are caused by
> > > FADV_DONTNEED, which may call lru_add_drain_all() to drain the LRU cache
> > > on remote CPUs and then wait for the per-cpu work to complete. The wait
> > > time is unpredictable and may reach tens of milliseconds.
> > > That behavior is unreasonable, because this process is bound to a
> > > specific CPU and the file is only accessed by itself, IOW, there should
> > > be no pagecache pages on a per-cpu pagevec of a remote CPU. That
> > > unreasonable behavior is partially caused by the wrong comparison of the
> > > number of invalidated pages and the number of the target. For example,
> > >         if (count < (end_index - start_index + 1))
> > > The count above is how many pages were invalidated on the local CPU, and
> > > (end_index - start_index + 1) is how many pages should be invalidated.
> > > The usage of (end_index - start_index + 1) is incorrect, because they
> > > are virtual addresses, which may not be mapped to pages. Besides that,
> > > there may be holes between start and end. So we'd better check whether
> > > there are still pages on a per-cpu pagevec after draining the local CPU,
> > > and then decide whether or not to call lru_add_drain_all().
> > >
> > > After I applied it as a hotfix to our production environment, most of
> > > the lru_add_drain_all() calls were avoided.
> > >
> > > Suggested-by: Mel Gorman <mgorman@xxxxxxx>
> > > Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
> > > Cc: Mel Gorman <mgorman@xxxxxxx>
> > > Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> >
> > I think that's ok. Does it succeed with the original test case from the
> > commit that introduced the behaviour and one modified to truncate part
> > of the mapping?
> >
> 
> Yes, I verified the test case in commit 67d46b296a1b and then modified
> it with truncate.
> Both work fine.
> 

In that case

Acked-by: Mel Gorman <mgorman@xxxxxxx>

-- 
Mel Gorman
SUSE Labs