On Tue, Jan 08, 2013 at 05:38:53PM +0900, Minchan Kim wrote:
> On Tue, Jan 08, 2013 at 03:32:29PM +0800, Shaohua Li wrote:
> > On Tue, Jan 08, 2013 at 02:38:56PM +0900, Minchan Kim wrote:
> > > Hi Shaohua,
> > >
> > > On Tue, Jan 08, 2013 at 12:26:09PM +0800, Shaohua Li wrote:
> > > > On Tue, Jan 08, 2013 at 10:16:07AM +0800, Wanpeng Li wrote:
> > > > > On Mon, Jan 07, 2013 at 12:06:30PM -0800, Andrew Morton wrote:
> > > > > > On Mon, 7 Jan 2013 16:12:37 +0800
> > > > > > Shaohua Li <shli@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > > Make madvise(MADV_WILLNEED) support swap file prefetch. If memory is
> > > > > > > swapped out, this syscall can do swapin prefetch. It has no impact if
> > > > > > > the memory isn't swapped out.
> > > > > >
> > > > > > Seems sensible.
> > > > >
> > > > > Hi Andrew and Shaohua,
> > > > >
> > > > > What's the performance in the scenario of serious memory pressure? In
> > > > > that case pages in swap are highly fragmented and a cache hit is almost
> > > > > impossible, so perhaps the WILLNEED path should add a check to skip
> > > > > readahead, since swapin then only leads to unnecessary memory allocation.
> > > >
> > > > Pages in swap are not highly fragmented if you access memory sequentially.
> > > > In that case, the pages you accessed will be added to the LRU list side by
> > > > side. So if an app does swap prefetch, we can do sequential disk access
> > > > and merge small requests into big ones.
> > >
> > > How can you make sure that the range of WILLNEED was always sequentially accessed?
> >
> > You can't guarantee this even for file access.
>
> Indeed.
>
> > > > Another advantage is prefetch can drive high disk iodepth. For sequential
> > >
> > > What does 'iodepth' mean? I failed to grep it in Google. :(
> >
> > IO depth: how many requests are in flight at a given time.
>
> Thanks for the info!
>
> > > > access, this can cause big requests. Even for random access, high iodepth
> > > > has much better performance, especially for SSDs.
> > >
> > > So you mean WILLNEED is always good for both random and sequential access
> > > on an "SSD"? Then, how about a "disk"?
> >
> > Hmm, even for a hard disk, high-iodepth random access is faster than
> > single-iodepth access; today's disks are NCQ disks. But the speedup isn't as
> > significant as on an SSD. For sequential access, both hard disks and SSDs
> > have better performance with higher iodepth.
> >
> > > Wanpeng's comment makes sense to me, so I guess others could have the same
> > > question about this patch. It would be better to write your rationale in
> > > the changelog.
> >
> > I would, but the question is just like asking why an app wants to prefetch
> > file pages; I thought it was common sense. The problem of unnecessary memory
> > allocation exists in file prefetch too. The advantages (better IO access,
> > CPU and disk can operate in parallel, and so on) apply to both file and swap
> > prefetch.
>
> Agreed. But I have a question about the semantics of madvise(MADV_WILLNEED)
> on an anon vma. If Linux starts to support it for anon, a user could
> misunderstand it as follows.
>
> The user might think we will start to use the anonymous pages in that range
> soon, so he gives the hint to the kernel to map all pages of the range into
> the page table in advance (ie, pre page fault like MAP_POPULATE), and if one
> of the pages has been swapped out, read it ahead. What do you think about it?
> For clarification, it would be better to add a man page description, Ccing
> the man page maintainer.

There is no confusion if the page exists or is swapped out.
I thought what you are thinking about is the case where the page isn't
populated yet. The man page describes WILLNEED as "it might be a good idea to
read some pages ahead." That sounds clear that it isn't meant to populate
memory, and it matches what we did. But I'm not sure what the precise
description should be.

Thanks,
Shaohua
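
To make the WILLNEED semantics discussed above concrete, here is a minimal
userspace sketch; it is not part of the patch or this thread, and the mapping
size and access pattern are made up purely for illustration:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 64UL << 20;	/* 64MB region; arbitrary size for illustration */
	char *buf;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Touch the range so it is backed by anonymous pages; under memory
	   pressure some of these pages may later be swapped out. */
	memset(buf, 0xaa, len);

	/* ... time passes, parts of the range get swapped out ... */

	/* Hint that the range will be accessed soon.  With swap prefetch
	   supported, swapped-out pages in [buf, buf+len) are read back in
	   asynchronously, so the accesses below take fewer major faults.
	   For pages that are still resident, the call has no effect. */
	if (madvise(buf, len, MADV_WILLNEED))
		perror("madvise");

	/* Access the range; ideally most pages are already back in memory. */
	for (size_t i = 0; i < len; i += 4096)
		buf[i]++;

	munmap(buf, len);
	return 0;
}

The usage is the same as for a file-backed mapping; the point of the patch is
that the swapped-out anonymous case now gets readahead treatment as well.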