Re: [RFC 3/6] mm: support madvise(MADV_FREE)

Hello Hannes,

On Tue, Mar 18, 2014 at 02:26:21PM -0400, Johannes Weiner wrote:
> On Fri, Mar 14, 2014 at 03:37:47PM +0900, Minchan Kim wrote:
> > Linux has no way to free pages lazily, while other OSes already
> > support this through madvise(MADV_FREE).
> > 
> > The gain is clear: under memory pressure, the kernel can discard
> > freed pages rather than swapping them out or triggering OOM.
> > 
> > Without memory pressure, freed pages can be reused by userspace
> > without additional overhead (e.g., page fault + page allocation
> > + page zeroing).
> > 
> > The first heavy users would be general-purpose allocators (e.g.,
> > jemalloc; I hope ptmalloc will support it too). jemalloc already
> > supports the feature on other OSes (e.g., FreeBSD).
> > 
> > At the moment, this patch breaks the build on architectures whose
> > TLB flush scheme differs from x86's, but if there is no objection
> > to this direction, I will add patches to handle the other
> > architectures in the next iteration.
> > 
> > Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
> 
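
For reference, this is roughly how an allocator might use the hint from
userspace. It is only a sketch: lazy_free_demo() is a made-up helper, and
the fallback value 8 for MADV_FREE is an assumption for headers that do
not define it. A kernel without support is expected to return EINVAL,
which the helper reports as 1 instead of failing.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_FREE
#define MADV_FREE 8	/* assumption: value for headers that lack the definition */
#endif

/*
 * Hypothetical helper, for illustration only.
 * Returns 0 if the kernel accepted the hint, 1 if it rejected it
 * with EINVAL (no MADV_FREE support), and -1 on any other error.
 */
int lazy_free_demo(size_t len)
{
	char *buf;
	int ret = 0;

	assert(len > 0);

	/* MADV_FREE in this patch only applies to anonymous memory. */
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Dirty the pages, as an allocator's user would. */
	memset(buf, 0xaa, len);

	/* Tell the kernel the contents are no longer needed. */
	if (madvise(buf, len, MADV_FREE) != 0)
		ret = (errno == EINVAL) ? 1 : -1;

	/* The range stays mapped, so it can be written again right away. */
	buf[0] = 1;

	munmap(buf, len);
	return ret;
}
```

After the hint, the old contents must be treated as undefined; only data
written after the madvise() call can be relied on.
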
> > @@ -284,8 +286,17 @@ static long madvise_dontneed(struct vm_area_struct *vma,
> >  			.last_index = ULONG_MAX,
> >  		};
> >  		zap_page_range(vma, start, end - start, &details);
> > +	} else if (behavior == MADV_FREE) {
> > +		struct zap_details details = {
> > +			.lazy_free = 1,
> > +		};
> > +
> > +		if (vma->vm_file)
> > +			return -EINVAL;
> > +		zap_page_range(vma, start, end - start, &details);
> 
> Wouldn't a custom page table walker to clear dirty bits and move pages
> be better?  It's awkward to hook this into the freeing code and then
> special case the pages and not actually free them.

No problem.
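
Just to check that I understand the suggestion, here is a toy userspace
model of such a walker. It is not kernel code: struct toy_pte and
toy_madvise_free_walk() are made-up names, the "page table" is a flat
array, and moving the pages to another list is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a page table entry; not the kernel's pte_t. */
struct toy_pte {
	bool present;
	bool dirty;
};

/*
 * Walk entries [start, end) and clear the dirty bit on every present
 * entry instead of zapping it. Returns how many entries were cleaned.
 */
size_t toy_madvise_free_walk(struct toy_pte *table, size_t start, size_t end)
{
	size_t cleaned = 0;
	size_t i;

	assert(start <= end);

	for (i = start; i < end; i++) {
		if (!table[i].present)
			continue;	/* nothing mapped here, skip */
		table[i].dirty = false;
		cleaned++;
	}
	return cleaned;
}
```

The point of the model is that the mapping is kept; only the dirty state
is reset, so a later write can be noticed.
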

> 
> > @@ -817,6 +817,25 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> >  
> >  		sc->nr_scanned++;
> >  
> > +		if (PageLazyFree(page)) {
> > +			switch (try_to_unmap(page, ttu_flags)) {
> 
> I don't get why we need a page flag for this.  page_check_references()
> could use the rmap walk to also check if any pte/pmd is dirty.  If so,
> you have to swap the page.  If all are clean, it can be discarded.

Ugh, you're right. I guess it could work.
I will look into that in the next iteration.
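
As I read it, the decision you describe boils down to something like the
toy model below. Again, this is a userspace sketch with made-up names
(toy_check_lazy_page(), TOY_DISCARD, TOY_SWAP); the array of flags stands
in for what the rmap walk would find in each pte/pmd mapping the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_reclaim_action {
	TOY_DISCARD,	/* every mapping is clean: drop the page */
	TOY_SWAP,	/* some mapping is dirty: page must be swapped */
};

/*
 * pte_dirty[i] is the dirty state of the i-th mapping of one page.
 * A single dirty mapping means the page was written after the hint.
 */
enum toy_reclaim_action toy_check_lazy_page(const bool *pte_dirty,
					    size_t nr_mappings)
{
	size_t i;

	assert(pte_dirty != NULL || nr_mappings == 0);

	for (i = 0; i < nr_mappings; i++) {
		if (pte_dirty[i])
			return TOY_SWAP;
	}
	return TOY_DISCARD;
}
```

With that, no extra page flag is needed in the model; the dirty bits
alone carry the information.
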

Thanks!

> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>

-- 
Kind regards,
Minchan Kim
