Re: [RFC 0/6] mm: support madvise(MADV_FREE)

Hello,

On Tue, Mar 18, 2014 at 10:55:24AM -0700, Andy Lutomirski wrote:
> On 03/13/2014 11:37 PM, Minchan Kim wrote:
> > This patch is an attempt to support MADV_FREE for Linux.
> > 
> > The rationale is as follows.
> > 
> > Allocators call munmap(2) when the user calls free(3) if ptr is
> > in an mmapped area. But munmap isn't cheap because it has to clean up
> > all pte entries, unlink the vma, and return the freed pages to the buddy
> > allocator, so the overhead increases linearly with the mmapped area's size.
> > So they prefer madvise_dontneed to munmap.
> > 
> > "dontneed" holds only the read side of mmap_sem, so other threads
> > of the process can handle page faults concurrently, which makes it
> > better than munmap as long as address space is not scarce.
> > But the problem is that most allocators reuse that address
> > space soon afterwards, so applications see a page fault, page allocation,
> > and page zeroing if the allocator has already called madvise_dontneed
> > on the address space.
> > 
> > To avoid that overhead, other OSes support MADV_FREE.
> > The idea is to just mark pages as lazyfree when madvise is called
> > and purge them if memory pressure happens. Otherwise, the VM doesn't
> > detach the pages from the address space, so the application can reuse
> > that memory without the above overheads.
> 
> I must be missing something.
> 
> If the application issues MADV_FREE and then writes to the MADV_FREEd
> range, the kernel needs to know that the pages are no longer safe to
> lazily free.  This would presumably happen via a page fault on write.
> For that to happen reliably, the kernel has to write protect the pages
> when MADV_FREE is called, which in turn requires flushing the TLBs.

That can be done by checking the pte dirty bit (pte_dirty). Of course, on
architectures that don't support the dirty bit in H/W, pte_mkdirty would make
it CoW, as you said.
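
To illustrate the dirty-bit idea, here is a minimal userspace model. It is
not kernel code: struct model_pte and the model_* helpers are hypothetical
names made up for this sketch, and clearing the dirty bit at madvise time is
an assumption about how the check could work, not a description of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page table entry; not a kernel structure. */
struct model_pte {
    bool present;   /* page is still mapped */
    bool dirty;     /* set by "hardware" when the page is written */
    bool lazyfree;  /* set by the modelled MADV_FREE */
};

/* Modelled MADV_FREE: keep the mapping, clear dirty, mark lazily freeable. */
static void model_madv_free(struct model_pte *pte)
{
    assert(pte != NULL);
    if (pte->present) {
        pte->dirty = false;
        pte->lazyfree = true;
    }
}

/* Modelled store: a write to a mapped page just sets the dirty bit.
 * A write to an unmapped page models a fault that maps a fresh page. */
static void model_write(struct model_pte *pte)
{
    assert(pte != NULL);
    if (!pte->present) {
        pte->present = true;
        pte->lazyfree = false;
    }
    pte->dirty = true;
}

/* Modelled reclaim under memory pressure.
 * Returns true if the page was discarded. */
static bool model_reclaim(struct model_pte *pte)
{
    assert(pte != NULL);
    if (!pte->present || !pte->lazyfree)
        return false;
    if (pte->dirty) {
        /* Written after MADV_FREE: the data is live again, keep it. */
        pte->lazyfree = false;
        return false;
    }
    /* Untouched since MADV_FREE: safe to drop without swapout. */
    pte->present = false;
    pte->lazyfree = false;
    return true;
}
```

In this model a page written after model_madv_free() survives
model_reclaim(), while an untouched one is dropped.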
> 
> How does this end up being faster than munmap?

Unlike MADV_DONTNEED, MADV_FREE doesn't need to return the pages to the page
allocator, and that overhead was not small when I measured it on my machine.
(Roughly, MADV_FREE costs half as much as DONTNEED because it avoids
involving the page allocator.)

But I'd like to clarify that the goal of MADV_FREE is not to make the syscall
itself faster than MADV_DONTNEED; the major goal is to
avoid unnecessary page fault + page allocation + page zeroing +
garbage swapout.
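
From the allocator side, the usage could look like the sketch below.
release_range() is a hypothetical helper name, and the sketch assumes the
headers define MADV_FREE; where they don't, or the running kernel rejects
it, it falls back to MADV_DONTNEED.

```c
#define _DEFAULT_SOURCE /* for madvise() and MAP_ANONYMOUS under strict -std */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical allocator helper: tell the kernel that the contents of
 * [addr, addr + len) are no longer needed while keeping the mapping.
 * addr must be page aligned. Returns 0 on success, -1 on failure.
 */
static int release_range(void *addr, size_t len)
{
    assert(addr != NULL && len > 0);
#ifdef MADV_FREE
    /* Lazy free: pages stay mapped and are purged only under pressure. */
    if (madvise(addr, len, MADV_FREE) == 0)
        return 0;
    /* MADV_FREE was rejected (for example, unsupported kernel). */
#endif
    /* Eager fallback: pages are dropped now and refault as zero pages. */
    return madvise(addr, len, MADV_DONTNEED);
}
```

After release_range() succeeds, the allocator can hand the same range out
again and simply write to it; the old contents must be treated as undefined.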

> 
> --Andy
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>

-- 
Kind regards,
Minchan Kim
