Hello, On Thu, Jan 03, 2013 at 09:19:08AM -0800, Sanjay Ghemawat wrote: > On Wed, Jan 2, 2013 at 8:27 PM, Minchan Kim <minchan@xxxxxxxxxx> wrote: > > This is still RFC because we need more input from user-space > > people, more stress test, design discussion about interface/reclaim > > Speaking as one of the authors of tcmalloc, I don't see any particular > need for this new system call for tcmalloc. We are fine using > madvise(MADV_DONTNEED) and don't notice any significant > performance issues caused by it. Background: we throttle how > quickly we release memory back to the system (1-10MB/s), so > we do not call madvise() very much, and we don't end up reusing > madvise-ed away pages at a fast rate. My guess is that we won't It means TCmalloc controls madvise's rate dynamically without user's intervention? Smart TCmalloc! Let me ask some questions. What is your policy for control of throttling of madvise? I guess policy is following as. The madvise's frequent calling is bad because pte zap overhead of madvise + next page fault/memset + page access bit emulatation page fault in some architecture like ARM when reused the range. So we should call it fast rate only when memory pressure happens very carefully. Is it similar to your throttling logic? If my assumption isn't totally wrong, how could a process know the memory pressure at the moment by just per-process view, NOT system view? If your logic takes some mistake, (for instace, memory pressure is severe but it doesn't call madvise) working set could be reclaimed like file-backed pages, which could minimize your benefit via madvise throttling. I guess it's very fragile. It's more severe in embedded world because they don't use swap so system encounters OOM instead of swappout. In this point, mvolatile's concept is light weight system call by just mark the flag in the vma and auto free when system suffers from memory pressure(about this, my plan is zap all pages if kswapd is active when movlatile system call is called) by reclaimer with preventing working set page eviction, otherwise enhance application's speed with removing (minor fault + page allocation + memset). Also, it would make allocator simple through removing control logic, which is less error-prone and even might make smart TCmalloc better than now althoug it doesn't have any significat performance issue. > see large enough application-level performance improvements to > cause us to change tcmalloc to use this system call. > > > - What's different with madvise(DONTNEED)? > > > > System call semantic > > > > DONTNEED makes sure user always can see zero-fill pages after > > he calls madvise while mvolatile can see old data or encounter > > SIGBUS. > > Do you need a new system call for this? Why not just a new flag to madvise > with weaker guarantees than zero-filling? All of the implementation changes > you point out below could be triggered from that flag. Agreed and actually, I tried it but changed my mind because it required adding many hacky codes in madvise due to return value and error's semantic is totally different with normal madvise and needs three flags at least at the moment but not sure we need more flags during discussion. MADV_VOLATILE, MADV_NOVOLATILE, MADV_[NO]VOLATILE|MADV_PARTIAL_DISCARD I don't want to make madvise dirty and consume lots of new flags of madvise for a volatile feature. But if everybody want to fold into madivse, I can do it, too. Thanks for the feedback, Sanjay! > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@xxxxxxxxx. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a> -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>