On Mon, Apr 12, 2010 at 09:08:11AM +0200, Andrea Arcangeli wrote:
> On Mon, Apr 12, 2010 at 04:09:31PM +1000, Nick Piggin wrote:
> > One problem is that you need to keep a lot more memory free in order
> > for it to be reasonably effective. Another thing is that the problem
> > of fragmentation breakdown is not just a one-shot event that fills
> > memory with pinned objects. It is a slow degradation.
>
> set_recommended_min_free_kbytes seems not to be a function of RAM
> size; 60MB isn't such a big deal.
>
> > Especially when you use something like SLUB as the memory allocator,
> > which requires higher-order allocations for objects which are pinned
> > in kernel memory.
> >
> > Just running a few minutes of testing with a kernel compile in the
> > background does not show the full picture. You really need a box that
> > has been up for days running a proper workload before you are likely
> > to see any breakdown.
> >
> > I'm sure it's horrible for planning if the RDBMS or VM boxes gradually
> > get slower after X days of uptime. It's better to have consistent
> > performance really, for anything except pure benchmark setups.
>
> All the data I provided is very real: in addition to building a ton of
> packages and running emerge on /usr/portage, I've been running all my
> real loads. The only problem is I only ran it for a day and a half, but
> the load I kept it under was significant (surely a much bigger
> inode/dentry load than any hypervisor usage would ever generate).

OK, but as a solution for some kind of very specific and highly
optimized application like an RDBMS, HPC workload, hypervisor, or JVM,
they could just be using hugepages themselves, couldn't they? It seems
more interesting as a more general speedup for applications that can't
afford such optimizations (eg. the common case for most people).

> > Defrag is not futile in theory, you just have to either have a reserve
> > of movable pages (and never allow pinned kernel pages in there), or
> > you need to allocate pinned kernel memory in units of the chunk-size
> > goal (which just gives you different types of fragmentation problems),
> > or you need to do non-linear kernel mappings so you can defrag pinned
> > kernel memory (with *lots* of other problems of course). So you just
> > have a lot of downsides.
>
> That's what the kernelcore= option does, no? Isn't that a good enough
> mathematical guarantee? Probably we should use it in hypervisor
> products just in case, to be mathematically guaranteed to never have to
> use VM migration as a fallback (but definitive) defrag algorithm.

Yes, we do have the option to reserve pages and as far as I know it
should work, although I can't remember whether it deals with mlock.
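[Editor's aside, not part of the thread: "using hugepages themselves" above means explicit application-side allocation via mmap(2) with MAP_HUGETLB, as opposed to the transparent hugepages under discussion. A minimal sketch follows; the helper name map_buffer is made up for illustration, and the fallback path covers the common case where no huge pages have been reserved via vm.nr_hugepages.]

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Try to back an anonymous buffer with huge pages (MAP_HUGETLB); if the
 * kernel has no huge pages reserved (mmap fails, typically ENOMEM), fall
 * back to ordinary base pages so the application still works.
 * len must be a multiple of the huge page size (2MB on x86) for the
 * MAP_HUGETLB attempt to succeed.  Sets *used_huge to 1 or 0. */
void *map_buffer(size_t len, int *used_huge)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) {
        *used_huge = 1;
        return p;
    }
    /* No huge pages available: map regular pages instead. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    *used_huge = 0;
    return p;
}
```

The point in the thread is exactly that only specialized applications bother with this: the mapping has to be sized and aligned to the huge page size, and the pages must be reserved by the administrator ahead of time.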