On Mon, Apr 12, 2010 at 09:08:11AM +0200, Andrea Arcangeli wrote:
> On Mon, Apr 12, 2010 at 04:09:31PM +1000, Nick Piggin wrote:
> > One problem is that you need to keep a lot more memory free in order
> > for it to be reasonably effective. Another thing is that the problem
> > of fragmentation breakdown is not just a one-shot event that fills
> > memory with pinned objects. It is a slow degradation.
>
> set_recommended_min_free_kbytes seems not to be a function of RAM
> size; 60MB isn't such a big deal.
>
> > Especially when you use something like SLUB as the memory allocator,
> > which requires higher-order allocations for objects which are pinned
> > in kernel memory.
> >
> > Just running a few minutes of testing with a kernel compile in the
> > background does not show the full picture. You really need a box that
> > has been up for days running a proper workload before you are likely
> > to see any breakdown.
> >
> > I'm sure it's horrible for planning if the RDBMS or VM boxes gradually
> > get slower after X days of uptime. It's better to have consistent
> > performance really, for anything except pure benchmark setups.
>
> All the data I provided is very real: in addition to building a ton of
> packages and running emerge on /usr/portage, I've been running all my
> real loads. The only problem is I only ran it for a day and a half, but
> the load I kept it under was significant (surely a much bigger
> inode/dentry load than any hypervisor usage would ever generate).

OK, but as a solution for some kind of very specific and highly
optimized application like an RDBMS, HPC workload, hypervisor, or JVM,
they could just be using hugepages themselves, couldn't they? It seems
more interesting as a more general speedup for applications that can't
afford such optimizations (eg. the common case for most people).

> > Defrag is not futile in theory, you just have to either have a reserve
> > of movable pages (and never allow pinned kernel pages in there), or
> > you need to allocate pinned kernel memory in units of the chunk-size
> > goal (which just gives you different types of fragmentation problems),
> > or you need to do non-linear kernel mappings so you can defrag pinned
> > kernel memory (with *lots* of other problems of course). So you just
> > have a lot of downsides.
>
> That's what the kernelcore= option does, no? Isn't that a good enough
> mathematical guarantee? Probably we should use it in hypervisor
> products just in case, to be mathematically guaranteed to never have to
> use VM migration as a fallback (but definitive) defrag algorithm.

Yes, we do have the option to reserve pages and as far as I know it
should work, although I can't remember whether it deals with mlock.
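[Editor's aside, not part of the thread: "using hugepages themselves" above means explicit application-side allocation via mmap(2) with MAP_HUGETLB, as opposed to the transparent hugepages under discussion. A minimal sketch follows; the helper name map_buffer is made up for illustration, and the fallback path covers the common case where no huge pages have been reserved via vm.nr_hugepages.]

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Try to back an anonymous buffer with huge pages (MAP_HUGETLB); if the
 * kernel has no huge pages reserved (mmap fails, typically ENOMEM), fall
 * back to ordinary base pages so the application still works.
 * len must be a multiple of the huge page size (2MB on x86) for the
 * MAP_HUGETLB attempt to succeed.  Sets *used_huge to 1 or 0. */
void *map_buffer(size_t len, int *used_huge)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) {
        *used_huge = 1;
        return p;
    }
    /* No huge pages available: map regular pages instead. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    *used_huge = 0;
    return p;
}
```

The point in the thread is exactly that only specialized applications bother with this: the mapping has to be sized and aligned to the huge page size, and the pages must be reserved by the administrator ahead of time.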