Re: [PATCH 00 of 41] Transparent Hugepage Support #17

Ingo Molnar <mingo@xxxxxxx> · Mon, 12 Apr 2010 10:07:48 +0200

* Avi Kivity <avi@xxxxxxxxxx> wrote:

> On 04/12/2010 10:21 AM, Nick Piggin wrote:
> >>
> >>All data I provided is very real, in addition to building a ton of
> >>packages and running emerge on /usr/portage I've been running all my
> >>real loads. Only problem I only run it for 1 day and half, but the
> >>load I kept it under was significant (surely a lot bigger inode/dentry
> >>load that any hypervisor usage would ever generate).
> >OK, but as a solution for some kind of very specific and highly
> >optimized application already like RDBMS, HPC, hypervisor or JVM,
> >they could just be using hugepages themselves, couldn't they?
> >
> > It seems more interesting as a more general speedup for applications that 
> > can't afford such optimizations? (eg. the common case for most people)
> 
> The problem with hugetlbfs is that you need to commit upfront to using it, 
> and that you need to be the admin.  For virtualization, you want to use 
> hugepages when there is no memory pressure, but you want to use ksm, 
> ballooning, and swapping when there is (and then go back to large pages when 
> pressure is relieved, e.g. by live migration).
> 
> HPC and databases can probably live with hugetlbfs.  JVM is somewhere in the 
> middle, they do allocate memory dynamically.

Even for HPC hugetlbfs is often not good enough: if the data is being 
constantly acquired and put into a file and if it needs to be in persistent 
storage then you dont want to (and cannot) copy it to hugetlbfs (on a poweroff 
you would lose the file).

Furthermore there's also the deployment barrier of marginal improvements: not 
many apps are willing to change for a +0.1% improvement - or even for a +0.9% 
improvement - _especially_ if that improvement also needs admin access and per 
distribution hackery. (each distribution tends to have their own slightly 
different way of handing filesystems and other permission/configuration 
matters)

We've seen that with sendfile() and splice() an it's no different with 
hugetlbs either.

hugetlbfs is basically a non-default poor-man's solution for something that 
the kernel should be providing transparently. It's a bad hack that is good 
enough to prototype that something works, but it has serious deployment, 
configuration and usage limitations. Only a kernel hacker detached from 
everyday application development and packaging constraints can believe that 
it's a high-quality technical solution.

Transparent hugepages eliminates most of the app-visible disadvantages by 
shuffling the problems into the kernel [and no doubt causing follow-on 
headaches there] and by utilizing the 'power of the default' - and thus 
opening up hugetlbs to far more apps. [*]

It's a really simple mechanism.

Thanks,

	Ingo

[*] Note, it would be even better if the kernel provided the C library [a'ka 
    klibc] and if hugetlbs could be utilized via malloc() et al more 
    transparently by us changing the user-space library in the kernel repo and 
    deploying it to apps via a new kernel that provides an updated C library. 
    We dont do that so we are stuck with crappier solutions and slower 
    propagation of changes.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>