Re: [PATCH 00 of 41] Transparent Hugepage Support #17

Avi Kivity <avi@xxxxxxxxxx> · Sat, 10 Apr 2010 22:22:28 +0300

On 04/10/2010 10:02 PM, Ingo Molnar wrote:
* Andrea Arcangeli <aarcange@xxxxxxxxxx> wrote:

[...]

This is already fully usable and works great, and as Avi showed it boosts
even a sort on host by 6%, think about HPC applications, and soon I hope to
boost gcc on host by 6% (and of >15% in guest with NPT/EPT) by extending
vm_end in 2M chunks in glibc, at least for those huge gcc builds taking
200M like translate.o of qemu-kvm... (so I hope soon gcc running on KVM
guest, thanks to EPT/NPT, will run faster than on mainline kernel without
transparent hugepages on bare metal).
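
To put rough numbers on the "2M chunks" and the "200M" build mentioned
above, here is a small arithmetic sketch. It assumes the usual x86-64 page
sizes (4 KiB base pages, 2 MiB huge pages); the helper names align_up and
pages_needed are made up for illustration and are not taken from glibc or
the patch set.

```python
# Illustrative arithmetic only; page sizes are assumed x86-64 values.
SMALL_PAGE = 4 * 1024          # 4 KiB base page
HUGE_PAGE = 2 * 1024 * 1024    # 2 MiB huge page (the "2M chunks" above)


def align_up(addr: int, alignment: int = HUGE_PAGE) -> int:
    """Round a non-negative addr up to a multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)


def pages_needed(size: int, page_size: int) -> int:
    """Number of pages (and so address translations) needed to map size bytes."""
    return -(-size // page_size)  # ceiling division


if __name__ == "__main__":
    working_set = 200 * 1024 * 1024  # the ~200M gcc build mentioned above
    print(pages_needed(working_set, SMALL_PAGE))  # 51200
    print(pages_needed(working_set, HUGE_PAGE))   # 100
    print(hex(align_up(0x12345678)))              # 0x12400000
```

The same 200M working set needs 51200 translations with 4 KiB pages but
only 100 with 2 MiB pages, which is why growing a mapping's end in 2 MiB
steps matters: only fully covered, aligned 2 MiB ranges can be backed by a
huge page.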

I think what would be needed is some non-virtualization speedup example of a
'non-special' workload, running on the native/host kernel. 'sort' is an
interesting usecase - could it be patched to use hugepages if it has to sort
through lots of data?

In fact it works well unpatched, the 6% I measured was with the system sort.

Currently in order to use hugepages (with the 'always' option) the only 
requirement is that the application uses a few large vmas.
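
For anyone wanting to check which mode is active before measuring, a
minimal sketch follows. It assumes the knob lives at
/sys/kernel/mm/transparent_hugepage/enabled and prints a line such as
"[always] madvise never" with the active choice in brackets; the path and
format may differ between versions of the patch set, and the helper name
active_mode is hypothetical.

```python
from pathlib import Path

# Assumed location of the knob; adjust if your kernel exposes it elsewhere.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_mode(text: str) -> str:
    """Return the bracketed choice from a line such as '[always] madvise never'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    raise ValueError(f"no bracketed mode in {text!r}")


if __name__ == "__main__":
    if THP_ENABLED.exists():
        print(active_mode(THP_ENABLED.read_text()))
    else:
        print("transparent hugepage sysfs knob not found")
```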

Is it practical to run something like a plain make -jN kernel compile all in
hugepages, and see a small but measurable speedup?

I doubt it - kernel builds run in relatively little memory.  The link 
stage uses a lot of memory but is fairly fast (I guess due to the 
partial links before).  Building a template-heavy C++ application might 
show some gains.

Although it's not an ideal workload for computational speedups at all because
a lot of the time we spend in a kernel build is really buildup/teardown of
process state/context and similar 'administrative' overhead, while the true
'compilation work' is just a burst of a few dozen milliseconds and then we
tear down all the state again. (It's very inefficient really.)

Something like GIMP calculations would be a lot more representative of the
speedup potential. Is it possible to run the GIMP with transparent hugepages
enabled for it?

I thought of it, but raster work is too regular so speculative execution 
should hide the tlb fill latency.  It's also easy to code in a way which 
hides cache effects (no idea if it is actually coded that way).  Sort 
showed a speedup since it defeats branch prediction and thus the 
processor cannot pipeline the loop.
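
A toy model can show how page size changes the number of TLB fills for
regular versus scattered access. This is only a sketch: it simulates a
fully associative LRU TLB with an assumed 64 entries, counts misses, and
does not model speculation, branch prediction, or pipelining, so it says
nothing about how much of the fill latency a real processor can hide. The
working-set size and access patterns are arbitrary choices.

```python
import random
from collections import OrderedDict


def tlb_misses(addresses, page_size, entries=64):
    """Count misses in a toy fully associative LRU TLB with `entries` slots."""
    tlb = OrderedDict()
    misses = 0
    for addr in addresses:
        page = addr // page_size
        if page in tlb:
            tlb.move_to_end(page)        # refresh LRU position on a hit
        else:
            misses += 1
            tlb[page] = True
            if len(tlb) > entries:
                tlb.popitem(last=False)  # evict the least recently used page
    return misses


if __name__ == "__main__":
    size = 16 * 1024 * 1024                  # 16 MiB working set (arbitrary)
    sequential = range(0, size, 64)          # one access per 64-byte line
    rng = random.Random(0)                   # fixed seed for repeatability
    scattered = [rng.randrange(size) for _ in range(len(sequential))]
    for name, pattern in (("sequential", sequential), ("scattered", scattered)):
        for page_size in (4 * 1024, 2 * 1024 * 1024):
            print(name, page_size, tlb_misses(pattern, page_size))
```

In this model the sequential walk misses once per page, at predictable
points, while the scattered walk with 4 KiB pages misses on most accesses.
With 2 MiB pages the whole working set fits in the toy TLB either way.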

I thought ray tracers with large scenes should show a nice speedup, but 
setting this up is beyond my capabilities.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
