On 04/10/2010 10:02 PM, Ingo Molnar wrote:
* Andrea Arcangeli <aarcange@xxxxxxxxxx> wrote:
[...]
This is already fully usable and works great, and as Avi showed it boosts
even a sort on host by 6%, think about HPC applications, and soon I hope to
boost gcc on host by 6% (and by >15% in guest with NPT/EPT) by extending
vm_end in 2M chunks in glibc, at least for those huge gcc builds taking
200M like translate.o of qemu-kvm... (so I hope soon gcc running on KVM
guest, thanks to EPT/NPT, will run faster than on mainline kernel without
transparent hugepages on bare metal).
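
To put numbers on the "2M chunks" idea, here is a minimal arithmetic sketch.
It assumes 4 KiB base pages and 2 MiB huge pages (x86-64), and reads "200M"
as 200 MiB. The helper names are made up for illustration and are not taken
from glibc or from the patch set.

```python
HUGE_PAGE = 2 * 1024 * 1024   # assumed huge page size: 2 MiB
BASE_PAGE = 4 * 1024          # assumed base page size: 4 KiB


def round_up(nbytes: int, chunk: int = HUGE_PAGE) -> int:
    """Round a heap growth request up to a whole number of chunks."""
    if nbytes < 0:
        raise ValueError("nbytes must be non-negative")
    return -(-nbytes // chunk) * chunk   # ceiling division, then scale back


def pages_needed(nbytes: int, page_size: int) -> int:
    """Number of pages of the given size needed to cover nbytes."""
    return -(-nbytes // page_size)


if __name__ == "__main__":
    heap = 200 * 1024 * 1024
    print("4 KiB pages:", pages_needed(heap, BASE_PAGE))
    print("2 MiB pages:", pages_needed(heap, HUGE_PAGE))
    print("growing by 1 byte extends the heap by", round_up(1), "bytes")
```

Under these assumptions a 200 MiB heap needs 51200 base-page translations
but only 100 huge-page translations, which is where the TLB benefit comes
from.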

I think what would be needed is some non-virtualization speedup example of a
'non-special' workload, running on the native/host kernel. 'sort' is an
interesting use case - could it be patched to use hugepages if it has to sort
through lots of data?

In fact it works well unpatched, the 6% I measured was with the system sort.
Currently in order to use hugepages (with the 'always' option) the only
requirement is that the application uses a few large vmas.
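
Why "a few large vmas" matters can be sketched with a little address
arithmetic. The assumption here (mine, not stated above) is that a huge page
can only back a 2 MiB-aligned range that lies entirely inside one vma, so
small or badly aligned mappings leave little or nothing eligible. The
function name is hypothetical.

```python
HUGE_PAGE = 2 * 1024 * 1024   # assumed huge page size: 2 MiB
MIB = 1024 * 1024


def huge_eligible_bytes(start: int, end: int, huge: int = HUGE_PAGE) -> int:
    """Bytes of the vma [start, end) covered by fully contained,
    huge-page-aligned chunks."""
    first = -(-start // huge) * huge   # round start up to alignment
    last = (end // huge) * huge        # round end down to alignment
    return max(0, last - first)


if __name__ == "__main__":
    # One large mapping: most of it is eligible.
    print(huge_eligible_bytes(1 * MIB, 9 * MIB) // MIB, "MiB eligible")
    # A small mapping that straddles an alignment boundary: nothing fits.
    print(huge_eligible_bytes(1 * MIB, 2 * MIB + 512 * 1024), "bytes eligible")
```

An 8 MiB mapping that starts 1 MiB off alignment still has 6 MiB eligible,
while a 1.5 MiB mapping has none.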

Is it practical to run something like a plain make -jN kernel compile all in
hugepages, and see a small but measurable speedup?

I doubt it - kernel builds run in relatively little memory. The link
stage uses a lot of memory but is fairly fast (I guess due to the
partial links before). Building a template-heavy C++ application might
show some gains.
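
For anyone who wants to check whether a given compiler or linker process is
getting huge pages at all, a rough sketch follows. It assumes the kernel
reports an "AnonHugePages:" line per mapping in /proc/<pid>/smaps, as
mainline kernels with transparent hugepages do; the patch set discussed here
may expose this differently. The helper name and the sample text are
invented for the example.

```python
import re

# Matches lines such as "AnonHugePages:      8192 kB".
_FIELD = re.compile(r"^AnonHugePages:\s+(\d+)\s+kB$", re.MULTILINE)

# Invented sample in the smaps layout, trimmed to the relevant lines.
SAMPLE = """\
00400000-00c00000 r-xp 00000000 08:01 123 /usr/bin/example
AnonHugePages:         0 kB
7f0000000000-7f0000800000 rw-p 00000000 00:00 0
AnonHugePages:      8192 kB
"""


def anon_huge_kb(smaps_text: str) -> int:
    """Sum the AnonHugePages values (in kB) over all mappings."""
    return sum(int(value) for value in _FIELD.findall(smaps_text))


if __name__ == "__main__":
    total = anon_huge_kb(SAMPLE)
    print(total, "kB in huge pages,", total // 2048, "huge pages of 2 MiB")
    # For a live process, read the file instead of SAMPLE:
    #     with open(f"/proc/{pid}/smaps") as f:
    #         total = anon_huge_kb(f.read())
```

A short-lived cc1 process with a small heap would be expected to show little
or nothing here, which fits the doubt expressed above.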

Although it's not an ideal workload for computational speedups at all because
a lot of the time we spend in a kernel build is really buildup/teardown of
process state/context and similar 'administrative' overhead, while the true
'compilation work' is just a burst of a few dozen milliseconds and then we
tear down all the state again. (It's very inefficient really.)
Something like GIMP calculations would be a lot more representative of the
speedup potential. Is it possible to run the GIMP with transparent hugepages
enabled for it?

I thought of it, but raster work is too regular so speculative execution
should hide the TLB fill latency. It's also easy to code in a way which
hides cache effects (no idea if it is actually coded that way). Sort
showed a speedup since it defeats branch prediction and thus the
processor cannot pipeline the loop.
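
A toy model of the difference between regular and irregular access follows.
It does not model speculation or branch prediction at all; it only counts
how many distinct page translations an access stream needs, assuming 4 KiB
and 2 MiB pages and a 200 MiB working set. All names are hypothetical.

```python
import random

BASE_PAGE = 4 * 1024
HUGE_PAGE = 2 * 1024 * 1024
SPAN = 200 * 1024 * 1024      # assumed working set: 200 MiB


def distinct_pages(offsets, page_size: int) -> int:
    """Number of distinct pages touched by a list of byte offsets."""
    return len({off // page_size for off in offsets})


def sequential(n: int, stride: int = 64):
    """Regular access: n touches, one cache line (64 bytes) apart."""
    return [i * stride for i in range(n)]


def scattered(n: int, span: int = SPAN, seed: int = 0):
    """Irregular access: n touches at pseudo-random offsets in the span."""
    rng = random.Random(seed)
    return [rng.randrange(span) for _ in range(n)]


if __name__ == "__main__":
    for name, stream in (("sequential", sequential(10000)),
                         ("scattered", scattered(10000))):
        print(name,
              "4K pages:", distinct_pages(stream, BASE_PAGE),
              "2M pages:", distinct_pages(stream, HUGE_PAGE))
```

The scattered stream touches thousands of distinct 4 KiB pages but at most
100 distinct 2 MiB pages, so with huge pages far fewer TLB fills are exposed
when the processor cannot run ahead to hide them.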
I thought ray tracers with large scenes should show a nice speedup, but
setting this up is beyond my capabilities.
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.