Re: [PATCH 00 of 41] Transparent Hugepage Support #17

On 04/10/2010 10:02 PM, Ingo Molnar wrote:
* Andrea Arcangeli <aarcange@xxxxxxxxxx> wrote:

[...]

This is already fully usable and works great, and as Avi showed it boosts
even a sort on host by 6%, think about HPC applications, and soon I hope to
boost gcc on host by 6% (and by >15% in guest with NPT/EPT) by extending
vm_end in 2M chunks in glibc, at least for those huge gcc builds taking
200M like translate.o of qemu-kvm... (so I hope soon gcc running on KVM
guest, thanks to EPT/NPT, will run faster than on mainline kernel without
transparent hugepages on bare metal).

I think what would be needed is some non-virtualization speedup example of a
'non-special' workload, running on the native/host kernel. 'sort' is an
interesting use case - could it be patched to use hugepages if it has to sort
through lots of data?

In fact it works well unpatched, the 6% I measured was with the system sort.

Currently in order to use hugepages (with the 'always' option) the only requirement is that the application uses a few large vmas.
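
To make the "few large vmas" requirement concrete, here is a rough sketch, not code from the patch set. It assumes the mode is exposed in /sys/kernel/mm/transparent_hugepage/enabled with the active option in brackets, and that a huge page is 2 MiB (x86-64). The helper names active_thp_mode and huge_pages_in_range are made up for illustration. Only whole, naturally aligned 2 MiB ranges inside a vma can be backed by a huge page, which is why small or badly aligned mappings gain little.

```python
import re

HPAGE_SIZE = 2 * 1024 * 1024  # assumption: 2 MiB huge pages (x86-64)


def active_thp_mode(text):
    """Return the bracketed option from a line like 'always [madvise] never'."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def huge_pages_in_range(start, end, hpage=HPAGE_SIZE):
    """Count whole, naturally aligned huge pages inside the vma [start, end)."""
    first = (start + hpage - 1) // hpage * hpage  # round start up
    last = end // hpage * hpage                   # round end down
    return max(0, (last - first) // hpage)


if __name__ == "__main__":
    try:
        # Assumed sysfs location; absent on kernels built without THP.
        with open("/sys/kernel/mm/transparent_hugepage/enabled") as f:
            print("THP mode:", active_thp_mode(f.read()))
    except OSError:
        print("THP sysfs file not available")

    size = 200 * 1024 * 1024  # a 200M mapping, like the gcc case quoted above
    print(huge_pages_in_range(0, size))            # aligned start
    print(huge_pages_in_range(0x1000, 0x1000 + size))  # start off by 4 KiB
```

With an aligned start the 200M mapping holds 100 huge pages. Shifting the start by one 4 KiB page leaves 99, and a vma smaller than 2 MiB can hold none.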

Is it practical to run something like a plain make -jN kernel compile all in
hugepages, and see a small but measurable speedup?

I doubt it - kernel builds run in relatively little memory. The link stage uses a lot of memory but is fairly fast (I guess due to the partial links before). Building a template-heavy C++ application might show some gains.

Although it's not an ideal workload for computational speedups at all because
a lot of the time we spend in a kernel build is really buildup/teardown of
process state/context and similar 'administrative' overhead, while the true
'compilation work' is just a burst of a few dozen milliseconds and then we
tear down all the state again. (It's very inefficient really.)

Something like GIMP calculations would be a lot more representative of the
speedup potential. Is it possible to run the GIMP with transparent hugepages
enabled for it?

I thought of it, but raster work is too regular, so speculative execution should hide the TLB fill latency. It's also easy to code in a way which hides cache effects (no idea if it is actually coded that way). Sort showed a speedup since it defeats branch prediction and thus the processor cannot pipeline the loop.

I thought ray tracers with large scenes should show a nice speedup, but setting this up is beyond my capabilities.
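
For anyone who does set up one of these workloads, a quick way to check whether the process actually got huge pages is to sum the AnonHugePages fields in /proc/<pid>/smaps. This is only a sketch: it assumes a kernel that reports that field (in kB), and the function names are mine.

```python
def anon_huge_kib(smaps_text):
    """Sum the AnonHugePages fields (reported in kB) of an smaps dump."""
    total = 0
    for line in smaps_text.splitlines():
        if line.startswith("AnonHugePages:"):
            # Line looks like: "AnonHugePages:      2048 kB"
            total += int(line.split()[1])
    return total


def anon_huge_kib_for_pid(pid="self"):
    """Read /proc/<pid>/smaps and return the huge-page-backed anonymous memory in kB."""
    with open(f"/proc/{pid}/smaps") as f:
        return anon_huge_kib(f.read())
```

A result of 0 for a long-running, memory-hungry process would suggest its vmas are too small or too fragmented to benefit.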

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
