Re: [PATCH 00 of 41] Transparent Hugepage Support #17

On 04/10/2010 10:47 PM, Ingo Molnar wrote:
> * Avi Kivity <avi@xxxxxxxxxx> wrote:

> > > I think what would be needed is some non-virtualization speedup example
> > > of a 'non-special' workload, running on the native/host kernel. 'sort' is
> > > an interesting usecase - could it be patched to use hugepages if it has
> > > to sort through lots of data?
> >
> > In fact it works well unpatched; the 6% I measured was with the system sort.
>
> Yes - but you intentionally sorted something large - the question is, how big
> is the slowdown with small sizes (if there's a slowdown), and where is the
> break-even point (if any)?

There shouldn't be a slowdown as far as I can tell. The danger IMO is pinning down unused pages inside a huge page and so artificially increasing memory pressure.

The point where this starts to win would be more or less when the page tables mapping the working set hit the size of the last-level cache, multiplied by some loading factor (guess: 0.5). Since each 4KB page needs an 8-byte PTE, the page tables take roughly working-set/512 bytes, so with a 4MB cache the win should start at around a 1GB working set.
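(To make that arithmetic explicit - this assumes 4KB pages with 8-byte PTEs, i.e. plain x86-64, and a made-up 4MB last-level cache:)

   # page tables ~= working_set / 4096 * 8 = working_set / 512 bytes
   LLC=$((4 * 1024 * 1024))        # assumed 4MB last-level cache
   echo $(( LLC / 2 * 512 ))       # break-even working set: 1073741824 bytes, i.e. ~1GB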


> > > Something like GIMP calculations would be a lot more representative of
> > > the speedup potential. Is it possible to run the GIMP with transparent
> > > hugepages enabled for it?
> >
> > I thought of it, but raster work is too regular, so speculative execution
> > should hide the TLB fill latency.  It's also easy to code in a way which
> > hides cache effects (no idea if it is actually coded that way).  Sort showed
> > a speedup since it defeats branch prediction and thus the processor cannot
> > pipeline the loop.
>
> Would be nice to try because there are a lot of transformations within Gimp -
> and Gimp can be scripted. It's also a test for negatives: if there is an
> across-the-board _lack_ of speedups, it shows that it's not really general
> purpose but more specialized.

Right, but I don't think I can tell which transforms are likely to be sped up. Also, do people manipulate 500MB images regularly?

A 20MB image won't see a significant improvement (20MB at 8 bytes of PTE per 4KB page is only 40KB of page tables, that's chickenfeed).

> If the optimization is specialized, then that's somewhat of an argument
> against automatic/transparent handling (though even if the beneficiaries
> turn out to be only special workloads, transparency still has advantages).

Well, we know that databases, virtualization, and server-side Java win from this. (Oracle won't benefit from this implementation since it wants shared, not anonymous, memory, but other databases may.) I'm guessing large C++ compiles, and perhaps the new link-time optimization feature, will also see a nice speedup.

Desktops will only benefit once they bloat to ~8GB of RAM and a 1-2GB Firefox RSS, probably not so far in the future.

> > I thought ray tracers with large scenes should show a nice speedup, but
> > setting this up is beyond my capabilities.
>
> Oh, this tickled some memories: x264 video encoding can be very cache and
> TLB intensive. Something like the encoding of a 350 MB video file:
>
>    wget http://media.xiph.org/video/derf/y4m/soccer_4cif.y4m       # NOTE: 350 MB!
>    x264 --crf 20 --quiet soccer_4cif.y4m -o /dev/null --threads 4
>
> would be another thing worth trying with transparent hugepages enabled.


I'll try it out.
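
(For reference, a rough sketch of how one might toggle it around such a run - this assumes the sysfs knob that later ended up in mainline, /sys/kernel/mm/transparent_hugepage/enabled; the exact path and values may differ in this patch series:)

   # assumed mainline-style sysfs knob - adjust if the series uses a different path
   echo always > /sys/kernel/mm/transparent_hugepage/enabled
   x264 --crf 20 --quiet soccer_4cif.y4m -o /dev/null --threads 4
   grep AnonHugePages /proc/meminfo    # sanity check that huge pages were actually used
   echo never > /sys/kernel/mm/transparent_hugepage/enabled    # then repeat the run for the non-THP baseline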

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

