* Avi Kivity <avi@xxxxxxxxxx> wrote:

> On 04/10/2010 10:47 PM, Ingo Molnar wrote:
> > * Avi Kivity <avi@xxxxxxxxxx> wrote:
> >
> > > > I think what would be needed is some non-virtualization speedup example of
> > > > a 'non-special' workload, running on the native/host kernel. 'sort' is an
> > > > interesting usecase - could it be patched to use hugepages if it has to
> > > > sort through lots of data?
> > >
> > > In fact it works well unpatched, the 6% I measured was with the system sort.
> >
> > Yes - but you intentionally sorted something large - the question is, how big
> > is the slowdown with small sizes (if there's a slowdown), where is the
> > break-even point (if any)?
>
> There shouldn't be a slowdown as far as I can tell. [...]

It does not hurt to double check the before/after micro-cost precisely - it
would be nice to see a result of:

  perf stat -e instructions --repeat 100 sort /etc/passwd > /dev/null

with and without hugetlb.

Linus is right in that the patches are intrusive, and the answer to that isn't
to insist that it isn't so (it evidently is so) - the correct reply is to
broaden the utility of the patches and to demonstrate that the feature is
useful on a much wider spectrum of workloads.

> > Would be nice to try because there's a lot of transformations within Gimp
> > - and Gimp can be scripted. It's also a test for negatives: if there is an
> > across-the-board _lack_ of speedups, it shows that it's not really general
> > purpose but more specialistic.
>
> Right, but I don't think I can tell which transforms are likely to be sped
> up. Also, do people manipulate 500MB images regularly?
>
> A 20MB image won't see a significant improvement (40KB page tables, that's
> chickenfeed).
>
> > If the optimization is specialistic, then that's somewhat of an argument
> > against automatic/transparent handling. (even though even if the
> > beneficiaries turn out to be only special workloads then transparency
> > still has advantages.)
>
> Well, we know that databases, virtualization, and server-side java win from
> this. (Oracle won't benefit from this implementation since it wants shared,
> not anonymous, memory, but other databases may). I'm guessing large C++
> compiles, and perhaps the new link-time optimization feature, will also see
> a nice speedup.
>
> Desktops will only benefit when they bloat to ~8GB RAM and 1-2GB firefox
> RSS, probably not so far in the future.

1-2GB firefox RSS is reality for me.

Btw., there's another workload that could be cache sensitive, 'git grep':

  aldebaran:~/linux> perf stat -e cycles -e instructions -e dtlb-loads -e dtlb-load-misses --repeat 5 git grep arca > /dev/null

  Performance counter stats for 'git grep arca' (5 runs):

      1882712774  cycles                   ( +- 0.074% )
      1153649442  instructions             # 0.613 IPC  ( +- 0.005% )
       518815167  dTLB-loads               ( +- 0.035% )
         3028951  dTLB-load-misses         ( +- 1.223% )

     0.597161428  seconds time elapsed     ( +- 0.065% )

At first sight, with 7 cycles per cold TLB miss there's about 1.12% of
speedup potential in that workload (roughly 3,028,951 misses * 7 cycles, out
of 1,882,712,774 total cycles). With just 1 cycle per miss it's 0.16%. The
real speedup ought to be somewhere in between.

Btw., instead of throwing random numbers like '3-4%' into this thread, it
would be nice if you could send 'perf stat --repeat' numbers like I did
above - they have an error bar, they show the TLB details, they show the
cycles and instructions proportion, and they are also far more precise than
'time' based results.

Thanks,

	Ingo
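For reference, here is one way the with/without-hugetlb comparison asked for
above could be run, using the hugeadm and hugectl wrappers from libhugetlbfs.
This is a sketch only - the pool size, event list and tool choice are
illustrative assumptions, and any setup that backs the heap with hugepages
would do:

  # baseline: regular 4K pages
  perf stat -e instructions -e dTLB-load-misses --repeat 100 sort /etc/passwd > /dev/null

  # reserve some 2MB hugepages (needs root), then back the heap with them;
  # hugectl exports the libhugetlbfs environment, so the preload reaches 'sort'
  hugeadm --pool-pages-min 2MB:64
  hugectl --heap perf stat -e instructions -e dTLB-load-misses --repeat 100 sort /etc/passwd > /dev/null

Comparing the instruction and dTLB-miss counts of the two runs gives the
before/after micro-cost for a small input; repeating with a larger input file
shows where the break-even point sits.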