Re: [PATCH 00 of 41] Transparent Hugepage Support #17

* Avi Kivity <avi@xxxxxxxxxx> wrote:

> On 04/11/2010 12:37 PM, Jason Garrett-Glaser wrote:
> >
> >># time x264 --crf 20 --quiet crowd_run_2160p.y4m -o /dev/null --threads 2
> >>yuv4mpeg: 3840x2160@50/1fps, 1:1
> >>
> >>encoded 500 frames, 0.68 fps, 251812.80 kb/s
> >>
> >>real    12m17.154s
> >>user    20m39.151s
> >>sys    0m11.727s
> >>
> >># echo never > /sys/kernel/mm/transparent_hugepage/enabled
> >># echo never > /sys/kernel/mm/transparent_hugepage/khugepaged/enabled
> >># time x264 --crf 20 --quiet crowd_run_2160p.y4m -o /dev/null --threads 2
> >>yuv4mpeg: 3840x2160@50/1fps, 1:1
> >>
> >>encoded 500 frames, 0.66 fps, 251812.80 kb/s
> >>
> >>real    12m37.962s
> >>user    21m13.506s
> >>sys    0m11.696s
> >>
> >>Just 2.7%, even though the working set was much larger.
> >Did you make sure to check your stddev on those?
> 
> I'm doing another run to look at variability.
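For reference, the 2.7% quoted above can be reproduced from the two 'real' times: it matches the wall-clock reduction measured relative to the THP-disabled run. A minimal sketch (the helper name is just for illustration):

```python
def to_seconds(minutes: int, seconds: float) -> float:
    """Convert a time(1)-style 'XmY.YYYs' reading into seconds."""
    return minutes * 60 + seconds

# 'real' times from the two runs quoted above.
thp_on = to_seconds(12, 17.154)   # transparent hugepages enabled
thp_off = to_seconds(12, 37.962)  # after 'echo never' to both knobs

# Relative wall-clock saving, measured against the THP-disabled run.
saving_pct = (thp_off - thp_on) / thp_off * 100

print(f"{saving_pct:.1f}%")
```

With a single run per configuration there is no way to tell whether a gap of that size is larger than the run-to-run noise, which is exactly why the stddev question matters.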

Sigh. Could you please stop using stone-age tools like /usr/bin/time and 
instead use:

 perf stat --repeat 3 x264 ...

You can install it via:

 cd linux
 cd tools/perf/
 make -j install

That way you will see the 'variability' (stddev/error bars/fuzz) and a whole lot
of other CPU details, on top of much more precise measurements:

 $ perf stat --repeat 3 x264 --crf 20 --quiet soccer_4cif.y4m -o /dev/null --threads 2
 yuv4mpeg: 704x576@60/1fps, 128:117

 encoded 2 frames, 23.47 fps, 39824.64 kb/s
 yuv4mpeg: 704x576@60/1fps, 128:117

 encoded 2 frames, 23.52 fps, 39824.64 kb/s
 yuv4mpeg: 704x576@60/1fps, 128:117

 encoded 2 frames, 23.45 fps, 39824.64 kb/s

 Performance counter stats for 'x264 --crf 20 --quiet soccer_4cif.y4m -o /dev/null --threads 2' (3 runs):

     130.624286  task-clock-msecs         #      1.496 CPUs    ( +-   0.081% )
             74  context-switches         #      0.001 M/sec   ( +-   7.151% )
              3  CPU-migrations           #      0.000 M/sec   ( +-  25.000% )
           2987  page-faults              #      0.023 M/sec   ( +-   0.162% )
      389234822  cycles                   #   2979.804 M/sec   ( +-   0.081% )
      481360693  instructions             #      1.237 IPC     ( +-   0.036% )
        4206296  cache-references         #     32.201 M/sec   ( +-   0.387% )
          55732  cache-misses             #      0.427 M/sec   ( +-   0.529% )

    0.087336553  seconds time elapsed   ( +-   0.100% )
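The '#' column in that output is plain arithmetic on the raw counts. The sketch below recomputes a few of those figures from the printed values; my assumption is that the M/sec figures are counts divided by task-clock time and the CPUs figure is task-clock divided by elapsed time, which matches the numbers above to the displayed precision:

```python
# Raw values (averaged over the 3 runs) copied from the perf stat output above.
task_clock_ms = 130.624286
elapsed_s = 0.087336553
cycles = 389_234_822
instructions = 481_360_693
cache_misses = 55_732

def per_sec_millions(count: float, task_clock_ms: float) -> float:
    """Events per second of task-clock time, in millions (the 'M/sec' column)."""
    return count / (task_clock_ms / 1000.0) / 1e6

cpus_utilized = task_clock_ms / (elapsed_s * 1000.0)       # '1.496 CPUs'
ipc = instructions / cycles                                # '1.237 IPC'
cycles_rate = per_sec_millions(cycles, task_clock_ms)      # '2979.804 M/sec'
miss_rate = per_sec_millions(cache_misses, task_clock_ms)  # '0.427 M/sec'

print(f"{cpus_utilized:.3f} CPUs, {ipc:.3f} IPC, "
      f"{cycles_rate:.3f} M/sec cycles, {miss_rate:.3f} M/sec cache-misses")
```

IPC in particular is worth watching when comparing THP on/off runs, since fewer TLB misses should show up as fewer stalled cycles per instruction.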

Note that perf stat will run fine on older [pre-2.6.31] kernels too (there it
only measures elapsed time), and even there it is much more precise than
/usr/bin/time.
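For the elapsed-time part, the idea behind --repeat can be sketched in a few lines. This is not perf's code: it is an approximation that assumes the '+-' percentage is the standard error of the mean expressed relative to the mean (my understanding of how recent perf versions compute it), and the function names are hypothetical:

```python
import math
import subprocess
import sys
import time

def mean_and_rel_stderr(samples):
    """Return (mean, standard error of the mean as a percentage of the mean)."""
    n = len(samples)
    mean = sum(samples) / n
    if n < 2:
        return mean, 0.0
    # Bessel-corrected sample variance, then the variance of the mean.
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    stderr = math.sqrt(variance / n)
    return mean, 100.0 * stderr / mean

def time_repeated(cmd, repeat=3):
    """Run cmd `repeat` times, collecting wall-clock seconds per run."""
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()  # high-resolution monotonic clock
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return samples

if __name__ == "__main__":
    # Stand-in workload; replace with the x264 command line being measured.
    runs = time_repeated([sys.executable, "-c", "pass"], repeat=3)
    mean, rel = mean_and_rel_stderr(runs)
    print(f"{mean:.6f} seconds time elapsed ( +- {rel:.3f}% )")
```

Two configurations are only meaningfully different when the gap between their means is well outside these error bars.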

For more dTLB details, use something like:

 perf stat -e cycles -e instructions -e dtlb-loads -e dtlb-load-misses --repeat 3 x264 ...
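The raw dTLB counts are most useful as ratios. A minimal sketch (the helper name is hypothetical) that turns the dtlb-loads, dtlb-load-misses and instructions counts printed by the command above into a miss ratio and misses per thousand instructions:

```python
def dtlb_summary(dtlb_loads: int, dtlb_load_misses: int, instructions: int) -> dict:
    """Summarize dTLB behaviour from raw perf stat counts (hypothetical helper)."""
    return {
        # Fraction of dTLB load lookups that missed.
        "miss_ratio": dtlb_load_misses / dtlb_loads,
        # Misses per thousand instructions, normalizing for run length.
        "mpki": 1000.0 * dtlb_load_misses / instructions,
    }
```

Feed it the counts from a THP-enabled and a THP-disabled run of the same workload and compare the two results; misses per thousand instructions is the easier number to compare when the runs differ in length.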

Yes, I know we had a big flamewar about perf kvm, but IMHO that is no reason
for you to pretend that this tool doesn't exist ;-)

> > I'm also curious how it compares for --preset ultrafast and so forth.
> 
> Is this something realistic or just a benchmark thing?

I'd suggest using the default settings, to keep it realistic (and maybe also
the 'advanced/high-quality' settings that an advanced user would use).

There is no doubt that benchmark advantages can be shown; the point of this
exercise is to show that hugetlb gives us real-life speedups in various
categories of non-server apps.

	Ingo

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx  For more info on Linux MM,
see: http://www.linux-mm.org/ .