On Sat, Apr 10, 2010 at 1:53 PM, Avi Kivity <avi@xxxxxxxxxx> wrote:
> On 04/10/2010 11:49 PM, Jason Garrett-Glaser wrote:
>>
>>> 3-5% improvement.  I had to tune khugepaged to scan more aggressively
>>> since the run is so short.  The working set is only ~100MB here though.
>>
>> I'd try some longer runs with larger datasets to do more testing.
>>
>> Some things to try:
>>
>> 1)  Pick a 1080p or even 2160p sequence from
>> http://media.xiph.org/video/derf/
>
> Ok, I'm downloading crown_run 2160p, but it will take a while.

You can always cheat by synthesizing a fake sample like this:

ffmpeg -i input.y4m -s 3840x2160 output.y4m

Or something similar.  Do be careful, though: extremely fast presets
combined with large input samples will be disk-bottlenecked, so make
sure the sample is small enough to fit in the disk cache, and "prime"
the cache before testing.

>> 2)  Use --preset ultrafast or similar to do a ridiculously
>> memory-bandwidth-limited runthrough.
>
> Large pages improve random-access memory bandwidth but don't change
> sequential access.  Which of these does --preset ultrafast change?

Hmm, I'm not quite sure.  The process is strictly sequential, but there
is clearly enough random access mixed in to cause some sort of change,
given your previous test.

The main thing faster presets do is decrease the amount of "work" done
at each step, resulting in roughly the same amount of memory bandwidth
being required for each step--but in a much shorter period of time.
Most "work" done at each step stays well within the L2 cache.

Jason

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
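
The cache-priming step mentioned above can be sketched as a short shell snippet. This is an illustration, not from the thread: the file name and size are placeholders standing in for the synthesized video sample.

```shell
# Create a small stand-in for the video sample (hypothetical name/size;
# a real run would use the ffmpeg-synthesized .y4m).  8 MiB easily fits
# in the page cache.
dd if=/dev/zero of=sample.y4m bs=1M count=8 2>/dev/null

# "Prime" the disk cache: read the file once before the timed run, so
# the benchmark measures CPU/memory behavior rather than disk throughput.
cat sample.y4m > /dev/null
```

The second, primed read of the file is served from the kernel's page cache, which is what keeps an ultrafast-preset encode from being disk-bottlenecked.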