Hello,

Applying this locally, the issue we were seeing with very high
submit times in high-end workloads seems largely gone. My
methodology is to measure the total time spent in
DRM_IOCTL_AMDGPU_CS with `strace -T` for the whole first scene of
the Shadow of the Tomb Raider benchmark, and divide by the frame
count in that scene to get an idea of how much CPU time is spent
in submissions per frame. More details below.

On a Vega20 system with a 3900X, at High settings (~6 gigs of VRAM
usage according to UMR, no contention):
 - 5.2.14: 1.1ms per frame in CS
 - 5.2.14 + LRU bulk moves: 0.6ms per frame in CS

On a Polaris10 system with an i7-7820X, at Very High settings (7.7G/8G
VRAM used, no contention):
 - 5.2.15: 12.03ms per frame in CS (!)
 - 5.2.15 + LRU bulk moves: 1.35ms per frame in CS

The issue is largely addressed. 1.35ms is still higher than I'd expect,
but it's pretty reasonable. Note that in many of our use cases,
submission happens in a separate thread and doesn't typically impact
overall frame time/latency if you have extra CPU cores to work with.
However, it very negatively affects performance as soon as the CPU gets
saturated, and it burns a ton of power.

Thanks!
 - Pierre-Loup

Methodology details:

# Mesa patched to kill() itself with SIGCONT in vkQueuePresent to act
# as a frame marker in-band with the strace data.

# strace collection:
strace -f -p 13113 -e ioctl,kill -o sottr_first_scene_vanilla -T

# frame count:
cat sottr_first_scene_vanilla | grep kill\( | wc -l

# total time spent in _CS:
cat sottr_first_scene_vanilla | grep AMDGPU_CS | grep -v unfinished | tr -s ' ' | cut -d ' ' -f7 | tr -d \< | tr -d \> | xargs | tr ' ' '+' | bc

# seconds to milliseconds, then divide by frame count:
(gdb) p 7.41 * 1000.0 / 616.0
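For convenience, the steps above could also be rolled into one small
script. A rough sketch (untested; the script name is made up, and it
assumes the timing is the trailing <seconds> field that strace -T
appends to each completed call):

#!/bin/sh
# cs_per_frame.sh (hypothetical) -- per-frame CS time from an
# strace -T capture; same arithmetic as the manual steps above.
trace="$1"

# one SIGCONT kill() per vkQueuePresent == one frame
frames=$(grep -c 'kill(' "$trace")

# sum the trailing <seconds> field of every completed AMDGPU_CS ioctl
total=$(grep AMDGPU_CS "$trace" | grep -v unfinished |
        awk -F'[<>]' '{ sum += $(NF-1) } END { print sum }')

# seconds -> milliseconds, divided by frame count
awk -v t="$total" -v f="$frames" \
    'BEGIN { printf "%.2f ms per frame in CS\n", t * 1000 / f }'

e.g.: sh cs_per_frame.sh sottr_first_scene_vanilla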
On 9/12/19 8:18 AM, Zhou, David(ChunMing) wrote: