Optimize VM handling a bit more

Patches 2, 3, 5, 6, 8, 9, 11 are Reviewed-by: Felix Kuehling
<Felix.Kuehling at amd.com>

I replied with comments on patches 1, 4, 7 and 10.

On another thread, some of the machine learning guys found that the main
overhead of our memory allocator is the clearing of BOs. I'm thinking
about a way to avoid that clearing, but your patch 1 interferes with it.

My idea is to cache vram_page_split-sized drm_mm_nodes in
amdgpu_vram_mgr per process, instead of just freeing them. When the same
process allocates memory again, first try to reuse an existing node that
was already used by that process. This would work for the common case
where there are no special alignment or placement restrictions. Having
most nodes be the same size (typically 2MB) helps and makes the lookup
of existing nodes very fast; having to deal with different node sizes
would make it more difficult. The cache would also likely interfere with
attempts to get large nodes in the first place.

I started some code, but I'm not sure I'll be able to send out something
working for review before my vacation at the end of this week, and then XDC.

Regards,
  Felix


On 2018-09-09 02:03 PM, Christian König wrote:
> Hi everyone,
>
> Especially on Vega and Raven, VM handling is rather inefficient while creating PTEs because we originally only supported 2-level page tables and implemented 4-level page tables on top of that.
>
> This patch set reworks quite a bit of that handling and adds proper iterator and tree walking functions which are then used to update PTEs more efficiently.
>
> A completely artificial test case which maps 2GB of VRAM at an unaligned address is reduced from 45ms down to ~20ms on my test system.
>
> As a very positive side effect this also adds support for 1GB giant VRAM pages in addition to the existing 2MB huge pages on Vega/Raven, and enables all the power-of-two sizes in between (2MB-2GB) for the L1.
>
> This could be beneficial for applications which allocate very large amounts of memory, because it reduces the overhead of page table walks by 50% (huge pages were 25%).
>
> Please comment and/or review,
> Christian.
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


