On 2017-03-27 17:29, Christian König wrote:
> On APUs I've already enabled using direct access to the stolen parts
> of system memory.
Thanks, could you point me to where this is done?

Regards,
David Zhou
>
> So there won't be any eviction any more because of page faults on APUs.
>
> Regards,
> Christian.
>
> On 27.03.2017 at 09:53, Zhou, David(ChunMing) wrote:
>> For APU special case, can we prevent eviction happening between VRAM
>> <----> GTT?
>>
>> Regards,
>> David Zhou
>>
>> -----Original Message-----
>> From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On
>> Behalf Of Michel Dänzer
>> Sent: Monday, March 27, 2017 3:36 PM
>> To: Marek Olšák <maraeo at gmail.com>
>> Cc: amd-gfx mailing list <amd-gfx at lists.freedesktop.org>
>> Subject: Re: Plan: BO move throttling for visible VRAM evictions
>>
>> On 25/03/17 01:33 AM, Marek Olšák wrote:
>>> Hi,
>>>
>>> I'm sharing this idea here, because it's something that has been
>>> decreasing our performance a lot recently, for example:
>>> http://openbenchmarking.org/prospect/1703011-RI-RADEONDIR06/7b7668cfc109d1c3dc27e871c8aea71ca13f23fa
>>>
>>> I think the problem there is that Mesa git started uploading
>>> descriptors and uniforms to VRAM, which helps when TC L2 has a low
>>> hit/miss ratio, but the performance can randomly drop by an order of
>>> magnitude. I've heard rumours that kernel 4.11 has an improved
>>> allocator that should perform better, but the situation is still far
>>> from ideal.
>>>
>>> AMD CPUs and APUs will hopefully suffer less, because we can resize
>>> the visible VRAM with the help of our CPU hw specs, but Intel CPUs
>>> will remain limited to 256 MB. The following plan describes how to do
>>> throttling for visible VRAM evictions.
>>>
>>>
>>> 1) Theory
>>>
>>> Initially, the driver doesn't care about where buffers are in VRAM,
>>> because VRAM buffers are only moved to visible VRAM on CPU page faults
>>> (when the CPU touches the buffer memory but the memory is in the
>>> invisible part of VRAM). When it happens,
>>> amdgpu_bo_fault_reserve_notify is called, which moves the buffer to
>>> visible VRAM, and the app continues. amdgpu_bo_fault_reserve_notify
>>> also marks the buffer as contiguous, which makes memory fragmentation
>>> worse.
>>>
>>> I verified this with DiRT Rally where amdgpu_bo_fault_reserve_notify
>>> was much higher in a CPU profiler than anything else in the kernel.
>>>
>>>
>>> 2) Monitoring via Gallium HUD
>>>
>>> We need to expose 2 kernel counters via the INFO ioctl and display
>>> those via Gallium HUD:
>>> - The number of VRAM CPU page faults. (the number of calls to
>>>   amdgpu_bo_fault_reserve_notify).
>>> - The number of bytes moved by ttm_bo_validate inside
>>>   amdgpu_bo_fault_reserve_notify.
>>>
>>> This will help us observe what exactly is happening and fine-tune the
>>> throttling when it's done.
>>>
>>>
>>> 3) Solution
>>>
>>> a) When amdgpu_bo_fault_reserve_notify is called, record the fact.
>>> (amdgpu_bo::had_cpu_page_fault = true)
>>>
>>> b) Monitor the MB/s rate at which buffers are moved by
>>> amdgpu_bo_fault_reserve_notify. If we get above a specific threshold,
>>> don't move the buffer to visible VRAM. Move it to GTT instead. Note
>>> that moving to GTT can be cheaper, because moving to visible VRAM is
>>> likely to evict a lot of buffers there and unmap them from the CPU,
>> FWIW, this can be avoided by only setting GTT in busy_placement. Then
>> TTM will only move the BO to visible VRAM if that can be done without
>> evicting anything from there.
>>
>>
>>> but moving to GTT shouldn't evict or unmap anything.
>>>
>>> c) When we get into the CS ioctl and a buffer has had_cpu_page_fault,
>>> it can be moved to VRAM if:
>>> - the GTT->VRAM move rate is low enough to allow it (this is the
>>>   existing throttling mechanism)
>>> - the visible VRAM move rate is low enough that we will be OK with
>>>   another CPU page fault if it happens.
>> Some other ideas that might be worth trying:
>>
>> Evicting BOs to GTT instead of moving them to CPU accessible VRAM in
>> principle in some cases (e.g. for all BOs except those with
>> AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) or even always.
>>
>> Implementing eviction from CPU visible to CPU invisible VRAM, similar
>> to how it's done in radeon. Note that there's potential for userspace
>> triggering an infinite loop in the kernel in cases where BOs are
>> moved back from invisible to visible VRAM on page faults.
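
For illustration only, steps 3a and 3b of the quoted plan can be sketched as a toy user-space model. This is not kernel code and not the amdgpu implementation: the class names, the sliding-window rate estimate, and the choice to count only moves into visible VRAM toward the rate are all assumptions made for the sketch. Only the had_cpu_page_fault flag and the "above the threshold, go to GTT" rule come from the plan itself.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Buffer:
    """Hypothetical stand-in for a buffer object (not struct amdgpu_bo)."""
    size: int                         # bytes
    had_cpu_page_fault: bool = False  # step 3a: remember that a fault happened
    placement: str = "INVISIBLE_VRAM"


class MoveRateThrottle:
    """Toy model of step 3b: throttle moves into visible VRAM by byte rate.

    Assumption: the rate is averaged over a sliding window, and only moves
    into visible VRAM are counted. The plan does not specify either detail.
    """

    def __init__(self, threshold_bytes_per_s, window_s=1.0):
        if window_s <= 0:
            raise ValueError("window_s must be positive")
        self.threshold = threshold_bytes_per_s
        self.window_s = window_s
        self._moves = deque()      # (timestamp_s, nbytes), oldest first
        self._bytes_in_window = 0

    def rate(self, now_s):
        """Bytes per second moved into visible VRAM over the last window."""
        # Drop moves that have aged out of the window.
        while self._moves and now_s - self._moves[0][0] >= self.window_s:
            _, nbytes = self._moves.popleft()
            self._bytes_in_window -= nbytes
        return self._bytes_in_window / self.window_s

    def on_cpu_page_fault(self, bo, now_s):
        """Pick a placement for a buffer that just took a CPU page fault."""
        bo.had_cpu_page_fault = True              # step 3a
        if self.rate(now_s) > self.threshold:     # step 3b: above threshold
            bo.placement = "GTT"
        else:
            bo.placement = "VISIBLE_VRAM"
            self._moves.append((now_s, bo.size))
            self._bytes_in_window += bo.size
        return bo.placement
```

With a threshold of 10 bytes/s over a 10 second window, two 60-byte faults in quick succession go to visible VRAM, a third is redirected to GTT, and visible VRAM placement resumes once the oldest move has aged out of the window. Michel's busy_placement suggestion is a different mechanism and is not modelled here.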