On 27.03.2017 at 11:36, zhoucm1 wrote:
>
>
> On 27.03.2017 at 17:29, Christian König wrote:
>> On APUs I've already enabled using direct access to the stolen parts
>> of system memory.
> Thanks, could you point me to where this is done?

See gmc_v7_0_mc_init():

> 	/* Could aper size report 0 ? */
> 	adev->mc.aper_base = pci_resource_start(adev->pdev, 0);
> 	adev->mc.aper_size = pci_resource_len(adev->pdev, 0);
> 	/* size in MB on si */
> 	adev->mc.mc_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL;
> 	adev->mc.real_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL;
>
> #ifdef CONFIG_X86_64
> 	if (adev->flags & AMD_IS_APU) {
> 		adev->mc.aper_base = ((u64)RREG32(mmMC_VM_FB_OFFSET)) << 22;
> 		adev->mc.aper_size = adev->mc.real_vram_size;
> 	}
> #endif

We use the real physical address and size as the aperture on APUs. Similar code is in gmc_v8_0_mc_init().

Regards,
Christian.

>
> Regards,
> David Zhou
>>
>> So there won't be any evictions any more because of page faults on APUs.
>>
>> Regards,
>> Christian.
>>
>> On 27.03.2017 at 09:53, Zhou, David(ChunMing) wrote:
>>> For the APU special case, can we prevent evictions happening between
>>> VRAM <----> GTT?
>>>
>>> Regards,
>>> David Zhou
>>>
>>> -----Original Message-----
>>> From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On
>>> Behalf Of Michel Dänzer
>>> Sent: Monday, March 27, 2017 3:36 PM
>>> To: Marek Olšák <maraeo at gmail.com>
>>> Cc: amd-gfx mailing list <amd-gfx at lists.freedesktop.org>
>>> Subject: Re: Plan: BO move throttling for visible VRAM evictions
>>>
>>> On 25/03/17 01:33 AM, Marek Olšák wrote:
>>>> Hi,
>>>>
>>>> I'm sharing this idea here because it's something that has been
>>>> decreasing our performance a lot recently, for example:
>>>> http://openbenchmarking.org/prospect/1703011-RI-RADEONDIR06/7b7668cfc109d1c3dc27e871c8aea71ca13f23fa
>>>>
>>>> I think the problem there is that Mesa git started uploading
>>>> descriptors and uniforms to VRAM, which helps when TC L2 has a low
>>>> hit/miss ratio, but the performance can randomly drop by an order of
>>>> magnitude. I've heard rumours that kernel 4.11 has an improved
>>>> allocator that should perform better, but the situation is still far
>>>> from ideal.
>>>>
>>>> AMD CPUs and APUs will hopefully suffer less, because we can resize
>>>> the visible VRAM with the help of our CPU hw specs, but Intel CPUs
>>>> will remain limited to 256 MB. The following plan describes how to do
>>>> throttling for visible VRAM evictions.
>>>>
>>>>
>>>> 1) Theory
>>>>
>>>> Initially, the driver doesn't care about where buffers are in VRAM,
>>>> because VRAM buffers are only moved to visible VRAM on CPU page faults
>>>> (when the CPU touches the buffer memory but the memory is in the
>>>> invisible part of VRAM). When that happens,
>>>> amdgpu_bo_fault_reserve_notify is called, which moves the buffer to
>>>> visible VRAM, and the app continues. amdgpu_bo_fault_reserve_notify
>>>> also marks the buffer as contiguous, which makes memory fragmentation
>>>> worse.
>>>>
>>>> I verified this with DiRT Rally, where amdgpu_bo_fault_reserve_notify
>>>> was much higher in a CPU profiler than anything else in the kernel.
>>>>
>>>>
>>>> 2) Monitoring via Gallium HUD
>>>>
>>>> We need to expose 2 kernel counters via the INFO ioctl and display
>>>> them via the Gallium HUD:
>>>> - The number of VRAM CPU page faults (i.e. the number of calls to
>>>>   amdgpu_bo_fault_reserve_notify).
>>>> - The number of bytes moved by ttm_bo_validate inside
>>>>   amdgpu_bo_fault_reserve_notify.
>>>>
>>>> This will help us observe what exactly is happening and fine-tune the
>>>> throttling once it's implemented.
>>>>
>>>>
>>>> 3) Solution
>>>>
>>>> a) When amdgpu_bo_fault_reserve_notify is called, record the fact.
>>>> (amdgpu_bo::had_cpu_page_fault = true)
>>>>
>>>> b) Monitor the MB/s rate at which buffers are moved by
>>>> amdgpu_bo_fault_reserve_notify. If we get above a specific threshold,
>>>> don't move the buffer to visible VRAM. Move it to GTT instead. Note
>>>> that moving to GTT can be cheaper, because moving to visible VRAM is
>>>> likely to evict a lot of buffers there and unmap them from the CPU,
>>> FWIW, this can be avoided by only setting GTT in busy_placement.
>>> Then TTM will only move the BO to visible VRAM if that can be done
>>> without evicting anything from there.
>>>
>>>
>>>> but moving to GTT shouldn't evict or unmap anything.
>>>>
>>>> c) When we get into the CS ioctl and a buffer has had_cpu_page_fault,
>>>> it can be moved to VRAM if:
>>>> - the GTT->VRAM move rate is low enough to allow it (this is the
>>>>   existing throttling mechanism)
>>>> - the visible VRAM move rate is low enough that we will be OK with
>>>>   another CPU page fault if it happens.
>>> Some other ideas that might be worth trying:
>>>
>>> Evicting BOs to GTT instead of moving them to CPU-accessible VRAM in
>>> some cases (e.g. for all BOs except those with
>>> AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED), or even always.
>>>
>>> Implementing eviction from CPU-visible to CPU-invisible VRAM,
>>> similar to how it's done in radeon. Note that there's potential for
>>> userspace triggering an infinite loop in the kernel in cases where
>>> BOs are moved back from invisible to visible VRAM on page faults.
>>>
>>>
>>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx