On Fri, Jun 23, 2017 at 05:02:58PM -0400, Felix Kuehling wrote:
> Hi John,
>
> I haven't read your patches. Just a question based on the cover letter.
>
> I understand that visible VRAM is the biggest pain point. But could the
> same reasoning make sense for invisible VRAM? That is, doing all the
> migrations to VRAM in a workqueue?
>
> Regards,
>   Felix
>

I don't see why not. In theory, all non-essential buffer moves could be done
this way, and it would be relatively trivial to extend it to that.

But I wanted to limit the scope of my changes, at least for this series.
Testing takes a long time, and I wanted to focus those testing efforts as much
as possible, produce something well-tested (I hope), and get feedback on this
limited application of the concept before expanding its reach.

John

>
> On 17-06-23 01:39 PM, John Brooks wrote:
> > This patch series is intended to improve performance when limited
> > CPU-visible VRAM is under pressure.
> >
> > Moving BOs into visible VRAM is essentially a housekeeping task. It's
> > faster to access them in VRAM than in GTT, but it isn't a hard requirement
> > for them to be in VRAM. As such, it is unnecessary to spend valuable time
> > blocking on this in the page fault handler or during command submission.
> > Doing so translates directly into a longer frame time (ergo stalls and
> > stuttering).
> >
> > The problem worsens when attempting to move BOs into visible VRAM when it
> > is full. This takes much longer than a simple move because other BOs have
> > to be evicted, which means finding and then moving potentially hundreds of
> > other BOs, a very time-consuming process. In the case of limited visible
> > VRAM, it's important to do this at some point to keep the contents of
> > visible VRAM fresh, but it does not need to be a blocking operation. If
> > visible VRAM is full, the BO can be read from GTT in the meantime and moved
> > to VRAM later.
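A minimal userspace model of the deferral described in the cover letter may make the idea more concrete. It is a sketch under stated assumptions, not the amdgpu implementation: every name in it (struct vis_vram, request_vis_vram(), deferred_move_worker(), and so on) is hypothetical, the "workqueue" is just a pending array drained by an explicit call, eviction is plain LRU, and it assumes a move into *free* visible VRAM may still happen immediately on the fast path, since only eviction is described as being deferred.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are amdgpu APIs. */
enum placement { PLACE_GTT, PLACE_VIS_VRAM };

struct bo {
	size_t size;
	enum placement place;
	bool move_queued;
};

#define MAX_BOS 16

struct vis_vram {
	size_t size;
	size_t used;
	struct bo *resident[MAX_BOS]; /* index 0 = least recently used */
	size_t nresident;
	struct bo *queue[MAX_BOS];    /* stand-in for the deferred work item */
	size_t nqueued;
};

static void place_in_vis_vram(struct vis_vram *v, struct bo *bo)
{
	assert(v->used + bo->size <= v->size && v->nresident < MAX_BOS);
	v->used += bo->size;
	v->resident[v->nresident++] = bo;
	bo->place = PLACE_VIS_VRAM;
}

/* Evict the least recently used resident BO back to GTT. */
static void evict_lru(struct vis_vram *v)
{
	struct bo *victim = v->resident[0];

	for (size_t i = 1; i < v->nresident; i++)
		v->resident[i - 1] = v->resident[i];
	v->nresident--;
	v->used -= victim->size;
	victim->place = PLACE_GTT;
}

/* Fast path (CS / page fault model): never evicts, never blocks. */
static void request_vis_vram(struct vis_vram *v, struct bo *bo)
{
	if (bo->place == PLACE_VIS_VRAM || bo->move_queued)
		return;
	if (v->used + bo->size <= v->size && v->nresident < MAX_BOS) {
		place_in_vis_vram(v, bo); /* free space: cheap move */
		return;
	}
	/* Full: keep serving the BO from GTT and defer the move.
	 * If the pending list is full, the request is simply dropped. */
	if (v->nqueued < MAX_BOS) {
		v->queue[v->nqueued++] = bo;
		bo->move_queued = true;
	}
}

/* Deferred worker model: the only place that evicts. */
static void deferred_move_worker(struct vis_vram *v)
{
	for (size_t i = 0; i < v->nqueued; i++) {
		struct bo *bo = v->queue[i];

		bo->move_queued = false;
		if (bo->size > v->size)
			continue; /* can never fit; leave it in GTT */
		while ((v->used + bo->size > v->size ||
			v->nresident == MAX_BOS) && v->nresident > 0)
			evict_lru(v);
		place_in_vis_vram(v, bo);
	}
	v->nqueued = 0;
}
```

Keeping eviction out of request_vis_vram() is what bounds the cost of the fast path; the worker absorbs the expensive LRU walk, at the price of the BO being served from GTT until the worker runs.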
> >
> > Thus, I have made it so that neither the command submission code nor the
> > page fault handler spends time evicting BOs from visible VRAM; instead, this
> > is deferred to a workqueue function that's queued when CS requests BOs
> > flagged AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED.
> >
> > Speaking of CPU_ACCESS_REQUIRED, I've changed the handling of that flag so
> > that the kernel driver can clear it later even if it was set by userspace.
> > This is because the userspace graphics library can't know whether the
> > application really needs the BO to be CPU_ACCESS_REQUIRED forever. The
> > kernel driver can't know either, but it does know when page faults occur,
> > and if a BO doesn't appear to have had any page faults when it's moved
> > somewhere inaccessible, the flag can be removed and the BO doesn't have to
> > take up space in CPU-visible memory anymore. This change was based on IRC
> > discussions with Michel.
> >
> > Patch 7 fixes a problem with BO move-rate throttling that causes visible
> > VRAM moves not to be throttled when total VRAM isn't full enough.
> >
> > I've also added a vis_vramlimit module parameter for debugging purposes.
> > It's similar to the vramlimit parameter, except that it limits only visible
> > VRAM.
> >
> > I have tested this patch set with the two games I know to be affected by
> > visible VRAM pressure: DiRT Rally and Dying Light. It practically eliminates
> > eviction-related stuttering in DiRT Rally, as well as the very low
> > performance seen when visible VRAM is limited to 64MB. It also fixes the
> > severely low framerates that occurred in some areas of Dying Light. All my
> > testing was done with an R9 290 with 4GB of visible VRAM and an Intel i7
> > 4790.
> >
> > -- 
> > John Brooks (Frogging101)
> >
> > _______________________________________________
> > amd-gfx mailing list
> > amd-gfx at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>