Plan: BO move throttling for visible VRAM evictions

michel@xxxxxxxxxxx (Michel Dänzer) · Fri, 19 May 2017 16:45:35 +0900

On 18/05/17 07:22 PM, Marek Olšák wrote:
> On May 18, 2017 10:17 AM, "Michel Dänzer" <michel at daenzer.net> wrote:
> 
>     On 17/05/17 09:35 PM, Marek Olšák wrote:
>     > On May 16, 2017 3:57 AM, "Michel Dänzer" <michel at daenzer.net> wrote:
>     >     On 15/05/17 07:11 PM, Marek Olšák wrote:
>     >     > On May 15, 2017 4:29 AM, "Michel Dänzer" <michel at daenzer.net> wrote:
>     >     >
>     >     >     I think the next step should be to make radeonsi keep track of
>     >     >     how much VRAM it's trying to use that's expected to be accessed
>     >     >     by the CPU, and to use GTT instead when that exceeds a threshold
>     >     >     (probably derived from vram_vis_size).
>     >     >
>     >     > That's difficult to estimate. There are apps with 600MB of mapped
>     >     > VRAM and don't experience any performance issues. And some apps
>     >     > with 300MB of mapped VRAM do. It only depends on the CPU access
>     >     > pattern, not what radeonsi sees.
>     >
>     >     What I mean is keeping track of the total size of resources which
>     >     have RADEON_DOMAIN_VRAM and RADEON_FLAG_CPU_ACCESS set, and if it
>     >     exceeds a threshold, create new ones having those flags in GTT
>     >     instead. Even though this might not be strictly necessary with
>     >     amdgpu in the long run, it probably is for radeon anyway, and in
>     >     the short term it might help even with amdgpu.
>     >
>     >
>     > That might hurt us more than it can help.
> 
>     You may be right, but I think I'll play with that idea a little anyway
>     to see how it goes. :)
> 
>     > All mappable buffers have the CPU access flag set, but many of them
>     > are immutable.
> 
>     You mean they're only written to once by the CPU? We shouldn't set the
>     RADEON_FLAG_CPU_ACCESS flag for BOs where we expect that, because it
>     will currently prevent them from being in the CPU invisible part of
>     VRAM.
> 
> 
> The only thing I can do is set the CPU access flag for persistently
> mapped buffers only.

Something like that might make sense for now.
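
For illustration, the accounting idea quoted earlier in this thread (track the
total size of resources that have RADEON_DOMAIN_VRAM and RADEON_FLAG_CPU_ACCESS
set, and create new ones in GTT once a threshold is exceeded) could look
roughly like the sketch below. This is not radeonsi or winsys code: every
sketch_* name is hypothetical, and the threshold is only assumed to be derived
from vram_vis_size in some way.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real domain and flag bits. */
enum sketch_domain { SKETCH_DOMAIN_GTT = 1, SKETCH_DOMAIN_VRAM = 2 };
#define SKETCH_FLAG_CPU_ACCESS (1u << 0)

struct sketch_vram_budget {
    uint64_t cpu_vram_bytes;  /* CPU-accessed VRAM currently accounted for */
    uint64_t threshold_bytes; /* assumption: derived from vram_vis_size */
};

/* Choose the domain for a new BO and update the accounting.
 * Only VRAM requests with the CPU access flag count against the budget;
 * everything else is passed through unchanged. */
static enum sketch_domain
sketch_choose_domain(struct sketch_vram_budget *b, enum sketch_domain wanted,
                     unsigned flags, uint64_t size)
{
    assert(size > 0);

    if (wanted != SKETCH_DOMAIN_VRAM || !(flags & SKETCH_FLAG_CPU_ACCESS))
        return wanted;

    /* cpu_vram_bytes never exceeds threshold_bytes, so this cannot wrap. */
    if (size > b->threshold_bytes - b->cpu_vram_bytes)
        return SKETCH_DOMAIN_GTT; /* over budget: fall back to GTT */

    b->cpu_vram_bytes += size;
    return SKETCH_DOMAIN_VRAM;
}

/* Undo the accounting when a BO is destroyed. 'placed' is the domain that
 * sketch_choose_domain() returned for it. */
static void
sketch_release(struct sketch_vram_budget *b, enum sketch_domain placed,
               unsigned flags, uint64_t size)
{
    if (placed == SKETCH_DOMAIN_VRAM && (flags & SKETCH_FLAG_CPU_ACCESS)) {
        assert(b->cpu_vram_bytes >= size);
        b->cpu_vram_bytes -= size;
    }
}
```

Note that this only accounts for what userspace asked for, which is exactly
the objection raised above: it says nothing about the actual CPU access
pattern.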

> We certainly want buffers to go to the invisible part of VRAM if there
> is no CPU access for a certain timeframe. So maybe we shouldn't set the
> flag at all. What do you think?

https://patchwork.freedesktop.org/patch/156991/ allows
AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED BOs to be evicted from CPU visible
to invisible VRAM, but I'm not sure yet that's a good idea.
"CPU_ACCESS_REQUIRED" kind of implies CPU access should always be possible.

>     > The only place where this can be handled is the kernel.
> 
>     Ideally, the placement of a BO should be determined based on how it's
>     actually being used by the GPU vs CPU. But I'm not sure how to determine
>     that in a useful way.
> 
> CPU page faults are the only way to determine that CPU access is happening.

A page fault only happens the first time (since the BO was last moved)
the CPU tries to access a page. Currently we're not even differentiating
reads vs writes, and we have no idea how much CPU access happens to a
page after it's faulted in.
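
To make that limitation concrete, here is a toy model (plain userspace C, not
kernel code; all sketch_* names and the page count are made up for the
example). It counts what a fault handler could observe versus how many CPU
accesses actually happened:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_NUM_PAGES 4 /* arbitrary size for the toy BO */

struct sketch_bo {
    bool mapped[SKETCH_NUM_PAGES]; /* is the page currently faulted in? */
    unsigned faults;               /* what a fault handler gets to count */
    unsigned accesses;             /* real CPU accesses (not observable) */
};

/* Model one CPU access to a page. Only the first access since the last
 * move takes a fault; later ones hit the existing mapping silently. Reads
 * and writes are deliberately not distinguished here. */
static void sketch_cpu_access(struct sketch_bo *bo, unsigned page)
{
    assert(page < SKETCH_NUM_PAGES);
    bo->accesses++;
    if (!bo->mapped[page]) {
        bo->mapped[page] = true;
        bo->faults++;
    }
}

/* Model a BO move: all CPU mappings are torn down, so the next access to
 * each page faults again. */
static void sketch_move(struct sketch_bo *bo)
{
    memset(bo->mapped, 0, sizeof(bo->mapped));
}
```

In this model, a thousand accesses to one page and a single access to it are
indistinguishable from the fault count alone; both show up as one fault until
the BO is moved again.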

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer