On 17/05/17 09:35 PM, Marek Olšák wrote:
> On May 16, 2017 3:57 AM, "Michel Dänzer" <michel at daenzer.net> wrote:
> > On 15/05/17 07:11 PM, Marek Olšák wrote:
> > > On May 15, 2017 4:29 AM, "Michel Dänzer" <michel at daenzer.net> wrote:
> > > > I think the next step should be to make radeonsi keep track of
> > > > how much VRAM it's trying to use that's expected to be accessed
> > > > by the CPU, and to use GTT instead when that exceeds a threshold
> > > > (probably derived from vram_vis_size).
> > >
> > > That's difficult to estimate. There are apps with 600MB of mapped
> > > VRAM and don't experience any performance issues. And some apps
> > > with 300MB of mapped VRAM do. It only depends on the CPU access
> > > pattern, not what radeonsi sees.
> >
> > What I mean is keeping track of the total size of resources which
> > have RADEON_DOMAIN_VRAM and RADEON_FLAG_CPU_ACCESS set, and if it
> > exceeds a threshold, create new ones having those flags in GTT
> > instead. Even though this might not be strictly necessary with
> > amdgpu in the long run, it probably is for radeon anyway, and in the
> > short term it might help even with amdgpu.
>
> That might hurt us more than it can help.

You may be right, but I think I'll play with that idea a little anyway
to see how it goes. :)

> All mappable buffers have the CPU access flag set, but many of them are
> immutable.

You mean they're only written to once by the CPU? We shouldn't set the
RADEON_FLAG_CPU_ACCESS flag for BOs where we expect that, because it
will currently prevent them from being in the CPU invisible part of
VRAM.

> The only place where this can be handled is the kernel.

Ideally, the placement of a BO should be determined based on how it's
actually being used by the GPU vs CPU. But I'm not sure how to determine
that in a useful way.

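For illustration, the user-space accounting idea above (total size of
resources with RADEON_DOMAIN_VRAM and RADEON_FLAG_CPU_ACCESS, falling
back to GTT past a threshold) might be sketched roughly as follows. This
is only a sketch under assumptions, not radeonsi code: the SKETCH_*
constants, struct cpu_vram_tracker and both functions are hypothetical
stand-ins, and the threshold is simply assumed to have been derived from
vram_vis_size somewhere else.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for RADEON_DOMAIN_* and RADEON_FLAG_CPU_ACCESS;
 * the values are made up for this sketch. */
enum sketch_domain {
    SKETCH_DOMAIN_GTT  = 1 << 0,
    SKETCH_DOMAIN_VRAM = 1 << 1,
};
#define SKETCH_FLAG_CPU_ACCESS (1u << 0)

/* Hypothetical per-screen accounting state. */
struct cpu_vram_tracker {
    uint64_t used;      /* bytes of CPU-accessible VRAM accounted so far */
    uint64_t threshold; /* assumed to be derived from vram_vis_size */
};

/* Pick the domain for a new buffer. Only buffers that ask for VRAM *and*
 * CPU access are accounted; once they would exceed the threshold, new
 * ones are redirected to GTT. Keeps the invariant used <= threshold. */
static enum sketch_domain
tracker_choose_domain(struct cpu_vram_tracker *t, enum sketch_domain domain,
                      unsigned flags, uint64_t size)
{
    bool cpu_vram = domain == SKETCH_DOMAIN_VRAM &&
                    (flags & SKETCH_FLAG_CPU_ACCESS) != 0;

    if (!cpu_vram)
        return domain;

    if (size > t->threshold - t->used)
        return SKETCH_DOMAIN_GTT;

    t->used += size;
    return SKETCH_DOMAIN_VRAM;
}

/* Undo the accounting when a buffer is destroyed. 'placed' is the domain
 * that tracker_choose_domain() returned for it. */
static void
tracker_release(struct cpu_vram_tracker *t, enum sketch_domain placed,
                unsigned flags, uint64_t size)
{
    if (placed == SKETCH_DOMAIN_VRAM && (flags & SKETCH_FLAG_CPU_ACCESS)) {
        assert(size <= t->used);
        t->used -= size;
    }
}
```

Note that this only counts what user space requests; as pointed out
above, it knows nothing about the actual CPU access pattern, so the
threshold can only be a rough heuristic.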
> Even if it's as simple as:
> if (bo->numcpufaults > 10)
>     domain = GTT_WC;

I'm skeptical about the number of CPU page faults per se being a useful
metric. It doesn't tell us much about how the BO is used even by the
CPU, let alone the GPU. But let's see where this leads you.

One thing that might help would be if we could swap individual memory
nodes between visible and invisible VRAM for CPU page faults, instead of
moving/evicting whole BOs. Christian, do you think something like that
would be possible?

Another idea (to avoid issues such as the recent one with Rocket League)
was to make VRAM CPU mappings write-only, and move the BO to GTT if
there's a read fault. But not sure if this is possible at all, or how
much effort it would be.

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer