-----Original Message----- From: Michel Dänzer [mailto:michel@xxxxxxxxxxx] Sent: Saturday, December 09, 2017 1:26 AM To: Koenig, Christian <Christian.Koenig at amd.com>; He, Roger <Hongbo.He at amd.com> Cc: amd-gfx at lists.freedesktop.org Subject: Re: [PATCH] drm/amd/amdgpu: set gtt size according to system memory size only On 2017-12-07 07:07 PM, Christian König wrote: > Am 07.12.2017 um 18:34 schrieb Michel Dänzer: >> On 2017-11-29 10:12 AM, Roger He wrote: >>> Change-Id: Ib634375b90d875fe04a890fc82fb1e3b7112676a >>> Signed-off-by: Roger He <Hongbo.He at amd.com> >>> --- >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 8 +++----- >>>  1 file changed, 3 insertions(+), 5 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c >>> index 17bf0ce..d0661907 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c >>> @@ -1330,11 +1330,9 @@ int amdgpu_ttm_init(struct amdgpu_device >>> *adev) >>>          struct sysinfo si; >>>           si_meminfo(&si); >>> -       gtt_size = min(max((AMDGPU_DEFAULT_GTT_SIZE_MB << 20), >>> -                  adev->mc.mc_vram_size), >>> -                  ((uint64_t)si.totalram * si.mem_unit * 3/4)); >>> -   } >>> -   else >>> +       gtt_size = max(AMDGPU_DEFAULT_GTT_SIZE_MB << 20, >>> +           (uint64_t)si.totalram * si.mem_unit * 3/4); >>> +   } else >>>          gtt_size = (uint64_t)amdgpu_gtt_size << 20; >>>      r = ttm_bo_init_mm(&adev->mman.bdev, TTM_PL_TT, gtt_size >> >>> PAGE_SHIFT); >>>      if (r) { >>> >> I'm unable to finish a piglit run (using Mesa on Tonga in a Ryzen 7 >> 1700 system with 16 GB of RAM) with this change. Before, I had >> >>  [drm] amdgpu: 3072M of GTT memory ready. >> >> now it's >> >>  [drm] amdgpu: 10473M of GTT memory ready. >> >> While running piglit, there's lots of >> >>  [TTM] Out of kernel memory >> >> messages, followed by more badness, and eventually the machine >> becomes inaccessible via SSH and has to be hard rebooted. >> >> >> It occurred to me one thing not being taken into account here is that >> system memory is also needed for storing the contents of BOs evicted >> from VRAM. So I tried subtracting the VRAM size, resulting in >> >>  [drm] amdgpu: 8425M of GTT memory ready. >> >> but the problem still happened. So I tried 1/2 instead of 3/4 of RAM, >> resulting in >> >>  [drm] amdgpu: 6982M of GTT memory ready. >> >> and was able to finish a piglit run with that. > > I think I know what is going on here. The max-texture-size keeps > increasing the texture size as long as it doesn't fails to allocate one. > > So the "Out of kernel memory" message is actually the desired effect > (but we should probably remove the message). > > The price question is what happens after that? Those code paths are > probably not very well tested. I'm attaching all the related dmesg output I've captured. Basically, best case is the OOM reaper killing piglit, worst case is no response to SSH => hard reboot. Until this becomes more robust, this change should probably be reverted. I have not got any valuable info from error log. Please revert it now as quick solution, later let me check the reason. Thanks Roger(Hongbo.He) On 2017-12-08 02:52 AM, He, Roger wrote: > [TTM] Out of kernel memory The direct reason is GTT > BO swap out failure which results from more Bo allocation. And along with that need more acc_size. > But why swap out failure I am not sure that is expected here for this case, maybe need to investigate. FWIW, though it's probably not directly related: This happens with the swap partition (8GB) completely unused. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer