On 30.04.19 at 01:16, Kuehling, Felix wrote:
On 2019-04-29 8:34 a.m., Christian König wrote:
On 28.04.19 at 09:44, Kuehling, Felix wrote:
From: Kent Russell <kent.russell@xxxxxxx>
GTT size is currently limited to the minimum of VRAM size or 3/4 of
system memory. This severely limits the quantity of system memory
that can be used by ROCm applications.
Increase GTT size to the maximum of VRAM size or system memory size.
Well, NAK.
This limit was introduced on purpose because otherwise
max-texture-size would crash the system, with the OOM killer
causing a system panic.
Letting the GPU use more than 75% of system memory at the same time makes
the system unstable, so we can't allow that by default.
Like we discussed, the current implementation is too limiting. On a Fiji
system with 4GB VRAM and 32GB system memory, it limits system memory
allocations to 4GB. I think this workaround was fixed once before and
reverted because it broke a CZ system with 1GB system memory. So I
suspect that this is an issue affecting small memory systems where maybe
the 1/2 system memory limit in TTM isn't sufficient to protect from OOM
panics.
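To make the numbers above concrete, here is a minimal userspace sketch (not driver code) of the two formulas in the patch. The helper names are hypothetical, and the 3072 MB default is an assumption standing in for AMDGPU_DEFAULT_GTT_SIZE_MB; the actual value is whatever the driver defines.

```c
#include <stdint.h>

/* Assumption: stand-in for AMDGPU_DEFAULT_GTT_SIZE_MB in the driver. */
#define DEFAULT_GTT_SIZE_MB 3072ULL

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }
static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }

/* Current formula: min(max(default, VRAM), 3/4 of system memory). */
static uint64_t gtt_size_old(uint64_t vram, uint64_t sysmem)
{
	return min_u64(max_u64(DEFAULT_GTT_SIZE_MB << 20, vram),
		       sysmem * 3 / 4);
}

/* Proposed formula: max3(default, VRAM, system memory). */
static uint64_t gtt_size_new(uint64_t vram, uint64_t sysmem)
{
	return max_u64(max_u64(DEFAULT_GTT_SIZE_MB << 20, vram), sysmem);
}
```

With 4 GiB of VRAM and 32 GiB of system memory, gtt_size_old() returns 4 GiB and gtt_size_new() returns 32 GiB. Note that under the assumed default, gtt_size_new() never drops below 3 GiB, even on a system with only 1 GiB of memory.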
Well, it didn't only break on a 1GB CZ system; that was just where Andrey
reproduced it. We got reports from all kinds of systems.
The OOM killer problem is a more general problem that potentially
affects other drivers too. Keeping this GTT limit broken in AMDGPU is an
inadequate workaround at best. I'd like to look for a better solution,
probably some adjustment of the TTM system memory limits on systems with
small memory, to avoid OOM panics on such systems.
The core problem here is that the OOM killer explicitly doesn't want to
block waiting for shaders to finish whatever they are doing.
So currently, as soon as the hardware is using some memory, that memory
can't be reclaimed immediately.
The original limit in TTM was 2/3 of system memory. That worked
really reliably, and we ran into problems only after raising it to 3/4.
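As a rough illustration of what those two fractions mean in practice, the sketch below computes how much memory is left for everything else when a given share of system memory is in use by the GPU. The helper name is hypothetical and this is plain arithmetic, not the TTM accounting code.

```c
#include <stdint.h>

/*
 * Bytes left for the rest of the system when num/den of system memory
 * is in use by the GPU (integer arithmetic, the share is rounded down).
 */
static uint64_t headroom_bytes(uint64_t sysmem, unsigned int num,
			       unsigned int den)
{
	return sysmem - sysmem * num / den;
}
```

On a 1 GiB system, a 3/4 limit leaves 256 MiB of headroom, while a 2/3 limit leaves roughly 341 MiB. On a 32 GiB system the 3/4 limit still leaves 8 GiB, which is why the same fraction is far tighter on small-memory machines.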
To sum it up, letting a GPU use almost all of system memory is simply
not possible upstream, and it is rather questionable even in a
production system.
The only real solution I can see is to be able to reliably kill shaders
in an OOM situation.
Regards,
Christian.
Regards,
Felix
What could maybe work is to reduce the amount of system memory by a fixed
factor, but offhand I don't see a way of fixing this in general.
Regards,
Christian.
Signed-off-by: Kent Russell <kent.russell@xxxxxxx>
Reviewed-by: Felix Kuehling <Felix.Kuehling@xxxxxxx>
Signed-off-by: Felix Kuehling <Felix.Kuehling@xxxxxxx>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index c14198737dcd..e9ecc3953673 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1740,11 +1740,10 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
 		struct sysinfo si;
 		si_meminfo(&si);
-		gtt_size = min(max((AMDGPU_DEFAULT_GTT_SIZE_MB << 20),
-			       adev->gmc.mc_vram_size),
-			       ((uint64_t)si.totalram * si.mem_unit * 3/4));
-	}
-	else
+		gtt_size = max3((AMDGPU_DEFAULT_GTT_SIZE_MB << 20),
+				adev->gmc.mc_vram_size,
+				((uint64_t)si.totalram * si.mem_unit));
+	} else
 		gtt_size = (uint64_t)amdgpu_gtt_size << 20;
 	/* Initialize GTT memory pool */
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx