[PATCH 0/5] prevent OOM triggered by TTM

Hongbo.He@xxxxxxx (He, Roger) · Wed, 7 Feb 2018 08:25:01 +0000

	Why should TTM be different in that aspect? It would be good to know your reasoning WRT this.

TTM's struct ttm_bo_device already has a member, no_retry, that lets the driver choose this behavior.
If you prefer that TTM not trigger the OOM killer, set it to true. The default is false, which keeps the original behavior.
For now, AMD prefers to get an out-of-memory error return (-ENOMEM) rather than an OOM kill.
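A minimal userspace sketch of the policy described above, only to make the two outcomes concrete. struct model_bo_device, model_alloc_pages() and the alloc_result values are hypothetical names, not TTM code, and "OOM kill" is just a returned enum value here, not a real kernel OOM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the one ttm_bo_device member discussed here.
 * This is NOT the real kernel struct. */
struct model_bo_device {
	bool no_retry;	/* true: report failure instead of invoking OOM */
};

/* Hypothetical outcomes of a page allocation request. */
enum alloc_result {
	ALLOC_OK,	/* enough free pages were available */
	ALLOC_ENOMEM,	/* caller gets an error and handles it itself */
	ALLOC_OOM_KILL	/* default path: the OOM killer would be invoked */
};

/*
 * Model of a page allocation against a pool of free pages.
 * *free_pages is decremented only when the allocation succeeds.
 */
enum alloc_result model_alloc_pages(const struct model_bo_device *bdev,
				    size_t want, size_t *free_pages)
{
	assert(bdev != NULL && free_pages != NULL);

	if (want <= *free_pages) {
		*free_pages -= want;
		return ALLOC_OK;
	}
	/* Memory exhausted: the no_retry flag decides what happens next. */
	return bdev->no_retry ? ALLOC_ENOMEM : ALLOC_OOM_KILL;
}
```

With no_retry false (the default), exhausting memory yields ALLOC_OOM_KILL, matching the original behavior; with it true, the caller sees ALLOC_ENOMEM instead.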

	One thing I looked at at one point was to have TTM do the swapping itself instead of handing it off to the shmem system. That way we could pre-allocate swap entries for all swappable (BO) memory, making sure that we wouldn't run out of swap space when, 

I prefer the current swap-out mechanism. Initially, swapped-out pages stay in system memory, held by shmem, until the OS comes under high memory pressure, and that has an obvious advantage. For example, if a BO has been swapped out to shmem but not yet actually written to the swap disk, then validating it and swapping it back in at that point costs little compared to swapping it in from disk. In addition, there is no need to reserve swap space for TTM pages at allocation time, since the swap disk is shared and not exclusive to TTM.
So, again, we provide the no_retry flag in struct ttm_bo_device and let each driver set it according to its needs.

Thanks
Roger(Hongbo.He)

-----Original Message-----
From: Thomas Hellstrom [mailto:thomas@xxxxxxxxxxxx] 
Sent: Wednesday, February 07, 2018 2:43 PM
To: He, Roger <Hongbo.He at amd.com>; amd-gfx at lists.freedesktop.org; dri-devel at lists.freedesktop.org
Cc: Koenig, Christian <Christian.Koenig at amd.com>
Subject: Re: [PATCH 0/5] prevent OOM triggered by TTM

Hi, Roger,

On 02/06/2018 10:04 AM, Roger He wrote:
> Currently the TTM code has no allocation limit, so it allows pages to
> be allocated without limit until OOM. Once swap space is full of
> swapped pages, system memory fills up with TTM pages, and then any
> memory allocation request will trigger the OOM killer.
>

I'm a bit curious: isn't this the way things are supposed to work on a Linux system?
If all memory resources are used up, the OOM killer kills the most memory-hungry (perhaps rogue) process; processes aren't expected to be nice and try to find out for themselves whether their allocations will succeed.
Why should TTM be different in that aspect? It would be good to know your reasoning WRT this.

Admittedly, OOM memory accounting for graphics processes doesn't work very well, because not all BOs are CPU-mapped, but it looks like there has been recent work towards fixing this.

One thing I looked at, at one point, was having TTM do the swapping itself instead of handing it off to shmem. That way we could pre-allocate swap entries for all swappable (BO) memory, making sure we wouldn't run out of swap space when, for example, hibernating. It would also limit pinned, non-swappable memory (from TTM driver kernel allocations, for example) to half of the system memory resources.
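A rough userspace sketch of the accounting this idea implies, assuming TTM tracked its own swap reservations. struct model_mem_acct and its helpers are hypothetical and purely illustrative; they are not an existing TTM interface. Sizes are in pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical accounting state, all sizes in pages. */
struct model_mem_acct {
	size_t total_ram;	/* system memory */
	size_t swap_total;	/* swap space TTM could reserve from */
	size_t swap_reserved;	/* swap entries pre-allocated for swappable BOs */
	size_t pinned;		/* non-swappable memory currently in use */
};

/* A swappable BO is admitted only if its swap backing can be reserved up
 * front, so a later swap-out (e.g. for hibernation) cannot run out of swap. */
bool model_reserve_swappable(struct model_mem_acct *a, size_t pages)
{
	assert(a->swap_reserved <= a->swap_total);
	if (pages > a->swap_total - a->swap_reserved)
		return false;
	a->swap_reserved += pages;
	return true;
}

/* Pinned (non-swappable) memory is capped at half of system memory. */
bool model_reserve_pinned(struct model_mem_acct *a, size_t pages)
{
	size_t limit = a->total_ram / 2;

	assert(a->pinned <= limit);
	if (pages > limit - a->pinned)
		return false;
	a->pinned += pages;
	return true;
}

/* Give back pinned pages, e.g. when a kernel allocation is freed. */
void model_release_pinned(struct model_mem_acct *a, size_t pages)
{
	assert(pages <= a->pinned);
	a->pinned -= pages;
}
```

This also shows the trade-off raised in the reply above: swap reserved this way is set aside for TTM even if it is never used, whereas with shmem the swap device stays shared.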

Thanks,
Thomas