Hi,

On 02/07/2018 02:22 PM, Christian König wrote:
>> Understood, but why is that?
> Well because customers requested it :)
>
> What we try to do here is having a parameter which says when less than
> x megabytes of memory are left then fail the allocation.
>
> This is basically to prevent buggy applications which try to allocate
> as much memory as possible until they receive an -ENOMEM from running
> into the OOM killer.

OK. Understood.

>
>> That's true, but with VRAM, TTM overcommits swap space which may lead
>> to ugly memory allocation failures at hibernate time.
> Yeah, that is exactly the reason why I said that Roger should disable
> the limit during suspend swap out :)

Well that was really in the context of the swapping implementation
rather than in the context of this change so it was a little off-topic.
Even if disabling this limit, TTM can overcommit. But looking at the
swapping implementation is a different issue.

/Thomas

>
> Regards,
> Christian.
>
> On 07.02.2018 at 14:17, Thomas Hellstrom wrote:
>> Hi, Roger.
>>
>> On 02/07/2018 09:25 AM, He, Roger wrote:
>>>     Why should TTM be different in that aspect? It would be good to
>>> know your reasoning WRT this?
>>>
>>> Now, in TTM struct ttm_bo_device it already has member no_retry to
>>> indicate your option.
>>> If you prefer no OOM triggered by TTM, set it as true. The default
>>> is false to keep original behavior.
>>> AMD prefers return value of no memory rather than OOM for now.
>>
>> Understood, but why is that? I mean just because TTM doesn't invoke
>> the OOM killer, that doesn't mean that the process won't, the next
>> millisecond, page in a number of pages and invoke it? So this
>> mechanism would be pretty susceptible to races?
>>
>>>     One thing I looked at at one point was to have TTM do the
>>> swapping itself instead of handing it off to the shmem system. That
>>> way we could pre-allocate swap entries for all swappable (BO)
>>> memory, making sure that we wouldn't run out of swap space when,
>>>
>>> I prefer current mechanism of swap out. At the beginning the swapped
>>> pages stay in system memory by shmem until OS move to status with
>>> high memory pressure, that has an obvious advantage. For example, if
>>> the BO is swapped out into shmem, but not really be flushed into
>>> swap disk. When validate it and swap in it at this moment, the
>>> overhead is small compared to swap in from disk.
>>
>> But that is true for a page handed off to the swap-cache as well. It
>> won't be immediately flushed to disc, only when the swap cache is
>> shrunk.
>>
>>> In addition, No need swap space reservation for TTM pages when
>>> allocation since swap disk is shared not only for TTM exclusive.
>>
>> That's true, but with VRAM, TTM overcommits swap space which may lead
>> to ugly memory allocation failures at hibernate time.
>>
>>> So again we provide a flag no_retry in struct ttm_bo_device to let
>>> driver set according to its request.
>>
>> Thanks,
>> Thomas
>>
>>>
>>> Thanks
>>> Roger(Hongbo.He)
>>>
>>> -----Original Message-----
>>> From: Thomas Hellstrom [mailto:thomas at shipmail.org]
>>> Sent: Wednesday, February 07, 2018 2:43 PM
>>> To: He, Roger <Hongbo.He at amd.com>; amd-gfx at lists.freedesktop.org;
>>> dri-devel at lists.freedesktop.org
>>> Cc: Koenig, Christian <Christian.Koenig at amd.com>
>>> Subject: Re: [PATCH 0/5] prevent OOM triggered by TTM
>>>
>>> Hi, Roger,
>>>
>>> On 02/06/2018 10:04 AM, Roger He wrote:
>>>> currently ttm code has no any allocation limit. So it allows pages
>>>> allocation unlimited until OOM. Because if swap space is full of
>>>> swapped pages and then system memory will be filled up with ttm pages.
>>>> and then any memory allocation request will trigger OOM.
>>>>
>>> I'm a bit curious, isn't this the way things are supposed to work on
>>> a Linux system?
>>> If all memory resources are used up, the OOM killer will kill the
>>> most memory hungry (perhaps rogue) process rather than processes
>>> being nice and try to find out themselves whether allocations will
>>> succeed?
>>> Why should TTM be different in that aspect? It would be good to know
>>> your reasoning WRT this?
>>>
>>> Admittedly, graphics process OOM memory accounting doesn't work very
>>> well, due to not all BOs being CPU mapped, but it looks like
>>> there is recent work towards fixing this?
>>>
>>> One thing I looked at at one point was to have TTM do the swapping
>>> itself instead of handing it off to the shmem system. That way we
>>> could pre-allocate swap entries for all swappable (BO) memory,
>>> making sure that we wouldn't run out of swap space when, for
>>> example, hibernating and that would also limit the pinned
>>> non-swappable memory (from TTM driver kernel allocations for
>>> example) to half the system memory resources.
>>>
>>> Thanks,
>>> Thomas
>>>