+ dri-devel

On Tue, May 31, 2022 at 6:00 AM Christian König
<ckoenig.leichtzumerken@xxxxxxxxx> wrote:
>
> Hello everyone,
>
> To summarize the issue I'm trying to address here: Processes can allocate
> resources through a file descriptor without being held responsible for them.
>
> Especially for the DRM graphics driver subsystem this is rather
> problematic. Modern games tend to allocate huge amounts of system memory
> through the DRM drivers to make it accessible to GPU rendering.
>
> But even outside of the DRM subsystem this problem exists and it is
> trivial to exploit. See the following simple example of
> using memfd_create():
>
>         fd = memfd_create("test", 0);
>         while (1)
>                 write(fd, page, 4096);
>
> Compile this and you can bring down any standard desktop system within
> seconds.
>
> The background is that the OOM killer will kill every process in the
> system except the one which holds the only reference to the memory
> allocated by the memfd.
>
> Those problems were brought up on the mailing list multiple times now
> [1][2][3], but without any final conclusion on how to address them. Since
> file descriptors are considered shared, the process cannot directly be held
> accountable for allocations made through them. In addition, file
> descriptors can also easily move between processes.
>
> So instead of trying to account the allocated memory to a specific
> process, this patch set adds a callback to struct file_operations which
> the OOM killer can use to query the specific OOM badness of this file
> reference. This badness is then divided by the file_count, so that every
> process using a shmem file, DMA-buf or DRM driver will get its equal
> share of OOM badness.
>
> Callbacks are then implemented for the two core users (memfd and DMA-buf)
> as well as 72 DRM based graphics drivers.
>
> The result is that the OOM killer can now much better judge if a process
> is worth killing to free up memory, resulting in quite a bit better system
> stability in OOM situations, especially while running games.
>
> The only other possibility I can see would be to change the accounting of
> resources whenever references to the file structure change, but this would
> mean quite some additional overhead for a rather common operation.
>
> Additionally I think trying to limit device driver allocations using
> cgroups is orthogonal to this effort. While cgroups is very useful, it
> works on per process limits and tries to enforce a collaborative model on
> memory management while the OOM killer enforces a competitive model.
>
> Please comment and/or review, we have had that problem flying around for
> years now and are now at a point where we finally need to find a solution
> for this.
>
> Regards,
> Christian.
>
> [1] https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
> [2] https://lkml.org/lkml/2018/1/18/543
> [3] https://lkml.org/lkml/2021/2/4/799
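For anyone skimming the thread, the accounting described in the quoted cover letter (per-file badness divided by file_count and charged to each holder) can be modelled roughly in userspace C. This is only a sketch under stated assumptions: struct file_model, file_badness_share() and process_badness() are made-up names for illustration and are not taken from the patches or from any kernel API, and badness is assumed here to be a plain page count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel API. */
struct file_model {
	unsigned long badness_pages; /* pages the file reports as its badness (assumed unit) */
	unsigned long file_count;    /* number of references to the file */
};

/* Equal share of a file's badness charged to each holder of a reference. */
unsigned long file_badness_share(const struct file_model *f)
{
	assert(f != NULL);
	if (f->file_count == 0)
		return 0; /* no holders, so nothing to charge */
	return f->badness_pages / f->file_count;
}

/* Badness of one process: its own pages plus its share of every open file. */
unsigned long process_badness(unsigned long own_pages,
			      const struct file_model *files, size_t nfiles)
{
	unsigned long total = own_pages;
	size_t i;

	assert(files != NULL || nfiles == 0);
	for (i = 0; i < nfiles; i++)
		total += file_badness_share(&files[i]);
	return total;
}
```

In this model a memfd of 1,000,000 pages held by a single process charges that process the full 1,000,000 pages, so the memfd_create() loop from the cover letter would no longer go unnoticed, while the same buffer shared through four references would charge each holder 250,000 pages.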