On 09.06.22 at 17:07, Michal Hocko wrote:
> On Thu 09-06-22 16:29:46, Christian König wrote:
> [...]
>> Is that a show stopper? How should we address this?
> This is a hard problem to deal with and I am not sure this simple
> solution is really a good fit, not only because of the memcg side of
> things. I have my doubts that sparse file handling is OK as well.

Well, I didn't claim that this would be easy; we just need to start
somewhere.

Regarding sparse file handling, how about using
file->f_mapping->nrpages as the badness for shmem files?

That would give us the real number of pages allocated through the
shmem file and would handle sparse files gracefully.
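
Just to illustrate what I mean, a rough sketch (shmem_mapping() is the
existing helper; the badness function itself and wiring it into
oom_badness() are of course the hypothetical part):

    #include <linux/fs.h>
    #include <linux/shmem_fs.h>

    /*
     * Sketch only: report the pages actually allocated in a shmem
     * file, to be accounted towards the badness of each task
     * holding the file open.
     */
    static unsigned long shmem_file_badness(struct file *file)
    {
            if (!shmem_mapping(file->f_mapping))
                    return 0;

            /*
             * nrpages counts only pages present in the mapping, so
             * holes in sparse files contribute nothing.
             */
            return file->f_mapping->nrpages;
    }
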
> I do realize this is a long-term problem and there is a demand for some
> solution at least. I am not sure how to deal with shared resources
> myself. The best approximation I can come up with is to limit the scope
> of the damage to a memcg context. One idea I was playing with (but
> never convinced myself it is really worth it) is to allow a new mode of
> the oom victim selection for the global oom event. It would be an opt-in
> and the victim would be selected from the biggest leaf memcg (or kill
> the whole memcg if it has group_oom configured).
> That would address at least some of the accounting issues because charges
> are better tracked than per-process memory consumption. It is a crude
> and ugly hack and it doesn't solve the underlying problem, as shared
> resources are not guaranteed to be freed when processes die, but maybe it
> would be just slightly better than the existing scheme, which is clearly
> lagging behind existing userspace.
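
If I understand you correctly, the selection pass would look roughly
like this (mem_cgroup_iter() and page_counter_read() exist; the leaf
check is modeled on memcg_has_children(), which is private to
memcontrol.c today, and reference counting is ignored in the sketch):

    #include <linux/memcontrol.h>
    #include <linux/page_counter.h>

    /*
     * Illustration only: find the leaf memcg with the largest charge
     * so the global oom victim can be picked from it (or the whole
     * group killed if group_oom is set). css references and races
     * with concurrent charging are deliberately ignored here.
     */
    static struct mem_cgroup *biggest_leaf_memcg(void)
    {
            struct mem_cgroup *iter, *best = NULL;
            unsigned long best_usage = 0;

            for (iter = mem_cgroup_iter(NULL, NULL, NULL); iter;
                 iter = mem_cgroup_iter(NULL, iter, NULL)) {
                    unsigned long usage;

                    /* consider leaf cgroups only */
                    if (memcg_has_children(iter))
                            continue;

                    usage = page_counter_read(&iter->memory);
                    if (usage > best_usage) {
                            best_usage = usage;
                            best = iter;
                    }
            }
            return best;
    }
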
Well, what is so bad about the approach of giving each process holding a
reference to some shared memory its equal share of the badness, even when
the processes belong to different memory control groups?
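
To make that concrete, one possible reading of it (file_count() exists;
using it as the holder count and the equal division are just my
illustration):

    #include <linux/fs.h>

    /*
     * Sketch: split a shared buffer's allocated pages evenly among
     * all current holders, so the buffer is counted exactly once in
     * total, regardless of which memcgs the holders live in.
     */
    static unsigned long shared_badness_share(struct file *file)
    {
            long holders = file_count(file);  /* current references */

            if (holders <= 0)
                    return 0;

            return file->f_mapping->nrpages / holders;
    }
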
If you really think that this would be a hard problem for upstreaming, we
could just as well keep the memcg behavior as it is for now. We would
only need to adjust the parameters of oom_badness() a bit.

Regards,
Christian.