On Tue, Apr 20, 2021 at 7:58 PM Roman Gushchin <guro@xxxxxx> wrote:
> [...]
> >
> > Michal has suggested ALLOC_OOM which is less risky.
>
> The problem is that even if you'll serve the oom daemon task with pages
> from a reserve/custom pool, it doesn't guarantee anything, because the task
> still can wait for a long time on some mutex, taken by another process,
> throttled somewhere in the reclaim.

I am assuming that by mutex you mean the locks the oom-killer might have
to take to read metrics, or any other lock it might need that some other
process can also take. Have you observed this situation happening with
oomd in production?

> You're basically trying to introduce a
> "higher memory priority" and as always in such cases there will be priority
> inversion problems.
>
> So I doubt that you can simply create a common mechanism which will work
> flawlessly for all kinds of allocations, I anticipate many special cases
> requiring an individual approach.
>
[...]
>
> First, I need to admit that I didn't follow the bpf development too closely
> for the last couple of years, so my knowledge can be a bit outdated.
>
> But in general bpf is great when there is a fixed amount of data as input
> (e.g. skb) and a fixed output (e.g. drop/pass the packet). There are different
> maps which are handy to store some persistent data between calls.
>
> However traversing complex data structures is way more complicated. It's
> especially tricky if the data structure is not of a fixed size: bpf programs
> have to be deterministic, so there are significant constraints on loops.
>
> Just for example: it's easy to call a bpf program for each task in the system,
> provide some stats/access to some fields of struct task and expect it to return
> an oom score, which then the kernel will look at to select the victim.
> Something like this can be done with cgroups too.
>
> Writing a kthread, which can sleep, poll some data all over the system and
> decide what to do (what oomd/... does), will be really challenging.
> And going back, it will not provide any guarantees unless we're not taking
> any locks, which is already quite challenging.
>

Thanks for the info. I agree this direction needs much more thought and
time to materialize.

thanks,
Shakeel