On Fri 10-08-18 19:54:39, Tetsuo Handa wrote: > On 2018/08/10 18:07, Michal Hocko wrote: > >> Yes, of course, it would be easy to rely on exit_mmap() to set > >> MMF_OOM_SKIP itself and have the oom reaper drop the task from its list > >> when we are assured of forward progress. What polling are you proposing > >> other than a timeout based mechanism to do this? > > > > I was thinking about doing something like the following > > - oom_reaper checks the amount of victim's memory after it is done with > > reaping (e.g. by calling oom_badness before and after). > > OK. We can apply > > +static inline unsigned long oom_victim_mm_score(struct mm_struct *mm) > +{ > + return get_mm_rss(mm) + get_mm_counter(mm, MM_SWAPENTS) + > + mm_pgtables_bytes(mm) / PAGE_SIZE; > +} > > and > > - points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) + > - mm_pgtables_bytes(p->mm) / PAGE_SIZE; > + points = oom_victim_mm_score(p->mm); > > part, can't we? > > > If it wasn't able to > > reclaim much then return false and keep retrying with the existing > > mechanism > > How do you decide whether oom_reaper() was not able to reclaim much? Just a rule of thumb. If it freed at least few kBs then we should be good to MMF_OOM_SKIP. > > - once a flag (e.g. MMF_OOM_MMAP) is set it bails out and won't set the > > MMF_OOM_SKIP flag. > > Unless oom_victim_mm_score() becomes close to 0, setting MMF_OOM_SKIP is > considered premature. oom_reaper() will have to keep retrying.... there absolutely have to be a cap for retrying. Otherwise you have lockup scenarios back when the memory is mostly consumed by page tables. > >> We could set a MMF_EXIT_MMAP in exit_mmap() to specify that it will > >> complete free_pgtables() for that mm. The problem is the same: when does > >> the oom reaper decide to set MMF_OOM_SKIP because MMF_EXIT_MMAP has not > >> been set in a timely manner? > > > > reuse the current retry policy which is the number of attempts rather > > than any timeout. > > And this is really I can't understand. The number of attempts multiplied > by retry interval _is_ nothing but timeout. Yes it is a timeout but it is not the time that matters. It is that we have tried sufficient times. Looks at it this way. You can retry 5 times in 10s or just once. Depending on what is going on in the system. I would really prefer the behavior to be deterministic. > We are already using timeout based decision, with some attempt to reclaim > memory if conditions are met. Timeout based decision is when you, well, make a decision after a certain time passes. And we do not do that. > >> If this is an argument that the oom reaper should loop checking for > >> MMF_EXIT_MMAP and doing schedule_timeout(1) a set number of times rather > >> than just setting the jiffies in the mm itself, that's just implementing > >> the same thing and doing so in a way where the oom reaper stalls operating > >> on a single mm rather than round-robin iterating over mm's in my patch. > > > > I've said earlier that I do not mind doing round robin in the oom repaer > > but this is certainly more complex than what we do now and I haven't > > seen any actual example where it would matter. OOM reaper is a safely > > measure. Nothing should fall apart if it is slow. > > The OOM reaper can fail if allocating threads have high priority. You seem to > assume that realtime threads won't trigger OOM path. But since !PF_WQ_WORKER > threads do only cond_resched() due to your "the cargo cult programming" refusal, > and like Andrew Morton commented > > cond_resched() is a no-op in the presence of realtime policy threads > and using to attempt to yield to a different thread it in this fashion > is broken. > > at "mm: disable preemption before swapcache_free" thread, we can't guarantee > that allocating threads shall give the OOM reaper enough CPU resource for > making forward progress. And my direct OOM reaping proposal was also refused > by you. I really dislike counting OOM reaper as a safety measure. Well, yeah, you can screw up your system with real time priority tasks all you want. I really fail to see why you are bringing that up now though. Yet another offtopic? -- Michal Hocko SUSE Labs