On 2020/8/26 8:48 PM, Michal Hocko wrote:
> On Wed 26-08-20 20:21:39, xunlei wrote:
>> On 2020/8/26 8:07 PM, Michal Hocko wrote:
>>> On Wed 26-08-20 20:00:47, xunlei wrote:
>>>> On 2020/8/26 7:00 PM, Michal Hocko wrote:
>>>>> On Wed 26-08-20 18:41:18, xunlei wrote:
>>>>>> On 2020/8/26 4:11 PM, Michal Hocko wrote:
>>>>>>> On Wed 26-08-20 15:27:02, Xunlei Pang wrote:
>>>>>>>> We've met softlockup with "CONFIG_PREEMPT_NONE=y", when
>>>>>>>> the target memcg doesn't have any reclaimable memory.
>>>>>>>
>>>>>>> Do you have any scenario when this happens or is this some sort of a
>>>>>>> test case?
>>>>>>
>>>>>> It can happen on tiny guest scenarios.
>>>>>
>>>>> OK, you made me more curious. If this is a tiny guest and this is a hard
>>>>> limit reclaim path then we should trigger an oom killer which should
>>>>> kill the offender and that in turn bail out from the try_charge loop
>>>>> (see should_force_charge). So how come this repeats enough in your setup
>>>>> that it causes soft lockups?
>>>>>
>>>>
>>>> should_force_charge() is false, the current trapped in endless loop is
>>>> not the oom victim.
>>>
>>> How is that possible? If the oom killer kills a task and that doesn't
>>> resolve the oom situation then it would go after another one until all
>>> tasks are killed. Or is your task living outside of the memcg it tries
>>> to charge?
>>>
>>
>> All tasks are in memcgs. Looks like the first oom victim is not finished
>> (unable to schedule), later mem_cgroup_oom()->...->oom_evaluate_task()
>> will set oc->chosen to -1 and abort.
>
> This shouldn't be possible for too long because oom_reaper would
> make it invisible to the oom killer so it should proceed. Also
> mem_cgroup_out_of_memory takes a mutex and that is an implicit
> scheduling point already.
>
> Which kernel version is this?
>

I reproduced it on "5.9.0-rc2".
oom_reaper also can't get scheduled because there is only one CPU, and
the mutex relies on might_sleep(), which is a no-op when
"CONFIG_PREEMPT_VOLUNTARY is not set", as I mentioned in the commit log.

> And just for the clarification. I am not against the additional
> cond_resched. That sounds like a good thing in general because we do
> want to have a predictable scheduling during reclaim which is
> independent on reclaimability as much as possible. But I would like to
> drill down to why you are seeing the lockup because those shouldn't
> really happen.
>