On Fri, 21 Jun 2024 10:50:10 +0200, Michal Hocko wrote: > On Thu 20-06-24 19:30:19, Oleg Nesterov wrote: > > Can't review, I forgot everything about mm_update_next_owner(). > > So I am sorry for the noise I am going to add, feel free to ignore. > > Just in case, I see nothing wrong in this patch. > > > > On 06/20, alexjlzheng@xxxxxxxxx wrote: > > > > > > When mm_update_next_owner() is racing with swapoff (try_to_unuse()) or /proc or > > > ptrace or page migration (get_task_mm()), it is impossible to find an > > > appropriate task_struct in the loop whose mm_struct is the same as the target > > > mm_struct. > > > > > > If the above race condition is combined with the stress-ng-zombie and > > > stress-ng-dup tests, such a long loop can easily cause a Hard Lockup in > > > write_lock_irq() for tasklist_lock. > > > > > > Recognize this situation in advance and exit early. > > > > But this patch won't help if (say) ptrace_access_vm() sleeps while > > for_each_process() tries to find another owner, right? > > > > > @@ -484,6 +484,8 @@ void mm_update_next_owner(struct mm_struct *mm) > > > * Search through everything else, we should not get here often. > > > */ > > > for_each_process(g) { > > > + if (atomic_read(&mm->mm_users) <= 1) > > > + break; > > > > I think this deserves a comment to explain that this is optimization > > for the case we race with the pending mmput(). mm_update_next_owner() > > checks mm_users at the start. > > > > And. Can we drop tasklist and use rcu_read_lock() before for_each_process? > > Yes, this will probably need more changes even if possible... > > > > > > Or even better. Can't we finally kill mm_update_next_owner() and turn the > > ugly mm->owner into mm->mem_cgroup ? > > Yes, dropping the mm->owner should be a way to go. Replacing that by > mem_cgroup sounds like an improvemnt. I have a vague recollection that Sorry for the late reply. Replacing that by mem_cgroup maybe a good idea, a rcu lock looks good, too. But before the above optimization is implemented, I recommend using this patch to alleviate it. Both [PATCH] and [PATCH v2] are acceptable, they only differ in the commit log. Thanks for your reply. :) Jinliang Zheng > this has some traps on the way. E.g. tasks sharing the mm but living in > different cgroups. Things have changes since the last time I've checked > and for example memcg charge migration on task move will be deprecated > soon so chances are that there are less roadblocks on the way. > -- > Michal Hocko > SUSE Labs