On Tue, Aug 22, 2023 at 11:41:41AM +0800, Tong Tiangen wrote: > 在 2023/8/22 2:33, Matthew Wilcox 写道: > > On Mon, Aug 21, 2023 at 05:13:12PM +0800, Tong Tiangen wrote: > > > We can see that CPU1 waiting for CPU0 respond IPI,CPU0 waiting for CPU2 > > > unlock tasklist_lock, CPU2 waiting for CPU1 unlock page->ptl. As a result, > > > softlockup is triggered. > > > > > > For collect_procs_anon(), we will not modify the tasklist, but only perform > > > read traversal. Therefore, we can use rcu lock instead of spin lock > > > tasklist_lock, from this, we can break the softlock chain above. > > > > The only thing that's giving me pause is that there's no discussion > > about why this is safe. "We're not modifying it" isn't really enough > > to justify going from read_lock() to rcu_read_lock(). When you take a > > normal read_lock(), writers are not permitted and so you see an atomic > > snapshot of the list. With rcu_read_lock() you can see inconsistencies. > > Hi Matthew: > > When rcu_read_lock() is used, the task list can be modified during the > iteration, but cannot be seen during iteration. After the iteration is > complete, the task list can be updated in the RCU mechanism. Therefore, the > task list used by iteration can also be considered as a snapshot. No, that's not true! You are not iterating a snapshot of the list, you're iterating the live list. It will change under you. RCU provides you with some guarantees about that list. See Documentation/RCU/listRCU.rst > > For example, if new tasks can be added to the tasklist, they may not > > be seen by an iteration. Is this OK? > > The newly added tasks does not access the HWPoison page, because the > HWPoison page has been isolated from the > buddy(memory_failure()->take_page_off_buddy()). Therefore, it is safe to see > the newly added task during the iteration and not be seen by iteration. > > Tasks may be removed from the > > tasklist after they have been seen by the iteration. Is this OK? > > Task be seen during iteration are deleted from the task list after > iteration, it's task_struct is not released because reference counting is > added in __add_to_kill(). Therefore, the subsequent processing of > kill_procs() is not affected (sending signals to the task deleted from task > list). so i think it's safe too. I don't know this code, but it seems unsafe to me. Look: collect_procs_anon: for_each_process(tsk) { struct task_struct *t = task_early_kill(tsk, force_early); add_to_kill_anon_file(t, page, vma, to_kill); add_to_kill_anon_file: __add_to_kill(tsk, p, vma, to_kill, 0, FSDAX_INVALID_PGOFF); __add_to_kill: get_task_struct(tsk); static inline struct task_struct *get_task_struct(struct task_struct *t) { refcount_inc(&t->usage); return t; } /** * refcount_inc - increment a refcount * @r: the refcount to increment * * Similar to atomic_inc(), but will saturate at REFCOUNT_SATURATED and WARN. * * Provides no memory ordering, it is assumed the caller already has a * reference on the object. * * Will WARN if the refcount is 0, as this represents a possible use-after-free * condition. */ I don't see anything that prevents that refcount_inc from seeing a zero refcount. Usually that would be prevented by tasklist_lock, right? Andrew, I think this patch is bad and needs to be dropped.