On Mon, Apr 15, 2024 at 8:09 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Sat, Apr 13, 2024 at 10:01:27AM +0800, Zhaoyang Huang wrote:
> > Looping in Dave, since he once helped set up a reproducer in
> > https://lore.kernel.org/linux-mm/20221101071721.GV2703033@xxxxxxxxxxxxxxxxxxx/
> > @Dave Chinner, could I kindly ask you to verify this patch in your
> > environment, if convenient? Thanks a lot.
>
> I don't have the test environment from 18 months ago available any
> more. Also, I haven't seen this problem since that specific test
> environment tripped over the issue. Hence I don't have any way of
> confirming that the problem is fixed, either, because first I'd have
> to reproduce it...

Thanks for the information. I noticed that you have also reported
another soft lockup, related to xas_load, since Nov. 2023. This patch
is supposed to help with that one as well. Regarding version timing,
this commit is effectively a revert of
b6769834aac1d467fa1c71277d15688efcbb4d76 ("mm/thp: narrow lru locking"),
which was merged before v5.15.

To save your time, here is a brief description. IMO, b6769834aa
introduced a potential stall between freezing the folio's refcount and
storing it back to 2. If the xas_load->folio_try_get_rcu loop stalls the
holder of lru_lock, it spins as a livelock, because the splitter then
cannot take lru_lock to unfreeze the refcount.

b6769834aa
split_huge_page_to_list
-   spin_lock(lru_lock)
    xas_split(&xas, folio, order)
    folio_refcnt_freeze(folio, 1 + folio_nr_pages(folio))
+   spin_lock(lru_lock)
    xas_store(&xas, offset++, head+i)
    page_ref_add(head, 2)
    spin_unlock(lru_lock)

Sorry in advance if the above doesn't make sense. I am just a developer
who is also suffering from this bug and trying to fix it.

>
> -Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
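
To illustrate the window described above, here is a rough user-space sketch. It is not kernel code. All toy_* names are hypothetical and only the refcount is modelled. It assumes, as in the description above, that freezing drops the count to 0 and that the speculative get refuses to take a reference while the count is 0, so a lookup can only retry until someone unfreezes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a folio; only the refcount is modelled. */
struct toy_folio {
    atomic_int refcount;
};

/* Freeze: succeeds only if the count equals `expected`, then sets it to 0. */
static bool toy_ref_freeze(struct toy_folio *f, int expected)
{
    return atomic_compare_exchange_strong(&f->refcount, &expected, 0);
}

/* Unfreeze: publish a non-zero count again, closing the frozen window. */
static void toy_ref_unfreeze(struct toy_folio *f, int count)
{
    assert(atomic_load(&f->refcount) == 0);
    atomic_store(&f->refcount, count);
}

/* Speculative get: take a reference unless the count is 0 (frozen). */
static bool toy_try_get(struct toy_folio *f)
{
    int old = atomic_load(&f->refcount);

    while (old != 0) {
        /* On failure, `old` is reloaded with the current count. */
        if (atomic_compare_exchange_weak(&f->refcount, &old, old + 1))
            return true;
    }
    return false;
}

/*
 * Lookup loop in the style of a retrying reader. Returns the number of
 * failed attempts before success, or -1 if it gave up after max_tries.
 * A real reader has no such bound, which is where the livelock comes from.
 */
static int toy_lookup(struct toy_folio *f, int max_tries)
{
    for (int tries = 0; tries < max_tries; tries++) {
        if (toy_try_get(f))
            return tries;
    }
    return -1;
}
```

With the count frozen, toy_lookup() uses up every attempt no matter how large max_tries is. It succeeds on the first attempt once toy_ref_unfreeze() has run. If whoever must unfreeze is itself waiting on a lock held by such a reader, neither side makes progress.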