Re: [PATCH 1/1] mm: protect xa split stuff under lruvec->lru_lock during migration

Zhaoyang Huang <huangzhaoyang@xxxxxxxxx> · Mon, 15 Apr 2024 09:50:19 +0800

On Mon, Apr 15, 2024 at 8:09 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Sat, Apr 13, 2024 at 10:01:27AM +0800, Zhaoyang Huang wrote:
> > loop Dave, since he has ever helped set up an reproducer in
> > https://lore.kernel.org/linux-mm/20221101071721.GV2703033@xxxxxxxxxxxxxxxxxxx/
> > @Dave Chinner , I would like to ask for your kindly help on if you can
> > verify this patch on your environment if convenient. Thanks a lot.
>
> I don't have the test environment from 18 months ago available any
> more. Also, I haven't seen this problem since that specific test
> environment tripped over the issue. Hence I don't have any way of
> confirming that the problem is fixed, either, because first I'd have
> to reproduce it...
Thanks for the information. I noticed that you reported another soft
lockup which is related to xas_load since NOV.2023. This patch is
supposed to be helpful for this. With regard to the version timing,
this commit is actually a revert of <mm/thp: narrow lru locking>
b6769834aac1d467fa1c71277d15688efcbb4d76 which is merged before v5.15.

For saving your time, a brief description below. IMO, b6769834aa
introduce a potential stall between freeze the folio's refcnt and
store it back to 2, which have the xas_load->folio_try_get_rcu loops
as livelock if it stalls the lru_lock's holder.

b6769834aa
    split_huge_page_to_list
-       spin_lock(lru_lock)
        xas_split(&xas, folio,order)
        folio_refcnt_freeze(folio, 1 + folio_nr_pages(folio0)
+      spin_lock(lru_lock)
        xas_store(&xas, offset++, head+i)
        page_ref_add(head, 2)
        spin_unlock(lru_lock)

Sorry in advance if the above doesn't make sense, I am just a
developer who is also suffering from this bug and trying to fix it
>
> -Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx