On 7/19/2023 10:26 PM, Hugh Dickins wrote: > On Wed, 19 Jul 2023, Yin Fengwei wrote: >>>>>>>>>>> Could this also happen against normal 4K page? I mean when user try to munlock >>>>>>>>>>> a normal 4K page and this 4K page is isolated. So it become unevictable page? >>>>>>>>>> Looks like it can be possible. If cpu 1 is in __munlock_folio() and >>>>>>>>>> cpu 2 is isolating the folio for any purpose: >>>>>>>>>> >>>>>>>>>> cpu1 cpu2 >>>>>>>>>> isolate folio >>>>>>>>>> folio_test_clear_lru() // 0 >>>>>>>>>> putback folio // add to unevictable list >>>>>>>>>> folio_test_clear_mlocked() >>>>>>> folio_set_lru() >> Let's wait the response from Huge and Yu. :). > > I haven't been able to give it enough thought, but I suspect you are right: > that the current __munlock_folio() is deficient when folio_test_clear_lru() > fails. > > (Though it has not been reported as a problem in practice: perhaps because > so few places try to isolate from the unevictable "list".) > > I forget what my order of development was, but it's likely that I first > wrote the version for our own internal kernel - which used our original > lruvec locking, which did not depend on getting PG_lru first (having got > lru_lock, it checked memcg, then tried again if that had changed). > > I was uneasy with the PG_lru aspect of upstream lru_lock implementation, > but it turned out to work okay - elsewhere; but it looks as if I missed > its implication when adapting __munlock_page() for upstream. > > If I were trying to fix this __munlock_folio() race myself (sorry, I'm > not), I would first look at that aspect: instead of folio_test_clear_lru() > behaving always like a trylock, could "folio_wait_clear_lru()" or whatever > spin waiting for PG_lru here? Considering following sequence: CPU1 (migration) CPU2 (mlock) isolation page (clear lru) mlock_pte_range try_to_migrate -> take_pte_lock try_to_migrate_one munlock_folio pvmw -> take pte lock __munlock_folio if batch full folio_wait_clear_lru deadlock may happen. Regards Yin, Fengwei > > Hugh