On Mon, Mar 25, 2024 at 11:22 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote: > > On Sun, Mar 24, 2024 at 07:14:27PM +0800, Zhaoyang Huang wrote: > > ok. It seems like madvise is robust enough to leave no BUGs. I presume > > another two scenarios which call folio_isloate_lru by any other ways > > but PTE. Besides, scenario 2 reminds me of a previous bug reported by > > me as find_get_entry entered in a livelock where the folio's refcnt == > > 0 but remains at xarray which causes the reset->retry loops forever. I > > would like to reply in that thread for more details. > > > > Scenario 1: > > 0. Thread_bad gets the folio by find_get_entry and preempted before > > folio_lock (could be the second round scan of > > truncate_inode_pages_range) > > refcnt == 2(page_cache, fbatch_bad), PG_lru == true, PG_lock == false > > folio = find_get_entry > > folio_try_get_rcu > > <preempted> > > folio_try_lock > > > > 1. Thread_truncate get the folio via > > truncate_inode_pages_range->find_lock_entries > > refcnt == 3(page_cache, fbatch_bad, fbatch_truncate), PG_lru == > > true, PG_lock == true > > Hang on, you can't have two threads in truncate_inode_pages_range() > at the same time. I appreciate that we don't have any documentation > of that, but if it were possible, we'd see other crashes. Removing > the folio from the page cache sets folio->mapping to NULL. And > __filemap_remove_folio() uses folio->mapping in > filemap_unaccount_folio() and page_cache_delete(), so we'd get NULL > pointer dereferences. ok. I will check if it is possible to have another way of entering this scenario. > > I see a hint in the DAX code that it's an fs-dependent lock: > > /* > * This gets called from truncate / punch_hole path. As such, the caller > * must hold locks protecting against concurrent modifications of the > * page cache (usually fs-private i_mmap_sem for writing). Since the > * caller has seen a DAX entry for this index, we better find it > * at that index as well... > */ > > so maybe that's why there's no lockdep_assert() in > truncate_inode_pages_range(), but there should be a comment. > > > Scenario 2: > > 0. Thread_bad gets the folio by find_get_entry and preempted before > > folio_lock (could be the second round scan of > > truncate_inode_pages_range) > > refcnt == 2(page_cache, fbatch_bad), PG_lru == true, PG_lock == false > > folio = find_get_entry > > folio_try_get_rcu > > <preempted> > > folio_try_lock > > > > 1. Thread_readahead remove the folio from page cache and drop one > > refcnt by filemap_remove_folio(get rid of the folios which failed to > > launch IO during readahead) > > refcnt == 1(fbatch_bad), PG_lru == true, PG_lock == true > > So readaahead inserts the folio locked, and then calls > filemap_remove_folio() without having unlocked it. > filemap_remove_folio() sets folio->mapping to NULL in > page_cache_delete(). When "Thread_bad" wakes up, it gets the > folio lock, calls truncate_inode_folio() and sees that > folio->mapping != mapping, so it doesn't call filemap_remove_folio(). > > > 4. Thread_bad schedule back from step 0 and clear one refcnt wrongly > > when doing truncate_inode_folio->filemap_remove_folio as it take this > > refcnt as the page cache one > > refcnt == 1'(thread_isolate), PG_lru == false, PG_lock == false > > find_get_entries > > folio = find_get_entry > > folio_try_get_rcu > > folio_lock > > <no check as folio->mapping != mapping as folio_lock_entries does> > > truncate_inode_folio > > filemap_remove_folio > > <preempted> Please review the following scenario, where the folio dropped two refcnt wrongly when cleaning Non-IO folios within ractl. Should we change it to __readahead_folio? 0. Thread_bad gets the folio by find_get_entry and preempted after folio_try_get_rcu (could be the second round scan of truncate_inode_pages_range) refcnt == 2(page_cache, fbatch_bad), PG_lru == true, PG_lock == false folio = find_get_entry folio_try_get_rcu <preempted> 1. Thread_readahead remove the folio from page cache and drop 2 refcnt by readahead_folio & filemap_remove_folio(get rid of the folios which failed to launch IO during readahead) refcnt == 0, PG_lru == true, PG_lock == true read_pages ... folio = readahead_folio <one refcnt dropped here> ********For the folio which can not launch IO, we should NOT drop refcnt here??? replaced by __readahead_folio???********** folio_get filemap_remove_folio(folio) folio_unlock <one refcnt dropped here> folio_put 2. folio_unlock refcnt == 0, PG_lru == true, PG_lock == false 3. Thread_isolate get one refcnt and call folio_isolate_lru(could be any process) refcnt == 1(thread_isolate), PG_lru == true, PG_lock == false 4. Thread_isolate proceed to clear PG_lru and get preempted before folio_get refcnt == 1(thread_isolate), PG_lru == false, PG_lock == false folio_test_clear_folio <preempted> folio_get 5. Thread_bad schedule back from step 0 and proceed to drop one refcnt by release_pages and hit the BUG refcnt == 0'(thread_isolate), PG_lru == false, PG_lock == false