On Mon, Apr 29, 2024 at 10:29:29AM -0400, Zi Yan wrote:
> On 28 Apr 2024, at 23:56, Luis Chamberlain wrote:
>
> > On Sat, Apr 27, 2024 at 05:57:17PM -0700, Luis Chamberlain wrote:
> >> On Fri, Apr 26, 2024 at 04:46:11PM -0700, Luis Chamberlain wrote:
> >>> On Thu, Apr 25, 2024 at 05:47:28PM -0700, Luis Chamberlain wrote:
> >>>> On Thu, Apr 25, 2024 at 09:10:16PM +0100, Matthew Wilcox wrote:
> >>>>> On Thu, Apr 25, 2024 at 01:37:40PM +0200, Pankaj Raghav (Samsung) wrote:
> >>>>>> From: Pankaj Raghav <p.raghav@xxxxxxxxxxx>
> >>>>>>
> >>>>>> using that API for LBS is resulting in a NULL ptr dereference
> >>>>>> error in the writeback path [1].
> >>>>>>
> >>>>>> [1] https://gist.github.com/mcgrof/d12f586ec6ebe32b2472b5d634c397df
> >>>>>
> >>>>> How would I go about reproducing this?
> >>
> >> Well, the below fixes this, but I am not sure it is correct.
> >> folio_mark_dirty() at least says that a folio should not be truncated
> >> while it's running. I am not sure we should try to split folios in
> >> that case, even though we check for writeback once.
> >> truncate_inode_partial_folio() will folio_wait_writeback(), but it
> >> calls split_folio() before using folio_test_dirty() to decide whether
> >> the truncate failed. But since the folio is locked it's not clear why
> >> this should be possible.
> >>
> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >> index 83955362d41c..90195506211a 100644
> >> --- a/mm/huge_memory.c
> >> +++ b/mm/huge_memory.c
> >> @@ -3058,7 +3058,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> >>  	if (new_order >= folio_order(folio))
> >>  		return -EINVAL;
> >>
> >> -	if (folio_test_writeback(folio))
> >> +	if (folio_test_dirty(folio) || folio_test_writeback(folio))
> >>  		return -EBUSY;
> >>
> >>  	if (!folio_test_anon(folio)) {
> >
> > I wondered what code path was triggering this NULL pointer
> > dereference, so I just sprinkled a check here:
> >
> > 	VM_BUG_ON_FOLIO(folio_test_dirty(folio), folio);
> >
> > The answer was:
> >
> > kcompactd() --> migrate_pages_batch()
> >   --> try_split_folio() --> split_folio_to_list() -->
> >       split_huge_page_to_list_to_order()
> >
> > There are 3 try_split_folio() in migrate_pages_batch().

This is only true for linux-next; for v6.9-rc5, on which this testing
is based, there are only two.

> First one is to split anonymous large folios that are on the deferred
> split list, so not related;

That one is in linux-next and not in v6.9-rc5.

> second one is to split THPs when thp migration is not supported, but
> this is compaction, so not related; third one is to split large folios
> when there is no same size free page in the system, and this should be
> the one.

Agreed, this is the case where migrate_folio_unmap() failed with
-ENOMEM. This also helps us enhance the reproducer further, which I'll
do next.
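To make that concrete, here is a rough sketch of the shape of that
third call site, in my own words and not the kernel source verbatim;
the function name and the is_large flag are made up for illustration,
try_split_folio() is the real helper from mm/migrate.c:

/*
 * Sketch only: when unmapping a large folio for migration fails
 * with -ENOMEM, migrate_pages_batch() tries to split it; the split
 * folios are then retried by the caller with a single extra pass.
 */
static void unmap_failed_sketch(struct folio *folio, int rc, bool is_large,
				struct list_head *split_folios)
{
	/* rc is what migrate_folio_unmap() returned for this folio */
	if (rc == -ENOMEM && is_large) {
		/*
		 * try_split_folio() takes the folio lock and calls
		 * split_folio_to_list(); on success the resulting
		 * folios land on split_folios for the retry pass.
		 */
		if (!try_split_folio(folio, split_folios))
			return;
	}
	/* otherwise the folio is simply accounted as failed */
}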
> > And I verified that moving the check only to the migrate_pages_batch()
> > path also fixes the crash:
> >
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index 73a052a382f1..83b528eb7100 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -1484,7 +1484,12 @@ static inline int try_split_folio(struct folio *folio, struct list_head *split_f
> >  	int rc;
> >
> >  	folio_lock(folio);
> > +	if (folio_test_dirty(folio)) {
> > +		rc = -EBUSY;
> > +		goto out;
> > +	}
> >  	rc = split_folio_to_list(folio, split_folios);
> > +out:
> >  	folio_unlock(folio);
> >  	if (!rc)
> >  		list_move_tail(&folio->lru, split_folios);
> >
> > However, I'd like compaction folks to review this. I see some
> > indications in the code that migration can race with truncation, but
> > that we consider it fine because we take the folio lock. Here,
> > however, we have a case where the folio is clearly locked and yet
> > dirty. Other migration code seems to write back the folio and can
> > wait; here we just move on. Further reading of commit 0003e2a414687
> > ("mm: Add AS_UNMOVABLE to mark mapping as completely unmovable")
> > seems to hint that migration is safe if the mapping either does not
> > exist, or the mapping does exist but has
> > mapping->a_ops->migrate_folio, so I'd like further feedback on this.

> During migration, all page table entries pointing to this dirty folio
> are invalid, and accesses to this folio will cause a page fault and
> wait on the migration entry. I am not sure we need to skip dirty
> folios.

I see, thanks!

> > Another thing which requires review: if we split a folio not down to
> > order 0 but to the new min order, does the accounting in
> > migrate_pages_batch() require changing? And most puzzling, why do we

> What accounting are you referring to? split code should take care of
> it.

The folio order can change after split, and so I was concerned about
the nr_pages used in migrate_pages_batch(). But I see now that when
migrate_folio_unmap() first fails we try to split the folio, and if
that succeeds the caller will call migrate_pages_batch() again with a
retry attempt of 1, only on the split folios. I also see that nr_pages
is just local to each list for each loop, first on the from list as we
unmap, and after on the unmap list as we move the folios.

> > not see this with regular large folios, but we do see it with
> > minorder?

> I wonder if the split code handles folio->mapping->i_pages properly.
> Does the i_pages store just folio pointers or also need all tail page
> pointers? I am no expert in fs, thus need help.

mapping->i_pages stores either folio pointers in the page cache or
swap/dax/shadow value entries (xa_is_value(folio)). The folios however
can be special, and we special-case them with shmem_mapping(mapping)
checks. split_huge_page_to_list_to_order() doesn't get called with
swap/dax/shadow entries, and we also bail out on shmem_mapping(mapping)
already.

  Luis
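PS: a toy illustration of that last point, not kernel source: a slot in
mapping->i_pages holds either a folio pointer (a single multi-index
entry covers the whole large folio, so no per-tail-page pointers are
needed) or a shadow/swap/DAX value entry, told apart with
xa_is_value(). The helper name below is made up, and a real lookup
would also need to take a reference on the folio:

static struct folio *i_pages_lookup_sketch(struct address_space *mapping,
					   pgoff_t index)
{
	XA_STATE(xas, &mapping->i_pages, index);
	struct folio *folio;

	rcu_read_lock();
	folio = xas_load(&xas);
	if (xa_is_value(folio))
		folio = NULL;	/* shadow/swap/DAX entry, not a folio */
	rcu_read_unlock();

	return folio;	/* illustration only: no refcount taken */
}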