On Sat, Apr 27, 2024 at 05:57:17PM -0700, Luis Chamberlain wrote:
> On Fri, Apr 26, 2024 at 04:46:11PM -0700, Luis Chamberlain wrote:
> > On Thu, Apr 25, 2024 at 05:47:28PM -0700, Luis Chamberlain wrote:
> > > On Thu, Apr 25, 2024 at 09:10:16PM +0100, Matthew Wilcox wrote:
> > > > On Thu, Apr 25, 2024 at 01:37:40PM +0200, Pankaj Raghav (Samsung) wrote:
> > > > > From: Pankaj Raghav <p.raghav@xxxxxxxxxxx>
> > > > >
> > > > > using that API for LBS is resulting in a NULL ptr dereference
> > > > > error in the writeback path [1].
> > > > >
> > > > > [1] https://gist.github.com/mcgrof/d12f586ec6ebe32b2472b5d634c397df
> > > >
> > > > How would I go about reproducing this?
>
> Well, the change below fixes this, but I am not sure it is correct.
> folio_mark_dirty() at least says that a folio should not be truncated
> while it is running, so I am not sure we should try to split such folios
> at all, even though we do check for writeback once.
> truncate_inode_partial_folio() will folio_wait_writeback(), but it calls
> split_folio() before it checks folio_test_dirty() to decide whether the
> truncate failed. Since the folio is locked, though, it is not clear why
> this should be possible.
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 83955362d41c..90195506211a 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3058,7 +3058,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>  	if (new_order >= folio_order(folio))
>  		return -EINVAL;
>  
> -	if (folio_test_writeback(folio))
> +	if (folio_test_dirty(folio) || folio_test_writeback(folio))
>  		return -EBUSY;
>  
>  	if (!folio_test_anon(folio)) {

I wondered what code path is causing this and triggering this NULL
pointer dereference, so I just sprinkled a check here:

	VM_BUG_ON_FOLIO(folio_test_dirty(folio), folio);

The answer was:

kcompactd() --> migrate_pages_batch() --> try_split_folio() -->
split_folio_to_list() --> split_huge_page_to_list_to_order()

Since it took running fstests generic/447 twice to reproduce the crash
with a min order and 16k block size, I modified generic/447 as follows,
which speeds up the reproducer so that a single run of the test suffices:

diff --git a/tests/generic/447 b/tests/generic/447
index 16b814ee7347..43050b58e8ba 100755
--- a/tests/generic/447
+++ b/tests/generic/447
@@ -36,6 +39,15 @@ _scratch_mount >> "$seqres.full" 2>&1
 testdir="$SCRATCH_MNT/test-$seq"
 mkdir "$testdir"
 
+runfile="$tmp.compaction"
+touch $runfile
+while [ -e $runfile ]; do
+	echo 1 > /proc/sys/vm/compact_memory
+	sleep 10
+done &
+compaction_pid=$!
+
+
 # Setup for one million blocks, but we'll accept stress testing down to
 # 2^17 blocks... that should be plenty for anyone.
 fnr=20
@@ -69,6 +81,8 @@ _scratch_cycle_mount
 echo "Delete file1"
 rm -rf $testdir/file1
+rm -f $runfile
+wait > /dev/null 2>&1
 
 # success, all done
 status=0
 exit

And I verified that moving the check to only the migrate_pages_batch()
path also fixes the crash:

diff --git a/mm/migrate.c b/mm/migrate.c
index 73a052a382f1..83b528eb7100 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1484,7 +1484,12 @@ static inline int try_split_folio(struct folio *folio, struct list_head *split_f
 	int rc;
 
 	folio_lock(folio);
+	if (folio_test_dirty(folio)) {
+		rc = -EBUSY;
+		goto out;
+	}
 	rc = split_folio_to_list(folio, split_folios);
+out:
 	folio_unlock(folio);
 	if (!rc)
 		list_move_tail(&folio->lru, split_folios);

However, I would like the compaction folks to review this. I see some
indications in the code that migration can race with truncation, and
that we consider this safe because we hold the folio lock. Here, though,
we have a case where the folio is clearly locked and yet dirty. Other
migration code seems to write the folio back and can wait; here we just
move on. Further reading of commit 0003e2a414687 ("mm: Add AS_UNMOVABLE
to mark mapping as completely unmovable") seems to hint that migration
is safe if the mapping either does not exist, or the mapping does exist
but has mapping->a_ops->migrate_folio, so I would like further feedback
on this.

Another thing which requires review: if we split a folio not down to
order 0 but only to the new min order, does the accounting in
migrate_pages_batch() require changing?

And most puzzling, why do we not see this with regular large folios, but
we do see it with min order?

  Luis