On Thu, Apr 25, 2024 at 09:10:16PM +0100, Matthew Wilcox wrote:
> On Thu, Apr 25, 2024 at 01:37:40PM +0200, Pankaj Raghav (Samsung) wrote:
> > From: Pankaj Raghav <p.raghav@xxxxxxxxxxx>
> >
> > Splitting a larger folio with a base order is supported using
> > split_huge_page_to_list_to_order() API. However, using that API for LBS
> > is resulting in an NULL ptr dereference error in the writeback path [1].
> >
> > Refuse to split a folio if it has minimum folio order requirement until
> > we can start using split_huge_page_to_list_to_order() API. Splitting the
> > folio can be added as a later optimization.
> >
> > [1] https://gist.github.com/mcgrof/d12f586ec6ebe32b2472b5d634c397df
>
> Obviously this has to be tracked down and fixed before this patchset can
> be merged ... I think I have some ideas. Let me look a bit. How
> would I go about reproducing this?

I am able to reproduce it in a VM with 4G RAM by running generic/447
(sometimes you have to run it twice) on a 16K block size filesystem on a
4K page size system.

I have a suspicion on this series:
https://lore.kernel.org/linux-fsdevel/20240215063649.2164017-1-hch@xxxxxx/
but I am still unsure why this is happening when we split with LBS
configurations.

If you have kdevops installed, then go with Luis's suggestion; otherwise,
my local config is below.

This is the diff I applied instead of this patch:

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9859aa4f7553..63ee7b6ed03d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3041,6 +3041,10 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 {
 	struct folio *folio = page_folio(page);
 	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
+	unsigned int mapping_min_order = mapping_min_folio_order(folio->mapping);
+
+	if (!folio_test_anon(folio))
+		new_order = max_t(unsigned int, mapping_min_order, new_order);
 	/* reset xarray order to new order after split */
 	XA_STATE_ORDER(xas, &folio->mapping->i_pages, folio->index, new_order);
 	struct anon_vma *anon_vma = NULL;
@@ -3117,6 +3121,8 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 		goto out;
 	}
 
+	// XXX: Remove it later
+	VM_WARN_ON_FOLIO((new_order < mapping_min_order), folio);
 	gfp = current_gfp_context(mapping_gfp_mask(mapping) &
 							GFP_RECLAIM_MASK);
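For reference, mapping_min_folio_order() used in the first hunk is the
helper introduced by the LBS series: it reads back the minimum folio
order that the series stores in a bitfield of mapping->flags. A rough
sketch is below; the exact flag/mask names here are an assumption on my
part, see the series for the real definition:

static inline unsigned int
mapping_min_folio_order(const struct address_space *mapping)
{
	/* minimum folio order is kept in a bitfield of mapping->flags */
	return (mapping->flags & AS_FOLIO_ORDER_MIN_MASK) >> AS_FOLIO_ORDER_MIN;
}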
xfstests is based on https://github.com/kdave/xfstests/tree/v2024.04.14

xfstests config:

[default]
FSTYP=xfs
RESULT_BASE=/root/results/
DUMP_CORRUPT_FS=1
CANON_DEVS=yes
RECREATE_TEST_DEV=true
TEST_DEV=/dev/nvme0n1
TEST_DIR=/media/test
SCRATCH_DEV=/dev/vdb
SCRATCH_MNT=/media/scratch
LOGWRITES_DEV=/dev/vdc

[16k_4ks]
MKFS_OPTIONS='-f -m reflink=1,rmapbt=1, -i sparse=1, -b size=16k, -s size=4k'

[nix-shell:~]# lsblk
NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
vdb     254:16   0  32G  0 disk /media/scratch
vdc     254:32   0  32G  0 disk
nvme0n1 259:0    0  32G  0 disk /media/test

$ ./check -s 16k_4ks generic/447

BT:
[ 74.170698] BUG: KASAN: null-ptr-deref in filemap_get_folios_tag+0x14b/0x510
[ 74.170938] Write of size 4 at addr 0000000000000036 by task kworker/u16:6/284
[ 74.170938]
[ 74.170938] CPU: 0 PID: 284 Comm: kworker/u16:6 Not tainted 6.9.0-rc4-00011-g4676d00b6f6f #7
[ 74.170938] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[ 74.170938] Workqueue: writeback wb_workfn (flush-254:16)
[ 74.170938] Call Trace:
[ 74.170938]  <TASK>
[ 74.170938]  dump_stack_lvl+0x51/0x70
[ 74.170938]  kasan_report+0xab/0xe0
[ 74.170938]  ? filemap_get_folios_tag+0x14b/0x510
[ 74.170938]  kasan_check_range+0x35/0x1b0
[ 74.170938]  filemap_get_folios_tag+0x14b/0x510
[ 74.170938]  ? __pfx_filemap_get_folios_tag+0x10/0x10
[ 74.170938]  ? srso_return_thunk+0x5/0x5f
[ 74.170938]  writeback_iter+0x508/0xcc0
[ 74.170938]  ? __pfx_iomap_do_writepage+0x10/0x10
[ 74.170938]  write_cache_pages+0x80/0x100
[ 74.170938]  ? __pfx_write_cache_pages+0x10/0x10
[ 74.170938]  ? srso_return_thunk+0x5/0x5f
[ 74.170938]  ? srso_return_thunk+0x5/0x5f
[ 74.170938]  ? srso_return_thunk+0x5/0x5f
[ 74.170938]  ? _raw_spin_lock+0x87/0xe0
[ 74.170938]  iomap_writepages+0x85/0xe0
[ 74.170938]  xfs_vm_writepages+0xe3/0x140 [xfs]
[ 74.170938]  ? __pfx_xfs_vm_writepages+0x10/0x10 [xfs]
[ 74.170938]  ? kasan_save_track+0x10/0x30
[ 74.170938]  ? srso_return_thunk+0x5/0x5f
[ 74.170938]  ? __kasan_kmalloc+0x7b/0x90
[ 74.170938]  ? srso_return_thunk+0x5/0x5f
[ 74.170938]  ? virtqueue_add_split+0x605/0x1b00
[ 74.170938]  do_writepages+0x176/0x740
[ 74.170938]  ? __pfx_do_writepages+0x10/0x10
[ 74.170938]  ? __pfx_virtqueue_add_split+0x10/0x10
[ 74.170938]  ? __pfx_update_sd_lb_stats.constprop.0+0x10/0x10
[ 74.170938]  ? srso_return_thunk+0x5/0x5f
[ 74.170938]  ? virtqueue_add_sgs+0xfe/0x130
[ 74.170938]  ? srso_return_thunk+0x5/0x5f
[ 74.170938]  ? virtblk_add_req+0x15c/0x280
[ 74.170938]  __writeback_single_inode+0x9f/0x840
[ 74.170938]  ? wbc_attach_and_unlock_inode+0x345/0x5d0
[ 74.170938]  writeback_sb_inodes+0x491/0xce0
[ 74.170938]  ? __pfx_wb_calc_thresh+0x10/0x10
[ 74.170938]  ? __pfx_writeback_sb_inodes+0x10/0x10
[ 74.170938]  ? __wb_calc_thresh+0x1a0/0x3c0
[ 74.170938]  ? __pfx_down_read_trylock+0x10/0x10
[ 74.170938]  ? wb_over_bg_thresh+0x16b/0x5e0
[ 74.170938]  ? __pfx_move_expired_inodes+0x10/0x10
[ 74.170938]  __writeback_inodes_wb+0xb7/0x200
[ 74.170938]  wb_writeback+0x2c4/0x660
[ 74.170938]  ? __pfx_wb_writeback+0x10/0x10
[ 74.170938]  ? __pfx__raw_spin_lock_irq+0x10/0x10
[ 74.170938]  wb_workfn+0x54e/0xaf0
[ 74.170938]  ? srso_return_thunk+0x5/0x5f
[ 74.170938]  ? __pfx_wb_workfn+0x10/0x10
[ 74.170938]  ? __pfx___schedule+0x10/0x10
[ 74.170938]  ? __pfx__raw_spin_lock_irq+0x10/0x10
[ 74.170938]  process_one_work+0x622/0x1020
[ 74.170938]  worker_thread+0x844/0x10e0
[ 74.170938]  ? srso_return_thunk+0x5/0x5f
[ 74.170938]  ? __kthread_parkme+0x82/0x150
[ 74.170938]  ? __pfx_worker_thread+0x10/0x10
[ 74.170938]  kthread+0x2b4/0x380
[ 74.170938]  ? __pfx_kthread+0x10/0x10
[ 74.170938]  ret_from_fork+0x30/0x70
[ 74.170938]  ? __pfx_kthread+0x10/0x10
[ 74.170938]  ret_from_fork_asm+0x1a/0x30
[ 74.170938]  </TASK>