On 2022-05-25 03:04, Guoqing Jiang wrote:
> I would prefer to focus on block tree or md tree. With latest block tree
> (commit 44d8538d7e7dbee7246acda3b706c8134d15b9cb), I get below
> similar issue as Donald reported, it happened with the cmd (which did
> work with 5.12 kernel).
>
> vm79:~/mdadm> sudo ./test --dev=loop --tests=05r1-add-internalbitmap

Ok, so this test passes for me, but my VM was not running with bfq. It
also seems we have layers upon layers of different bugs to untangle.
Perhaps you can try the tests with bfq disabled to make progress on the
other regression I reported.

If I enable bfq and set the loop devices to the bfq scheduler, then I
hit the same bug as you and Donald. It's clearly a NULL pointer
de-reference in the bfq code, which seems to be triggered on the
partition read after mdadm opens a block device (not sure if it's the md
device or the loop device but I suspect the latter seeing it's not going
through any md code).

Simplifying things down a bit, the null pointer dereference can be
triggered by creating an md device with loop devices that have bfq
scheduler set:

  mdadm --create --run /dev/md0 --level=1 -n2 /dev/loop0 /dev/loop1

The crash occurs in bfq_bio_bfqg() with blkg_to_bfqg() returning NULL.
It's hard to trace where the NULL comes from in there -- the code is a
bit complex.

I've found that the bfq bug exists in current md-next (42b805af102) but
did not trigger in the base tag of v5.18-rc3. Bisecting revealed the bug
was introduced by:

  4e54a2493e58 ("bfq: Get rid of __bio_blkcg() usage")

Reverting that commit and the next commit (075a53b7) on top of md-next
was confirmed to fix the bug.

I've copied Jan, Jens and Paolo who can hopefully help with this. A
cleaned up stack trace follows this email for their benefit.

Logan

--

BUG: KASAN: null-ptr-deref in bfq_bio_bfqg+0x65/0xf0
Read of size 1 at addr 0000000000000094 by task mdadm/850

CPU: 1 PID: 850 Comm: mdadm Not tainted 5.18.0-rc3-eid-vmlocalyes-dbg-00005-g42b805af1024 #2113
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x5a/0x74
 kasan_report.cold+0x5f/0x1a9
 __asan_load1+0x4d/0x50
 bfq_bio_bfqg+0x65/0xf0
 bfq_bic_update_cgroup+0x2f/0x340
 bfq_insert_requests+0x568/0x5800
 blk_mq_sched_insert_request+0x180/0x230
 blk_mq_submit_bio+0x9f0/0xe50
 __submit_bio+0xeb/0x100
 submit_bio_noacct_nocheck+0x1fd/0x470
 submit_bio_noacct+0x350/0xa80
 submit_bio+0x84/0xf0
 submit_bh_wbc+0x27a/0x2b0
 block_read_full_page+0x578/0xb60
 blkdev_readpage+0x18/0x20
 do_read_cache_folio+0x290/0x430
 read_cache_page+0x41/0x130
 read_part_sector+0x7a/0x3d0
 read_lba+0x161/0x340
 efi_partition+0x1ce/0xdd0
 bdev_disk_changed+0x2e9/0x6a0
 blkdev_get_whole+0xd5/0x140
 blkdev_get_by_dev.part.0+0x37f/0x570
 blkdev_get_by_dev+0x51/0x60
 blkdev_open+0xa4/0x140
 do_dentry_open+0x2a7/0x6d0
 vfs_open+0x58/0x60
 path_openat+0x77e/0x13f0
 do_filp_open+0x154/0x280
 do_sys_openat2+0x119/0x2c0
 __x64_sys_openat+0xe7/0x160
 do_syscall_64+0x3b/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae

--

bfq_bio_bfqg+0x65/0xf0:

bfq_bio_bfqg at block/bfq-cgroup.c:619
 614 	struct blkcg_gq *blkg = bio->bi_blkg;
 615 	struct bfq_group *bfqg;
 616
 617 	while (blkg) {
 618 		bfqg = blkg_to_bfqg(blkg);
>619< 		if (bfqg->online) {
 620 			bio_associate_blkg_from_css(bio,
 621 			return bfqg;
 622 		}
 623 		blkg = blkg->parent;
 624
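
For anyone wanting to retry the simplified reproduction described in the mail, the steps might be scripted roughly as below. This is only a sketch under assumptions that are not in the original report: that loop0 and loop1 are already attached to backing files, that the bfq scheduler is available on the running kernel, and that the scheduler is selected through the usual sysfs queue/scheduler attribute. The run() and repro() helpers and the DRY_RUN variable are hypothetical names used for illustration. By default the script only prints the commands, since running them on an affected kernel is expected to crash it.

```shell
#!/bin/sh
# Sketch of the reproduction from the mail above; not the reporter's script.
# Assumptions: loop0 and loop1 already exist and are attached to backing
# files, and bfq is available on this kernel.
# DRY_RUN=1 (the default) prints the commands instead of executing them.
# Set DRY_RUN=0 and run as root to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        sh -c "$*"
    fi
}

# Hypothetical helper: select bfq on both loop devices, then create the array.
repro() {
    for dev in loop0 loop1; do
        run "echo bfq > /sys/block/$dev/queue/scheduler"
    done
    # Command taken verbatim from the mail.
    run "mdadm --create --run /dev/md0 --level=1 -n2 /dev/loop0 /dev/loop1"
}

repro
```

With DRY_RUN left at its default, this prints the two scheduler writes followed by the mdadm command, which makes it easy to review the steps before running them in a disposable VM.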