On Wed 25-05-22 12:22:06, Logan Gunthorpe wrote: > > > On 2022-05-25 03:04, Guoqing Jiang wrote: > > I would prefer to focus on block tree or md tree. With latest block tree > > (commit 44d8538d7e7dbee7246acda3b706c8134d15b9cb), I get below > > similar issue as Donald reported, it happened with the cmd (which did > > work with 5.12 kernel). > > > > vm79:~/mdadm> sudo ./test --dev=loop --tests=05r1-add-internalbitmap > > Ok, so this test passes for me, but my VM was not running with bfq. It > also seems we have layers upon layers of different bugs to untangle. > Perhaps you can try the tests with bfq disabled to make progress on the > other regression I reported. > > If I enable bfq and set the loop devices to the bfq scheduler, then I > hit the same bug as you and Donald. It's clearly a NULL pointer > de-reference in the bfq code, which seems to be triggered on the > partition read after mdadm opens a block device (not sure if it's the md > device or the loop device but I suspect the latter seeing it's not going > through any md code). > > Simplifying things down a bit, the null pointer dereference can be > triggered by creating an md device with loop devices that have bfq > scheduler set: > > mdadm --create --run /dev/md0 --level=1 -n2 /dev/loop0 /dev/loop1 > > The crash occurs in bfq_bio_bfqg() with blkg_to_bfqg() returning NULL. > It's hard to trace where the NULL comes from in there -- the code is a > bit complex. > > I've found that the bfq bug exists in current md-next (42b805af102) but > did not trigger in the base tag of v5.18-rc3. Bisecting revealed the bug > was introduced by: > > 4e54a2493e58 ("bfq: Get rid of __bio_blkcg() usage") > > Reverting that commit and the next commit (075a53b7) on top of md-next > was confirmed to fix the bug. > > I've copied Jan, Jens and Paolo who can hopefully help with this. A > cleaned up stack trace follows this email for their benefit. Indeed. Thanks for easy reproduction steps. I've reproduced the issue locally and I'm looking into it because from a quick look into the code this shouldn't be possible which means I'm missing something obvious :). Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR