[Added couple of CCs since this seems to be an issue in the generic block layer] On Wed 25-05-22 12:22:06, Logan Gunthorpe wrote: > On 2022-05-25 03:04, Guoqing Jiang wrote: > > I would prefer to focus on block tree or md tree. With latest block tree > > (commit 44d8538d7e7dbee7246acda3b706c8134d15b9cb), I get below > > similar issue as Donald reported, it happened with the cmd (which did > > work with 5.12 kernel). > > > > vm79:~/mdadm> sudo ./test --dev=loop --tests=05r1-add-internalbitmap > > Ok, so this test passes for me, but my VM was not running with bfq. It > also seems we have layers upon layers of different bugs to untangle. > Perhaps you can try the tests with bfq disabled to make progress on the > other regression I reported. > > If I enable bfq and set the loop devices to the bfq scheduler, then I > hit the same bug as you and Donald. It's clearly a NULL pointer > de-reference in the bfq code, which seems to be triggered on the > partition read after mdadm opens a block device (not sure if it's the md > device or the loop device but I suspect the latter seeing it's not going > through any md code). > > Simplifying things down a bit, the null pointer dereference can be > triggered by creating an md device with loop devices that have bfq > scheduler set: > > mdadm --create --run /dev/md0 --level=1 -n2 /dev/loop0 /dev/loop1 > > The crash occurs in bfq_bio_bfqg() with blkg_to_bfqg() returning NULL. > It's hard to trace where the NULL comes from in there -- the code is a > bit complex. > > I've found that the bfq bug exists in current md-next (42b805af102) but > did not trigger in the base tag of v5.18-rc3. Bisecting revealed the bug > was introduced by: > > 4e54a2493e58 ("bfq: Get rid of __bio_blkcg() usage") > > Reverting that commit and the next commit (075a53b7) on top of md-next > was confirmed to fix the bug. > > I've copied Jan, Jens and Paolo who can hopefully help with this. A > cleaned up stack trace follows this email for their benefit. So I've debugged this. The crash happens on the very first bio submitted to the md0 device. The problem is that this bio gets remapped to loop0 - this happens through bio_alloc_clone() -> __bio_clone() which ends up calling bio_clone_blkg_association(). Now the resulting bio is inconsistent - it's dst_bio->bi_bdev is pointing to loop0 while dst_bio->bi_blkg is pointing to blkcg_gq associated with md0 request queue. And this breaks BFQ because when this bio is inserted to loop0 request queue, BFQ looks at bio->bi_blkg->q (it is a bit more complex than that but this is the gist of the problem), expects its data there but BFQ is not initialized for md0 request_queue. Now I think this is a bug in __bio_clone() but the inconsistency in the bio is very much what we asked bio_clone_blkg_association() to do so maybe I'm missing something and bios that are associated with one bdev but pointing to blkg of another bdev are fine and controllers are supposed to handle that (although I'm not sure how should they do that). So I'm asking here before I just go and delete bio_clone_blkg_association() from __bio_clone()... Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR