On Thu, May 26, 2022 at 01:53:36PM +0200, Jan Kara wrote:
> So I've debugged this. The crash happens on the very first bio submitted to
> the md0 device. The problem is that this bio gets remapped to loop0 - this
> happens through bio_alloc_clone() -> __bio_clone() which ends up calling
> bio_clone_blkg_association(). Now the resulting bio is inconsistent - it's
> dst_bio->bi_bdev is pointing to loop0 while dst_bio->bi_blkg is pointing to
> blkcg_gq associated with md0 request queue. And this breaks BFQ because
> when this bio is inserted to loop0 request queue, BFQ looks at
> bio->bi_blkg->q (it is a bit more complex than that but this is the gist
> of the problem), expects its data there but BFQ is not initialized for md0
> request_queue.
>
> Now I think this is a bug in __bio_clone() but the inconsistency in the bio
> is very much what we asked bio_clone_blkg_association() to do so maybe I'm
> missing something and bios that are associated with one bdev but pointing
> to blkg of another bdev are fine and controllers are supposed to handle
> that (although I'm not sure how should they do that). So I'm asking here
> before I just go and delete bio_clone_blkg_association() from
> __bio_clone()...

This behavior probably goes back to my commit here:

commit d92c370a16cbe0276954c761b874bd024a7e4fac
Author: Christoph Hellwig <hch@xxxxxx>
Date:   Sat Jun 27 09:31:48 2020 +0200

    block: really clone the block cgroup in bio_clone_blkg_association

and it seems everyone else was fine with that behavior so far.
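
To make the inconsistency described in the quoted report concrete, here is a minimal userspace sketch. It is not kernel code: the struct layouts are cut down to the fields named in the thread, and every model_* function is a hypothetical stand-in invented for illustration, not the real bio_alloc_clone() or bio_clone_blkg_association().

```c
/* Simplified userspace model, NOT kernel code. The structs keep only the
 * fields mentioned in the thread; all model_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct request_queue { const char *name; bool bfq_initialized; };
struct block_device  { struct request_queue *q; };
struct blkcg_gq      { struct request_queue *q; };

struct bio {
    struct block_device *bi_bdev;
    struct blkcg_gq     *bi_blkg;
};

/* Models the clone path from the report: the new bio targets the lower
 * device, but the blkg association is copied verbatim from the source. */
static void model_alloc_clone(struct bio *dst, struct block_device *lower,
                              const struct bio *src)
{
    dst->bi_bdev = lower;
    dst->bi_blkg = src->bi_blkg;
}

/* Assumed invariant: a bio's blkg belongs to the queue of its bi_bdev. */
static bool model_bio_consistent(const struct bio *bio)
{
    return bio->bi_blkg == NULL || bio->bi_blkg->q == bio->bi_bdev->q;
}

/* Builds the md0 -> loop0 case and reports whether the clone is consistent. */
static bool model_md0_to_loop0(void)
{
    struct request_queue md0_q   = { "md0",   false }; /* no BFQ data here */
    struct request_queue loop0_q = { "loop0", true  };
    struct block_device md0   = { &md0_q };
    struct block_device loop0 = { &loop0_q };
    struct blkcg_gq md0_blkg  = { &md0_q };
    struct bio src = { &md0, &md0_blkg };
    struct bio dst;

    assert(model_bio_consistent(&src));
    model_alloc_clone(&dst, &loop0, &src);

    /* dst.bi_bdev->q is loop0's queue, while dst.bi_blkg->q is md0's. */
    return model_bio_consistent(&dst);
}
```

In this model model_md0_to_loop0() returns false: a scheduler attached to loop0 that follows bi_blkg->q would land on the md0 queue, where bfq_initialized is false. Whether the real fix belongs in the clone path or in the consumers is exactly the open question above.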