Re: [Update PATCH V3] md: don't unregister sync_thread with reconfig_mutex held

Christoph Hellwig <hch@xxxxxxxxxxxxx> · Mon, 30 May 2022 23:11:47 -0700

On Thu, May 26, 2022 at 01:53:36PM +0200, Jan Kara wrote:
> So I've debugged this. The crash happens on the very first bio submitted to
> the md0 device. The problem is that this bio gets remapped to loop0 - this
> happens through bio_alloc_clone() -> __bio_clone() which ends up calling
> bio_clone_blkg_association(). Now the resulting bio is inconsistent - its
> dst_bio->bi_bdev is pointing to loop0 while dst_bio->bi_blkg is pointing to
> blkcg_gq associated with md0 request queue. And this breaks BFQ because
> when this bio is inserted to loop0 request queue, BFQ looks at
> bio->bi_blkg->q (it is a bit more complex than that but this is the gist
> of the problem), expects its data there but BFQ is not initialized for md0
> request_queue.
> 
> Now I think this is a bug in __bio_clone() but the inconsistency in the bio
> is very much what we asked bio_clone_blkg_association() to do so maybe I'm
> missing something and bios that are associated with one bdev but pointing
> to blkg of another bdev are fine and controllers are supposed to handle
> that (although I'm not sure how should they do that). So I'm asking here
> before I just go and delete bio_clone_blkg_association() from
> __bio_clone()...

This behavior probably goes back to my commit here:

commit d92c370a16cbe0276954c761b874bd024a7e4fac
Author: Christoph Hellwig <hch@xxxxxx>
Date:   Sat Jun 27 09:31:48 2020 +0200

    block: really clone the block cgroup in bio_clone_blkg_association

and it seems everyone else was fine with that behavior so far.
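
To make the inconsistency described in the quoted report concrete, here is a
minimal userspace sketch. It is not kernel code: the toy_* structs and
functions are hypothetical stand-ins that keep only the fields named above
(bi_bdev, bi_blkg, and the queue a blkg points to), and the clone helper only
models "copy the blkg association, then point the bio at another bdev". It
makes no claim about what __bio_clone() does beyond what the report says.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins, NOT the kernel definitions: only the fields needed here. */
struct toy_queue { const char *name; bool bfq_initialized; };
struct toy_bdev  { struct toy_queue *q; };
struct toy_blkg  { struct toy_queue *q; };  /* models blkcg_gq->q */
struct toy_bio   { struct toy_bdev *bi_bdev; struct toy_blkg *bi_blkg; };

/* Models clone-then-remap: the blkg is copied from src, the bdev is replaced. */
static struct toy_bio toy_clone_and_remap(const struct toy_bio *src,
					  struct toy_bdev *target)
{
	struct toy_bio dst = { .bi_bdev = target, .bi_blkg = src->bi_blkg };

	return dst;
}

/* A bio is consistent when its blkg belongs to the queue of its bdev. */
static bool toy_bio_consistent(const struct toy_bio *bio)
{
	return bio->bi_bdev != NULL && bio->bi_blkg != NULL &&
	       bio->bi_blkg->q == bio->bi_bdev->q;
}

/* Models a scheduler that looks for its per-queue data via bio->bi_blkg->q. */
static bool toy_sched_data_available(const struct toy_bio *bio)
{
	return bio->bi_blkg->q->bfq_initialized;
}
```

With a bio whose bdev and blkg both belong to a toy md0 queue (scheduler not
initialized) and a clone remapped to a toy loop0 bdev (scheduler initialized),
toy_bio_consistent() holds for the original but not for the clone, and
toy_sched_data_available() fails on the clone because the lookup still goes
through the md0 queue.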