On Fri 11-03-22 14:39:10, yukuai (C) wrote:
> On 2022/02/10 1:40, Jan Kara wrote:
> >
> > I had a look into debug data and now I think I understand both the WARN_ON
> > hit in bic_set_bfqq() as well as the final BUG_ON in bfq_add_bfqq_busy().
> >
> > The first problem is apparently hit because __bio_blkcg() can change while
> > we are processing the bio. So bfq_bic_update_cgroup() sees different
> > __bio_blkcg() than bfq_get_queue() called from bfq_get_bfqq_handle_split().
> > This then causes mismatch between what bic & bfqq think about cgroup
> > membership which can lead to interesting inconsistencies down the road.
> >
> > The second problem is hit because clearly __bio_blkcg() can be pointing to
> > a blkcg that has been already offlined. Previously I didn't think this was
> > possible but apparently there is nothing that would prevent this race. So
> > we need to handle this gracefully inside BFQ.
> >
> > I need to think what would be best fixes for these issues since especially
> > the second one is tricky.
>
> Hi, Jan
>
> Just out of curiosity, do you have any ideas on how to fix these issues?

Sorry for the delay. I was on vacation and then busy with other stuff. I
have some version of the fixes ready but I want to clean them up a bit
before posting. It shouldn't take me long...

								Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR