On 2022/02/10 1:40, Jan Kara wrote:
I had a look at the debug data and now I think I understand both the WARN_ON hit in bic_set_bfqq() and the final BUG_ON in bfq_add_bfqq_busy(). The first problem is apparently hit because __bio_blkcg() can change while we are processing the bio. So bfq_bic_update_cgroup() sees a different __bio_blkcg() than bfq_get_queue() called from bfq_get_bfqq_handle_split(). This then causes a mismatch between what the bic and the bfqq think about cgroup membership, which can lead to interesting inconsistencies down the road. The second problem is hit because __bio_blkcg() can clearly be pointing to a blkcg that has already been offlined. Previously I didn't think this was possible, but apparently there is nothing that would prevent this race. So we need to handle this gracefully inside BFQ. I need to think about what the best fixes for these issues would be, since the second one especially is tricky.
Hi, Jan

Just out of curiosity, do you have any ideas on how to fix these issues?

Thanks,
Kuai
Honza