On Mon 17-01-22 16:13:09, yukuai (C) wrote: > 在 2022/01/15 1:01, Jan Kara 写道: > > Hello, > > > > here is the third version of my patches to fix use-after-free issues in BFQ > > when processes with merged queues get moved to different cgroups. The patches > > have survived some beating in my test VM, but so far I fail to reproduce the > > original KASAN reports so testing from people who can reproduce them is most > > welcome. Kuai, can you please give these patches a run in your setup? Thanks > > a lot for your help with fixing this! > > > > Changes since v3: > > * Changed handling of bfq group move to handle the case when target of the > > merge has moved. > Hi, Jan > > The problem can still be reporduced... Drat. Thanks for testing. > Do you implement this in patch 4? If so, I don't understand how you > chieve this. Yes, patch 4 should be handling this. > For example: 3 bfqqs were merged: > q1->q2->q3 > > If __bfq_bic_change_cgroup() is called with q2, the problem can be > fixed. However, if __bfq_bic_change_cgroup() is called with q3, can > the problem be fixed? Maybe I'm missing something so let's walk through your example where the bio is submitted for a task attached to q3. Bio is submitted, __bfq_bic_change_cgroup() is called with sync_bfqq == q3, the loop while (sync_bfqq->new_bfqq) sync_bfqq = sync_bfqq->new_bfqq; won't do anything. Then we check: if (sync_bfqq->entity.sched_data != &bfqg->sched_data) { This should be true because the task got moved and the bio is thus now submitted for a different cgroup. Then we get to the condition: if (orig_bfqq != sync_bfqq || bfq_bfqq_coop(sync_bfqq)) orig_bfqq == sync_bfqq so that won't help but the idea was that bfq_bfqq_coop() would trigger in this case so we detach the process from q3 (through bic_set_bfqq(bic, NULL, 1)) and create new queue in the right cgroup. Eventually, all the processes would detach from q3 and it would get destroyed. So why do you think this scheme will not work? And even if q3 is not marked as coop (which should not happen when there are other queues merged to it), we would move q3 to a cgroup for which IO is submitted - i.e., a one which is alive. Hum, but maybe with the above merge setup (q1->q2->q3) if __bfq_bic_change_cgroup() gets called for q1, q3 is already moved to the root cgroup by bfq_pd_offline() and bio is for the root cgroup, then we decide to keep everything as is. But bfq_setup_cooperator() will return q2 and we queue IO there and q2 may still be pointing to a dead cgroup (but q2 didn't get reparented because it was not in any of the service trees). So perhaps attached fixup will help? Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR
diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c index f6f5f156b9f2..3e8f0564ff0b 100644 --- a/block/bfq-cgroup.c +++ b/block/bfq-cgroup.c @@ -732,7 +732,7 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct bfq_data *bfqd, struct bfq_queue *orig_bfqq = sync_bfqq; /* Traverse the merge chain to bfqq we will be using */ - while (sync_bfqq->new_bfqq) + if (sync_bfqq->new_bfqq) sync_bfqq = sync_bfqq->new_bfqq; /* * Target bfqq got moved to a different cgroup or this process