On Tue 18-01-22 11:00:43, yukuai (C) wrote: > 在 2022/01/17 23:46, Jan Kara 写道: > > On Mon 17-01-22 22:16:23, yukuai (C) wrote: > > > 在 2022/01/17 19:45, Jan Kara 写道: > > > > On Mon 17-01-22 16:13:09, yukuai (C) wrote: > > > > > 在 2022/01/15 1:01, Jan Kara 写道: > > > > > > Hello, > > > > > > > > > > > > here is the third version of my patches to fix use-after-free issues in BFQ > > > > > > when processes with merged queues get moved to different cgroups. The patches > > > > > > have survived some beating in my test VM, but so far I fail to reproduce the > > > > > > original KASAN reports so testing from people who can reproduce them is most > > > > > > welcome. Kuai, can you please give these patches a run in your setup? Thanks > > > > > > a lot for your help with fixing this! > > > > > > > > > > > > Changes since v3: > > > > > > * Changed handling of bfq group move to handle the case when target of the > > > > > > merge has moved. > > > > > Hi, Jan > > > > > > > > > > The problem can still be reporduced... > > > > > > > > Drat. Thanks for testing. > > > > > > > > > Do you implement this in patch 4? If so, I don't understand how you > > > > > chieve this. > > > > > > > > Yes, patch 4 should be handling this. > > > > > > > > > For example: 3 bfqqs were merged: > > > > > q1->q2->q3 > > > > > > > > > > If __bfq_bic_change_cgroup() is called with q2, the problem can be > > > > > fixed. However, if __bfq_bic_change_cgroup() is called with q3, can > > > > > the problem be fixed? > > > > > > > > Maybe I'm missing something so let's walk through your example where the > > > > bio is submitted for a task attached to q3. Bio is submitted, > > > > __bfq_bic_change_cgroup() is called with sync_bfqq == q3, the loop > > > > > > > > while (sync_bfqq->new_bfqq) > > > > sync_bfqq = sync_bfqq->new_bfqq; > > > > > > > > won't do anything. Then we check: > > > > > > > > if (sync_bfqq->entity.sched_data != &bfqg->sched_data) { > > > > > > > > This should be true because the task got moved and the bio is thus now > > > > submitted for a different cgroup. Then we get to the condition: > > > > > > > > if (orig_bfqq != sync_bfqq || bfq_bfqq_coop(sync_bfqq)) > > > > > > > > orig_bfqq == sync_bfqq so that won't help but the idea was that > > > > bfq_bfqq_coop() would trigger in this case so we detach the process from q3 > > > > (through bic_set_bfqq(bic, NULL, 1)) and create new queue in the right > > > > cgroup. Eventually, all the processes would detach from q3 and it would get > > > > destroyed. So why do you think this scheme will not work? > > > > > > Hi, Jan > > > > > > I repoduced again with some debug info: > > > > Thanks for your help with debugging! > > > > > diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c > > > index f6f5f156b9f2..f62ebc4bbe56 100644 > > > --- a/block/bfq-cgroup.c > > > +++ b/block/bfq-cgroup.c > > > @@ -732,8 +732,11 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct > > > bfq_data *bfqd, > > > struct bfq_queue *orig_bfqq = sync_bfqq; > > > > > > /* Traverse the merge chain to bfqq we will be using */ > > > - while (sync_bfqq->new_bfqq) > > > + while (sync_bfqq->new_bfqq) { > > > + printk("%s: bfqq %px, new_bfqq %px\n", __func__, > > > + sync_bfqq, sync_bfqq->new_bfqq); > > > sync_bfqq = sync_bfqq->new_bfqq; > > > + } > > > /* > > > * Target bfqq got moved to a different cgroup or this > > > process > > > * started submitting bios for different cgroup? > > > @@ -756,6 +759,8 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct > > > bfq_data *bfqd, > > > bic_set_bfqq(bic, NULL, 1); > > > > Maybe it would be nice to add a debug message here, saying we are detaching > > the process from orig_bfqq. Just that we know that this branch executed. > > > > > return bfqg; > > > } > > > + printk("%s: bfqq %px move from %px to %px\n", > > > __func__, > > > + sync_bfqq, bfqq_group(sync_bfqq), bfqg); > > > /* We are the only user of this bfqq, just move it > > > */ > > > bfq_bfqq_move(bfqd, sync_bfqq, bfqg); > > > > ... > > > > > - if (bfqq->new_bfqq) > > > + if (bfqq->new_bfqq) { > > > + if (bfqq->entity.parent != bfqq->new_bfqq->entity.parent) { > > > + printk("%s: bfqq %px(%px) new_bfqq %px(%px)\n", > > > __func__, > > > + bfqq, bfqq_group(bfqq), bfqq->new_bfqq, > > > + bfqq_group(bfqq->new_bfqq)); > > > + BUG_ON(1); > > > > OK, so I can see why this triggered. We have: > > > > Set new_bfqq for bfqq *eec0: > > [ 50.207263] bfq_setup_merge: set bfqq ffff88816b16eec0 new_bfqq ffff8881133e1340 parent ffff88814380b000/ffff88814380b000 > > Move new_bfqq to a root cgroup: > > [ 50.485933] bfq_reparent_leaf_entity: bfqq ffff8881133e1340 move from ffff88814380b000 to ffff88810b1f1000 > > Submit bio for root cgroup through *eec0: > > [ 50.519910] __bfq_bic_change_cgroup: bfqq ffff88816b16eec0, new_bfqq ffff8881133e1340 > > __bfq_bic_change_cgroup() does nothing as the bio is for the cgroup to which > > target queue belongs. > > [ 50.520640] bfq_setup_cooperator: bfqq ffff88816b16eec0(ffff88814380b000) new_bfqq ffff8881133e1340(ffff88810b1f1000) > > The BUG_ON trips. > > > > So at this moment the BUG_ON was just too eager to trigger as we would be > > submitting IO to a bfq queue belonging to an appropriate cgroup. But I > > guess it does not make sence to keep unfinished merges over cgroup > > migration and __bfq_bic_change_cgroup() should have detached task from its > > bfqq anyway. Can you please try running with a new version of patch 4 which > > is attached? And perhaps with your debug patch as well... Thanks! > > Hi > > I reporduced again with the following: > > diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c > index f6f5f156b9f2..1afb9127ca84 100644 > --- a/block/bfq-cgroup.c > +++ b/block/bfq-cgroup.c > @@ -732,8 +732,11 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct > bfq_data *bfqd, > struct bfq_queue *orig_bfqq = sync_bfqq; > > /* Traverse the merge chain to bfqq we will be using */ > - while (sync_bfqq->new_bfqq) > + if (sync_bfqq->new_bfqq) { > + printk("%s: bfqq %px, new_bfqq %px\n", __func__, > + sync_bfqq, sync_bfqq->new_bfqq); > sync_bfqq = sync_bfqq->new_bfqq; > + } > /* > * Target bfqq got moved to a different cgroup or this > process > * started submitting bios for different cgroup? > @@ -754,8 +757,11 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct > bfq_data *bfqd, > bfq_put_cooperator(orig_bfqq); > bfq_release_process_ref(bfqd, orig_bfqq); > bic_set_bfqq(bic, NULL, 1); > + printk("%s: bfqq %px detached\n", __func__, > orig_bfqq); > return bfqg; > } > + printk("%s: bfqq %px move from %px to %px\n", > __func__, > + sync_bfqq, bfqq_group(sync_bfqq), bfqg); > /* We are the only user of this bfqq, just move it > */ > bfq_bfqq_move(bfqd, sync_bfqq, bfqg); > } > @@ -875,7 +881,10 @@ static void bfq_reparent_leaf_entity(struct bfq_data > *bfqd, > } > > bfqq = bfq_entity_to_bfqq(child_entity); > + printk("%s: bfqq %px move from %px to %px\n", __func__, > + bfqq, bfqq_group(bfqq), bfqd->root_group); > bfq_bfqq_move(bfqd, bfqq, bfqd->root_group); > + > } > > /** > diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c > index 07be51bc229b..c6b439df28ad 100644 > --- a/block/bfq-iosched.c > +++ b/block/bfq-iosched.c > @@ -2797,6 +2797,8 @@ bfq_setup_merge(struct bfq_queue *bfqq, struct > bfq_queue *new_bfqq) > * are likely to increase the throughput. > */ > bfqq->new_bfqq = new_bfqq; > + printk("%s: set bfqq %px new_bfqq %px parent %px/%px \n", __func__, > bfqq, > + new_bfqq, bfqq_group(bfqq), bfqq_group(new_bfqq)); > new_bfqq->ref += process_refs; > return new_bfqq; > } > @@ -2963,8 +2965,16 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct > bfq_queue *bfqq, > if (bfq_too_late_for_merging(bfqq)) > return NULL; > > - if (bfqq->new_bfqq) > + if (bfqq->new_bfqq) { > + if (bfqq->entity.parent != bfqq->new_bfqq->entity.parent && > + bfqq_group(bfqq->new_bfqq) != bfqd->root_group) { > + printk("%s: bfqq %px(%px) new_bfqq %px(%px)\n", > __func__, > + bfqq, bfqq_group(bfqq), bfqq->new_bfqq, > + bfqq_group(bfqq->new_bfqq)); > + BUG_ON(1); > + } > return bfqq->new_bfqq; > + } > > if (!io_struct || unlikely(bfqq == &bfqd->oom_bfqq)) > return NULL; > @@ -6928,6 +6938,8 @@ static void __bfq_put_async_bfqq(struct bfq_data > *bfqd, > > bfq_log(bfqd, "put_async_bfqq: %p", bfqq); > if (bfqq) { > + printk("%s: bfqq %px move from %px to %px\n", __func__, > + bfqq, bfqq_group(bfqq), bfqd->root_group); > bfq_bfqq_move(bfqd, bfqq, bfqd->root_group); > > bfq_log_bfqq(bfqd, bfqq, "put_async_bfqq: putting %p, %d", This was not exactly what I wanted but now I've realized I've sent you a wrong patch. My fault and thanks for testing! > The result is attached, one thing that is weird: the uaf is triggered > before the BUG_ON. Yes, that is interesting. The BUG_ON at the end indeed shows a case where there was a longer chain of merges bfq_pd_offline() moved some of the queues from the merge chain to the root cgroup but not all. The state was ffff888169d57440 -> ffff88810b593180 -> ffff88812cfc9340 (root) (root) (orig cgroup) Then something strange happened: [ 92.692545] __bfq_bic_change_cgroup: bfqq ffff888169d57440, new_bfqq ffff88810b593180 [ 92.693374] bfq_setup_cooperator: bfqq ffff88810b593180(ffff88816d167000) new_bfqq ffff88812cfc9340(ffff88817ab36000) We can see __bfq_bic_change_cgroup() got called for queue ffff888169d57440 but then not for ffff88810b593180 while bfq_setup_cooperator() got called for ffff88810b593180. Not sure what happened inside the BFQ. Anyway, the presence of these merge chains and the fact that bfq_pd_offline() can move arbitrary subset to different cgroup shows that we should be more careful. I've changed __bfq_bic_change_cgroup() to check the whole merge chain and detach from bfqq if anything does not match. I'll send it as a new revision. Can you please test it? Thanks! Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR