在 2022/01/24 22:02, Jan Kara 写道:
On Fri 21-01-22 19:42:11, yukuai (C) wrote:
在 2022/01/21 18:56, Jan Kara 写道:
Hello,
here is the fifth version of my patches to fix use-after-free issues in BFQ
when processes with merged queues get moved to different cgroups. The patches
have survived some beating in my test VM, but so far I fail to reproduce the
original KASAN reports so testing from people who can reproduce them is most
welcome. Kuai, can you please give these patches a run in your setup? Thanks
a lot for your help with fixing this!
Changes since v4:
* Even more aggressive splitting of merged bfq queues to avoid problems with
long merge chains.
Changes since v3:
* Changed handling of bfq group move to handle the case when target of the
merge has moved.
Changes since v2:
* Improved handling of bfq queue splitting on move between cgroups
* Removed broken change to bfq_put_cooperator()
Changes since v1:
* Added fix for bfq_put_cooperator()
* Added fix to handle move between cgroups in bfq_merge_bio()
Honza
Previous versions:
Link: http://lore.kernel.org/r/20211223171425.3551-1-jack@xxxxxxx # v1
Link: http://lore.kernel.org/r/20220105143037.20542-1-jack@xxxxxxx # v2
Link: http://lore.kernel.org/r/20220112113529.6355-1-jack@xxxxxxx # v3
Link: http://lore.kernel.org/r/20220114164215.28972-1-jack@xxxxxxx # v4
.
Hi, Jan
I add a new BUG_ON() in bfq_setup_merge() while iterating new_bfqq, and
this time this BUG_ON() is triggered:
Thanks for testing!
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 07be51bc229b..6d4e243c9a1e 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2753,6 +2753,14 @@ bfq_setup_merge(struct bfq_queue *bfqq, struct
bfq_queue *new_bfqq)
while ((__bfqq = new_bfqq->new_bfqq)) {
if (__bfqq == bfqq)
return NULL;
+ if (new_bfqq->entity.parent != __bfqq->entity.parent &&
+ bfqq_group(__bfqq) != __bfqq->bfqd->root_group) {
+ printk("%s: bfqq %px(%px) new_bfqq %px(%px)\n",
__func__,
+ new_bfqq, bfqq_group(new_bfqq), __bfqq,
+ bfqq_group(__bfqq));
+ BUG_ON(1);
This seems to be too early to check and BUG_ON(). Yes, we can walk through
and even end up with a bfqq with a different parent however in that case we
refuse to setup merge a few lines below and so there is no problem.
Are you still able to reproduce the use-after-free issue with this version
of my patches?
Honza
Hi, Jan
I add following additional debug info:
@ -926,6 +935,7 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
if (!entity) /* root group */
goto put_async_queues;
+ printk("%s: bfqg %px offlined\n", __func__, bfqg);
/*
* Empty all service_trees belonging to this group before
* deactivating the group itself.
@@ -965,6 +975,7 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
put_async_queues:
bfq_put_async_queues(bfqd, bfqg);
+ pd->plid = BLKCG_MAX_POLS;
spin_unlock_irqrestore(&bfqd->lock, flags);
/*
@@ -6039,6 +6050,13 @@ static bool __bfq_insert_request(struct bfq_data
*bfqd, struct request *rq)
*new_bfqq = bfq_setup_cooperator(bfqd, bfqq, rq, true,
RQ_BIC(rq));
bool waiting, idle_timer_disabled = false;
+ if (new_bfqq) {
+ printk("%s: bfqq %px(%px) new_bfqq %px(%px)\n", __func__,
+ bfqq, bfqq_group(bfqq), new_bfqq,
bfqq_group(new_bfqq));
+ } else {
+ printk("%s: bfqq %px(%px) new_bfqq null \n", __func__,
+ bfqq, bfqq_group(bfqq));
+ }
@@ -1696,6 +1696,11 @@ void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct
bfq_queue *bfqq)
bfq_activate_bfqq(bfqd, bfqq);
+ if (bfqq->entity.parent && bfqq_group(bfqq)->pd.plid >=
BLKCG_MAX_POLS) {
+ printk("%s: bfqq %px(%px) with parent offlined\n", __func__,
+ bfqq, bfqq_group(bfqq));
+ BUG_ON(1);
+ }
And found that the uaf is triggered when bfq_setup_cooperator return
NULL, that's why the BUG_ON() in bfq_setup_cooperator() is not
triggered:
[ 51.833290] __bfq_insert_request: bfqq ffff888106913700(ffff888107d67000)
new_bfqq null
[ 51.834762] bfq_add_bfqq_busy: bfqq ffff888106913700(ffff888107d67000)
with parent offlined
The new_bfqq chain relate to bfqq ffff888106913700:
t1: ffff8881114e9600 ------> t4: ffff888106913700 ------> t5:
ffff88810719e3c0
|
t2: ffff888106913440 ----
|
t3: ffff8881114e98c0 ----
I'm still not sure about the root cause, hope these debuginfo
can be helpful