Re: [PATCH 0/4 v5] bfq: Avoid use-after-free when moving processes between cgroups

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue 08-02-22 21:19:14, yukuai (C) wrote:
> 在 2022/02/03 5:53, Jan Kara 写道:
> > Thanks for debugging! I was looking into this but I also do not understand
> > how what your tracing shows can happen. In particular I don't see why there
> > is no __bfq_bic_change_cgroup() call from bfq_insert_request() ->
> > bfq_init_rq() for the problematic __bfq_insert_request() into
> > ffff888106913700. I have two possible explanations. Either bio is submitted
> > to the offlined cgroup ffff888107d67000 or bic->blkcg_serial_nr is pointing
> > to different cgroup than bic_to_bfqq(bic, 1)->entity.parent.
> > 
> > So can you extented the debugging a bit like:
> > 1) Add current->pid to all debug messages so that we can distinguish
> > different processes and see which already detached from the bfqq and which
> > not.
> > 
> > 2) Print bic->blkcg_serial_nr and __bio_blkcg(bio)->css.serial_nr before
> > crashing in bfq_add_bfqq_busy().
> > 
> > 3) Add BUG_ON to bic_set_bfqq() like:
> > 	if (bfqq_group(bfqq)->css.serial_nr != bic->blkcg_serial_nr) {
> > 		printk("%s: bfqq %px(%px) serial %d bic serial %d\n", bfqq,
> > 			bfqq_group(bfqq), bfqq_group(bfqq)->css.serial_nr,
> > 			bic->blkcg_serial_nr);
> > 		BUG_ON(1);
> > 	}
> > 
> > and perhaps this scheds more light on the problem... Thanks!
> > 
> > 								Honza
> > 
> 
> The debuging patch and log is attached.
> 
> bic_set_bfqq found serial mismatch serval times, bfqq_group(bfqq)->css
> doesn't exist, is the following debugging what you expected?
> 
> @@ -386,6 +386,13 @@ static void bfq_put_stable_ref(struct bfq_queue *bfqq);
> 
>  void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq, bool
> is_sync)
>  {
> +       if (bfqq && bfqq_group(bfqq)->pd.blkg->blkcg->css.serial_nr !=
> +                   bic->blkcg_serial_nr) {
> +               bfqq_dbg(bfqq, "serial %lld bic serial %lld\n",
> +                        bfqq_group(bfqq)->pd.blkg->blkcg->css.serial_nr,
> +                        bic->blkcg_serial_nr);
> +               WARN_ON_ONCE(1);
> +       }

Yes, this is what I wanted. Thanks!

> 
> The problematic bfqq ffff8881682581f0 doesn't merge with any bfqq,
> which is weird.
> 
> The pid from __bfq_insert_request() is changed for the problematic
> bfqq, and each time the pid is changed, there is a bic_set_bfqq()
> shows that serial mismatch.

I had a look into debug data and now I think I understand both the WARN_ON
hit in bic_set_bfqq() as well as the final BUG_ON in bfq_add_bfqq_busy().

The first problem is apparently hit because __bio_blkcg() can change while
we are processing the bio. So bfq_bic_update_cgroup() sees different
__bio_blkcg() than bfq_get_queue() called from bfq_get_bfqq_handle_split().
This then causes mismatch between what bic & bfqq think about cgroup
membership which can lead to interesting inconsistencies down the road.

The second problem is hit because clearly __bio_blkcg() can be pointing to
a blkcg that has been already offlined. Previously I didn't think this was
possible but apparently there is nothing that would prevent this race. So
we need to handle this gracefully inside BFQ.

I need to think what would be best fixes for these issues since especially
the second one is tricky.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR



[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux