On Fri 07-01-22 17:15:43, yukuai (C) wrote: > 在 2022/01/05 22:36, Jan Kara 写道: > > Hello, > > > > here is the second version of my patches to fix use-after-free issues in BFQ > > when processes with merged queues get moved to different cgroups. The patches > > have survived some beating in my test VM but so far I fail to reproduce the > > original KASAN reports so testing from people who can reproduce them is most > > welcome. Thanks! > > > > Changes since v1: > > * Added fix for bfq_put_cooperator() > > * Added fix to handle move between cgroups in bfq_merge_bio() > > > > Honza > > Previous versions: > > Link: http://lore.kernel.org/r/20211223171425.3551-1-jack@xxxxxxx # v1 > > . > > > > Hi, > > I repoduced the problem again with this patchset... Thanks for testing! > [ 71.004788] BUG: KASAN: use-after-free in > __bfq_deactivate_entity+0x21/0x290 > [ 71.006328] Read of size 1 at addr ffff88817a3dc0b0 by task rmmod/801 > [ 71.007723] > [ 71.008068] CPU: 7 PID: 801 Comm: rmmod Tainted: G W > 5.16.0-rc5-next-2021127 > [ 71.009995] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > ?-20190727_073836-4 > [ 71.012274] Call Trace: > [ 71.012603] <TASK> > [ 71.012886] dump_stack_lvl+0x34/0x44 > [ 71.013379] print_address_description.constprop.0.cold+0xab/0x36b > [ 71.014182] ? __bfq_deactivate_entity+0x21/0x290 > [ 71.014795] ? __bfq_deactivate_entity+0x21/0x290 > [ 71.015398] kasan_report.cold+0x83/0xdf > [ 71.015904] ? _raw_read_lock_bh+0x20/0x40 > [ 71.016433] ? __bfq_deactivate_entity+0x21/0x290 > [ 71.017033] __bfq_deactivate_entity+0x21/0x290 > [ 71.017617] bfq_pd_offline+0xc1/0x110 > [ 71.018105] blkcg_deactivate_policy+0x14b/0x210 ... > Here is the caller of __bfq_deactivate_entity: > (gdb) list *(bfq_pd_offline+0xc1) > 0xffffffff81c504f1 is in bfq_pd_offline (block/bfq-cgroup.c:942). > 937 * entities to the idle tree. It happens if, in some > 938 * of the calls to bfq_bfqq_move() performed by > 939 * bfq_reparent_active_queues(), the queue to move > is > 940 * empty and gets expired. > 941 */ > 942 bfq_flush_idle_tree(st); > 943 } > 944 > 945 __bfq_deactivate_entity(entity, false); So this is indeed strange. The group has in some of its idle service trees an entity whose entity->sched_data points to already freed cgroup. In particular it means the bfqq_entity_service_tree() leads to somewhere else than the 'st' we called bfq_flush_idle_tree() with. This looks like a different kind of problem AFAICT but still it needs fixing :). Can you please run your reproducer with my patches + the attached debug patch on top? Hopefully it should tell us more about the problematic entity and how it got there... Thanks! Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR