On Fri, Aug 28, 2020 at 3:05 PM Ming Lei <ming.lei@xxxxxxxxxx> wrote: > > Hi Ilya, > > Thanks for your report! > > On Fri, Aug 28, 2020 at 12:32:48PM +0200, Ilya Dryomov wrote: > > Hi Ming, > > > > There seems to be a sleeping while atomic bug in hd_struct_free(): > > > > 288 static void hd_struct_free(struct percpu_ref *ref) > > 289 { > > 290 struct hd_struct *part = container_of(ref, struct hd_struct, ref); > > 291 struct gendisk *disk = part_to_disk(part); > > 292 struct disk_part_tbl *ptbl = > > 293 rcu_dereference_protected(disk->part_tbl, 1); > > 294 > > 295 rcu_assign_pointer(ptbl->last_lookup, NULL); > > 296 put_device(disk_to_dev(disk)); > > 297 > > 298 INIT_RCU_WORK(&part->rcu_work, hd_struct_free_work); > > 299 queue_rcu_work(system_wq, &part->rcu_work); > > 300 } > > > > hd_struct_free() is a percpu_ref release callback and must not sleep. > > But put_device() can end up in disk_release(), resulting in anything > > from "sleeping function called from invalid context" splats to actual > > lockups if the queue ends up being released: > > > > BUG: scheduling while atomic: ksoftirqd/3/26/0x00000102 > > INFO: lockdep is turned off. > > CPU: 3 PID: 26 Comm: ksoftirqd/3 Tainted: G W > > 5.9.0-rc2-ceph-g2de49bea2ebc #1 > > Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015 > > Call Trace: > > dump_stack+0x96/0xd0 > > __schedule_bug.cold+0x64/0x71 > > __schedule+0x8ea/0xac0 > > ? wait_for_completion+0x86/0x110 > > schedule+0x5f/0xd0 > > schedule_timeout+0x212/0x2a0 > > ? wait_for_completion+0x86/0x110 > > wait_for_completion+0xb0/0x110 > > __wait_rcu_gp+0x139/0x150 > > synchronize_rcu+0x79/0xf0 > > ? invoke_rcu_core+0xb0/0xb0 > > ? rcu_read_lock_any_held+0xb0/0xb0 > > blk_free_flush_queue+0x17/0x30 > > blk_mq_hw_sysfs_release+0x32/0x70 > > kobject_put+0x7d/0x1d0 > > blk_mq_release+0xbe/0xf0 > > blk_release_queue+0xb7/0x140 > > kobject_put+0x7d/0x1d0 > > disk_release+0xb0/0xc0 > > device_release+0x25/0x80 > > kobject_put+0x7d/0x1d0 > > hd_struct_free+0x4c/0xc0 > > percpu_ref_switch_to_atomic_rcu+0x1df/0x1f0 > > rcu_core+0x3fd/0x660 > > ? rcu_core+0x3cc/0x660 > > __do_softirq+0xd5/0x45e > > ? smpboot_thread_fn+0x26/0x1d0 > > run_ksoftirqd+0x30/0x60 > > smpboot_thread_fn+0xfe/0x1d0 > > ? sort_range+0x20/0x20 > > kthread+0x11a/0x150 > > ? kthread_delayed_work_timer_fn+0xa0/0xa0 > > ret_from_fork+0x1f/0x30 > > > > "git blame" points at your commit tb7d6c3033323 ("block: fix > > use-after-free on cached last_lookup partition"), but there is > > likely more to it because it went into 5.8 and I haven't seen > > these lockups until we rebased to 5.9-rc. > > The pull-the-trigger commit is actually e8c7d14ac6c3 ("block: revert back to > synchronous request_queue removal"). > > > > > Could you please take a look? > > Can you try the following patch? > > diff --git a/block/partitions/core.c b/block/partitions/core.c > index e62a98a8eeb7..b06fc3425802 100644 > --- a/block/partitions/core.c > +++ b/block/partitions/core.c > @@ -278,6 +278,15 @@ static void hd_struct_free_work(struct work_struct *work) > { > struct hd_struct *part = > container_of(to_rcu_work(work), struct hd_struct, rcu_work); > + struct gendisk *disk = part_to_disk(part); > + > + /* > + * Release the reference grabbed in delete_partition, and it should > + * have been done in hd_struct_free(), however device's release > + * handler can't be done in percpu_ref's ->release() callback > + * because it is run via call_rcu(). > + */ > + put_device(disk_to_dev(disk)); > > part->start_sect = 0; > part->nr_sects = 0; > @@ -293,7 +302,6 @@ static void hd_struct_free(struct percpu_ref *ref) > rcu_dereference_protected(disk->part_tbl, 1); > > rcu_assign_pointer(ptbl->last_lookup, NULL); > - put_device(disk_to_dev(disk)); > > INIT_RCU_WORK(&part->rcu_work, hd_struct_free_work); > queue_rcu_work(system_wq, &part->rcu_work); This patch fixes it for me. Thanks, Ilya