Hi Ilya,

Thanks for your report!

On Fri, Aug 28, 2020 at 12:32:48PM +0200, Ilya Dryomov wrote:
> Hi Ming,
>
> There seems to be a sleeping while atomic bug in hd_struct_free():
>
> static void hd_struct_free(struct percpu_ref *ref)
> {
>         struct hd_struct *part = container_of(ref, struct hd_struct, ref);
>         struct gendisk *disk = part_to_disk(part);
>         struct disk_part_tbl *ptbl =
>                 rcu_dereference_protected(disk->part_tbl, 1);
>
>         rcu_assign_pointer(ptbl->last_lookup, NULL);
>         put_device(disk_to_dev(disk));
>
>         INIT_RCU_WORK(&part->rcu_work, hd_struct_free_work);
>         queue_rcu_work(system_wq, &part->rcu_work);
> }
>
> hd_struct_free() is a percpu_ref release callback and must not sleep.
> But put_device() can end up in disk_release(), resulting in anything
> from "sleeping function called from invalid context" splats to actual
> lockups if the queue ends up being released:
>
>   BUG: scheduling while atomic: ksoftirqd/3/26/0x00000102
>   INFO: lockdep is turned off.
>   CPU: 3 PID: 26 Comm: ksoftirqd/3 Tainted: G W
> 5.9.0-rc2-ceph-g2de49bea2ebc #1
>   Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
>   Call Trace:
>    dump_stack+0x96/0xd0
>    __schedule_bug.cold+0x64/0x71
>    __schedule+0x8ea/0xac0
>    ? wait_for_completion+0x86/0x110
>    schedule+0x5f/0xd0
>    schedule_timeout+0x212/0x2a0
>    ? wait_for_completion+0x86/0x110
>    wait_for_completion+0xb0/0x110
>    __wait_rcu_gp+0x139/0x150
>    synchronize_rcu+0x79/0xf0
>    ? invoke_rcu_core+0xb0/0xb0
>    ? rcu_read_lock_any_held+0xb0/0xb0
>    blk_free_flush_queue+0x17/0x30
>    blk_mq_hw_sysfs_release+0x32/0x70
>    kobject_put+0x7d/0x1d0
>    blk_mq_release+0xbe/0xf0
>    blk_release_queue+0xb7/0x140
>    kobject_put+0x7d/0x1d0
>    disk_release+0xb0/0xc0
>    device_release+0x25/0x80
>    kobject_put+0x7d/0x1d0
>    hd_struct_free+0x4c/0xc0
>    percpu_ref_switch_to_atomic_rcu+0x1df/0x1f0
>    rcu_core+0x3fd/0x660
>    ? rcu_core+0x3cc/0x660
>    __do_softirq+0xd5/0x45e
>    ? smpboot_thread_fn+0x26/0x1d0
>    run_ksoftirqd+0x30/0x60
>    smpboot_thread_fn+0xfe/0x1d0
>    ? sort_range+0x20/0x20
>    kthread+0x11a/0x150
>    ? kthread_delayed_work_timer_fn+0xa0/0xa0
>    ret_from_fork+0x1f/0x30
>
> "git blame" points at your commit b7d6c3033323 ("block: fix
> use-after-free on cached last_lookup partition"), but there is
> likely more to it because it went into 5.8 and I haven't seen
> these lockups until we rebased to 5.9-rc.

The pull-the-trigger commit is actually e8c7d14ac6c3 ("block: revert back to
synchronous request_queue removal").

>
> Could you please take a look?

Can you try the following patch?

diff --git a/block/partitions/core.c b/block/partitions/core.c
index e62a98a8eeb7..b06fc3425802 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -278,6 +278,15 @@ static void hd_struct_free_work(struct work_struct *work)
 {
 	struct hd_struct *part =
 		container_of(to_rcu_work(work), struct hd_struct, rcu_work);
+	struct gendisk *disk = part_to_disk(part);
+
+	/*
+	 * Release the reference grabbed in delete_partition, and it should
+	 * have been done in hd_struct_free(), however device's release
+	 * handler can't be done in percpu_ref's ->release() callback
+	 * because it is run via call_rcu().
+	 */
+	put_device(disk_to_dev(disk));

 	part->start_sect = 0;
 	part->nr_sects = 0;
@@ -293,7 +302,6 @@ static void hd_struct_free(struct percpu_ref *ref)
 		rcu_dereference_protected(disk->part_tbl, 1);

 	rcu_assign_pointer(ptbl->last_lookup, NULL);
-	put_device(disk_to_dev(disk));

 	INIT_RCU_WORK(&part->rcu_work, hd_struct_free_work);
 	queue_rcu_work(system_wq, &part->rcu_work);

Thanks,
Ming