Re: Sleeping while atomic regression around hd_struct_free() in 5.9-rc

Ilya Dryomov <idryomov@xxxxxxxxx> · Sun, 30 Aug 2020 10:46:27 +0200

On Fri, Aug 28, 2020 at 3:05 PM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
>
> Hi Ilya,
>
> Thanks for your report!
>
> On Fri, Aug 28, 2020 at 12:32:48PM +0200, Ilya Dryomov wrote:
> > Hi Ming,
> >
> > There seems to be a sleeping while atomic bug in hd_struct_free():
> >
> > 288 static void hd_struct_free(struct percpu_ref *ref)
> > 289 {
> > 290         struct hd_struct *part = container_of(ref, struct hd_struct, ref);
> > 291         struct gendisk *disk = part_to_disk(part);
> > 292         struct disk_part_tbl *ptbl =
> > 293                 rcu_dereference_protected(disk->part_tbl, 1);
> > 294
> > 295         rcu_assign_pointer(ptbl->last_lookup, NULL);
> > 296         put_device(disk_to_dev(disk));
> > 297
> > 298         INIT_RCU_WORK(&part->rcu_work, hd_struct_free_work);
> > 299         queue_rcu_work(system_wq, &part->rcu_work);
> > 300 }
> >
> > hd_struct_free() is a percpu_ref release callback and must not sleep.
> > But put_device() can end up in disk_release(), resulting in anything
> > from "sleeping function called from invalid context" splats to actual
> > lockups if the queue ends up being released:
> >
> >   BUG: scheduling while atomic: ksoftirqd/3/26/0x00000102
> >   INFO: lockdep is turned off.
> >   CPU: 3 PID: 26 Comm: ksoftirqd/3 Tainted: G        W
> > 5.9.0-rc2-ceph-g2de49bea2ebc #1
> >   Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
> >   Call Trace:
> >    dump_stack+0x96/0xd0
> >    __schedule_bug.cold+0x64/0x71
> >    __schedule+0x8ea/0xac0
> >    ? wait_for_completion+0x86/0x110
> >    schedule+0x5f/0xd0
> >    schedule_timeout+0x212/0x2a0
> >    ? wait_for_completion+0x86/0x110
> >    wait_for_completion+0xb0/0x110
> >    __wait_rcu_gp+0x139/0x150
> >    synchronize_rcu+0x79/0xf0
> >    ? invoke_rcu_core+0xb0/0xb0
> >    ? rcu_read_lock_any_held+0xb0/0xb0
> >    blk_free_flush_queue+0x17/0x30
> >    blk_mq_hw_sysfs_release+0x32/0x70
> >    kobject_put+0x7d/0x1d0
> >    blk_mq_release+0xbe/0xf0
> >    blk_release_queue+0xb7/0x140
> >    kobject_put+0x7d/0x1d0
> >    disk_release+0xb0/0xc0
> >    device_release+0x25/0x80
> >    kobject_put+0x7d/0x1d0
> >    hd_struct_free+0x4c/0xc0
> >    percpu_ref_switch_to_atomic_rcu+0x1df/0x1f0
> >    rcu_core+0x3fd/0x660
> >    ? rcu_core+0x3cc/0x660
> >    __do_softirq+0xd5/0x45e
> >    ? smpboot_thread_fn+0x26/0x1d0
> >    run_ksoftirqd+0x30/0x60
> >    smpboot_thread_fn+0xfe/0x1d0
> >    ? sort_range+0x20/0x20
> >    kthread+0x11a/0x150
> >    ? kthread_delayed_work_timer_fn+0xa0/0xa0
> >    ret_from_fork+0x1f/0x30
> >
> > "git blame" points at your commit tb7d6c3033323 ("block: fix
> > use-after-free on cached last_lookup partition"), but there is
> > likely more to it because it went into 5.8 and I haven't seen
> > these lockups until we rebased to 5.9-rc.
>
> The pull-the-trigger commit is actually e8c7d14ac6c3 ("block: revert back to
> synchronous request_queue removal").
>
> >
> > Could you please take a look?
>
> Can you try the following patch?
>
> diff --git a/block/partitions/core.c b/block/partitions/core.c
> index e62a98a8eeb7..b06fc3425802 100644
> --- a/block/partitions/core.c
> +++ b/block/partitions/core.c
> @@ -278,6 +278,15 @@ static void hd_struct_free_work(struct work_struct *work)
>  {
>         struct hd_struct *part =
>                 container_of(to_rcu_work(work), struct hd_struct, rcu_work);
> +       struct gendisk *disk = part_to_disk(part);
> +
> +       /*
> +        * Release the reference grabbed in delete_partition, and it should
> +        * have been done in hd_struct_free(), however device's release
> +        * handler can't be done in percpu_ref's ->release() callback
> +        * because it is run via call_rcu().
> +        */
> +       put_device(disk_to_dev(disk));
>
>         part->start_sect = 0;
>         part->nr_sects = 0;
> @@ -293,7 +302,6 @@ static void hd_struct_free(struct percpu_ref *ref)
>                 rcu_dereference_protected(disk->part_tbl, 1);
>
>         rcu_assign_pointer(ptbl->last_lookup, NULL);
> -       put_device(disk_to_dev(disk));
>
>         INIT_RCU_WORK(&part->rcu_work, hd_struct_free_work);
>         queue_rcu_work(system_wq, &part->rcu_work);

This patch fixes it for me.

Thanks,

                Ilya