Sleeping while atomic regression around hd_struct_free() in 5.9-rc

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Ming,

There seems to be a sleeping while atomic bug in hd_struct_free():

288 static void hd_struct_free(struct percpu_ref *ref)
289 {
290         struct hd_struct *part = container_of(ref, struct hd_struct, ref);
291         struct gendisk *disk = part_to_disk(part);
292         struct disk_part_tbl *ptbl =
293                 rcu_dereference_protected(disk->part_tbl, 1);
294
295         rcu_assign_pointer(ptbl->last_lookup, NULL);
296         put_device(disk_to_dev(disk));
297
298         INIT_RCU_WORK(&part->rcu_work, hd_struct_free_work);
299         queue_rcu_work(system_wq, &part->rcu_work);
300 }

hd_struct_free() is a percpu_ref release callback and must not sleep.
But put_device() can end up in disk_release(), resulting in anything
from "sleeping function called from invalid context" splats to actual
lockups if the queue ends up being released:

  BUG: scheduling while atomic: ksoftirqd/3/26/0x00000102
  INFO: lockdep is turned off.
  CPU: 3 PID: 26 Comm: ksoftirqd/3 Tainted: G        W
5.9.0-rc2-ceph-g2de49bea2ebc #1
  Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
  Call Trace:
   dump_stack+0x96/0xd0
   __schedule_bug.cold+0x64/0x71
   __schedule+0x8ea/0xac0
   ? wait_for_completion+0x86/0x110
   schedule+0x5f/0xd0
   schedule_timeout+0x212/0x2a0
   ? wait_for_completion+0x86/0x110
   wait_for_completion+0xb0/0x110
   __wait_rcu_gp+0x139/0x150
   synchronize_rcu+0x79/0xf0
   ? invoke_rcu_core+0xb0/0xb0
   ? rcu_read_lock_any_held+0xb0/0xb0
   blk_free_flush_queue+0x17/0x30
   blk_mq_hw_sysfs_release+0x32/0x70
   kobject_put+0x7d/0x1d0
   blk_mq_release+0xbe/0xf0
   blk_release_queue+0xb7/0x140
   kobject_put+0x7d/0x1d0
   disk_release+0xb0/0xc0
   device_release+0x25/0x80
   kobject_put+0x7d/0x1d0
   hd_struct_free+0x4c/0xc0
   percpu_ref_switch_to_atomic_rcu+0x1df/0x1f0
   rcu_core+0x3fd/0x660
   ? rcu_core+0x3cc/0x660
   __do_softirq+0xd5/0x45e
   ? smpboot_thread_fn+0x26/0x1d0
   run_ksoftirqd+0x30/0x60
   smpboot_thread_fn+0xfe/0x1d0
   ? sort_range+0x20/0x20
   kthread+0x11a/0x150
   ? kthread_delayed_work_timer_fn+0xa0/0xa0
   ret_from_fork+0x1f/0x30

"git blame" points at your commit tb7d6c3033323 ("block: fix
use-after-free on cached last_lookup partition"), but there is
likely more to it because it went into 5.8 and I haven't seen
these lockups until we rebased to 5.9-rc.

Could you please take a look?

Thanks,

                Ilya



[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux