Deadlock in md

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I have not rebooted my system since recovering my data off of an old
raid5 array with an external journal which broke after a crash in
write-back mode (https://www.spinics.net/lists/raid/msg61331.html) and I
noticed this in my kernel log.  I had the array assembled in read-only
mode without a journal.  When I was done getting my data, I had stopped
the array and zeroed the first 1GiB of each raid member.  After this I
modified the partition tables of the disks, but fdisk reported my
/dev/sdb to be in use by the kernel, as did partprobe, presumably due to
a reference leaked from the deadlock.

This is v4.15.67 with an assortment of debugging features enabled, but I
didn't have kmemleak on because I was copying a lot of files (too many
"soft lockups" when it ran).

[203283.385588] md0: detected capacity change from 8796094070784 to 0
[203283.385610] md: md0 stopped.

[203283.430869] ======================================================
[203283.430870] WARNING: possible circular locking dependency detected
[203283.430872] 4.14.67-gentoo-debug #3 Not tainted
[203283.430873] ------------------------------------------------------
[203283.430875] kworker/13:1/480 is trying to acquire lock:
[203283.430876]  (kn->count#82){++++}, at: [<ffffffffb75c8a3a>]
kernfs_remove+0x1a/0x30
[203283.430883]
                but task is already holding lock:
[203283.430885]  ((&mddev->del_work)){+.+.}, at: [<ffffffffb72b7125>]
process_one_work+0x255/0x740
[203283.430889]
                which lock already depends on the new lock.

[203283.430891]
                the existing dependency chain (in reverse order) is:
[203283.430892]
                -> #5 ((&mddev->del_work)){+.+.}:
[203283.430896]        lock_acquire+0xa6/0x160
[203283.430898]        process_one_work+0x2af/0x740
[203283.430900]        worker_thread+0x43/0x690
[203283.430902]        kthread+0x18f/0x260
[203283.430905]        ret_from_fork+0x27/0x50
[203283.430906]
                -> #4 ("md_misc"){+.+.}:
[203283.430910]        lock_acquire+0xa6/0x160
[203283.430911]        flush_workqueue+0xa6/0xa00
[203283.430914]        __md_stop_writes+0x19/0x130
[203283.430916]        do_md_stop+0x18d/0x700
[203283.430917]        md_ioctl+0xcfb/0x2290
[203283.430920]        blkdev_ioctl+0x6a8/0x1020
[203283.430922]        block_ioctl+0x52/0xa0
[203283.430924]        do_vfs_ioctl+0xc4/0x990
[203283.430926]        SyS_ioctl+0x91/0xa0
[203283.430928]        do_syscall_64+0x83/0x260
[203283.430930]        entry_SYSCALL_64_after_hwframe+0x42/0xb7
[203283.430931]
                -> #3 (&mddev->open_mutex){+.+.}:
[203283.430934]        lock_acquire+0xa6/0x160
[203283.430936]        __mutex_lock+0x74/0xc60
[203283.430938]        md_open+0x6c/0x140
[203283.430939]        __blkdev_get+0xce/0x560
[203283.430941]        blkdev_get+0x27f/0x500
[203283.430943]        do_dentry_open+0x293/0x4c0
[203283.430945]        path_openat+0x5a3/0xf30
[203283.430947]        do_filp_open+0xb7/0x150
[203283.430948]        do_sys_open+0x1d1/0x2c0
[203283.430950]        do_syscall_64+0x83/0x260
[203283.430952]        entry_SYSCALL_64_after_hwframe+0x42/0xb7
[203283.430953]
                -> #2 (&bdev->bd_mutex){+.+.}:
[203283.430956]        lock_acquire+0xa6/0x160
[203283.430957]        __mutex_lock+0x74/0xc60
[203283.430959]        __blkdev_get+0x54/0x560
[203283.430960]        blkdev_get+0x27f/0x500
[203283.430962]        blkdev_get_by_dev+0x38/0x60
[203283.430963]        lock_rdev+0x43/0xb0
[203283.430965]        md_import_device+0xf7/0x290
[203283.430966]        add_new_disk+0xe1/0x782
[203283.430968]        md_ioctl+0x1664/0x2290
[203283.430969]        blkdev_ioctl+0x6a8/0x1020
[203283.430971]        block_ioctl+0x52/0xa0
[203283.430972]        do_vfs_ioctl+0xc4/0x990
[203283.430974]        SyS_ioctl+0x91/0xa0
[203283.430976]        do_syscall_64+0x83/0x260
[203283.430978]        entry_SYSCALL_64_after_hwframe+0x42/0xb7
[203283.430978]
                -> #1 (&mddev->reconfig_mutex){+.+.}:
[203283.430982]        lock_acquire+0xa6/0x160
[203283.430983]        __mutex_lock+0x74/0xc60
[203283.430985]        array_state_store+0xcf/0x4e0
[203283.430986]        md_attr_store+0xa5/0x130
[203283.430988]        kernfs_fop_write+0x1ad/0x300
[203283.430990]        __vfs_write+0x64/0x2c0
[203283.430991]        vfs_write+0xf1/0x2e0
[203283.430993]        SyS_write+0x4e/0xc0
[203283.430994]        do_syscall_64+0x83/0x260
[203283.430996]        entry_SYSCALL_64_after_hwframe+0x42/0xb7
[203283.430997]
                -> #0 (kn->count#82){++++}:
[203283.431000]        __lock_acquire+0x14c7/0x1c50
[203283.431002]        lock_acquire+0xa6/0x160
[203283.431004]        __kernfs_remove+0x26d/0x390
[203283.431005]        kernfs_remove+0x1a/0x30
[203283.431008]        kobject_del.part.0+0x21/0x70
[203283.431010]        mddev_delayed_delete+0x1f/0x30
[203283.431012]        process_one_work+0x2bc/0x740
[203283.431013]        worker_thread+0x43/0x690
[203283.431015]        kthread+0x18f/0x260
[203283.431017]        ret_from_fork+0x27/0x50
[203283.431018]
                other info that might help us debug this:

[203283.431020] Chain exists of:
                  kn->count#82 --> "md_misc" --> (&mddev->del_work)

[203283.431023]  Possible unsafe locking scenario:

[203283.431025]        CPU0                    CPU1
[203283.431026]        ----                    ----
[203283.431027]   lock((&mddev->del_work));
[203283.431028]                                lock("md_misc");
[203283.431030]                                lock((&mddev->del_work));
[203283.431031]   lock(kn->count#82);
[203283.431033]
                 *** DEADLOCK ***

[203283.431035] 2 locks held by kworker/13:1/480:
[203283.431036]  #0:  ("md_misc"){+.+.}, at: [<ffffffffb72b7125>]
process_one_work+0x255/0x740
[203283.431039]  #1:  ((&mddev->del_work)){+.+.}, at:
[<ffffffffb72b7125>] process_one_work+0x255/0x740
[203283.431042]
                stack backtrace:
[203283.431045] CPU: 13 PID: 480 Comm: kworker/13:1 Not tainted
4.14.67-gentoo-debug #3
[203283.431046] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./X470 Taichi Ultimate, BIOS P1.20 04/17/2018
[203283.431048] Workqueue: md_misc mddev_delayed_delete
[203283.431050] Call Trace:
[203283.431053]  dump_stack+0x86/0xcf
[203283.431056]  print_circular_bug.isra.16+0x24c/0x3e0
[203283.431058]  check_prev_add.constprop.24+0x541/0x7f0
[203283.431061]  __lock_acquire+0x14c7/0x1c50
[203283.431064]  lock_acquire+0xa6/0x160
[203283.431066]  ? kernfs_remove+0x1a/0x30
[203283.431068]  __kernfs_remove+0x26d/0x390
[203283.431070]  ? kernfs_remove+0x1a/0x30
[203283.431072]  ? sysfs_remove_dir+0x39/0x90
[203283.431074]  kernfs_remove+0x1a/0x30
[203283.431076]  kobject_del.part.0+0x21/0x70
[203283.431078]  mddev_delayed_delete+0x1f/0x30
[203283.431080]  process_one_work+0x2bc/0x740
[203283.431082]  ? process_one_work+0x255/0x740
[203283.431084]  ? worker_thread+0xef/0x690
[203283.431086]  worker_thread+0x43/0x690
[203283.431088]  ? process_one_work+0x740/0x740
[203283.431090]  kthread+0x18f/0x260
[203283.431092]  ? kthread_create_on_node+0x40/0x40
[203283.431095]  ret_from_fork+0x27/0x50

Regards,
Daniel




[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux