On Mon, 4 Apr 2016, Eric Wheeler wrote: > Hello all, > > I have a test system that won't register. I can register either the > caching dev or the cached dev fine, but as soon as I register the > sencond dev, bash hangs when echoing into /sys/fs/bcache/register . > I can register in either order (cache first or cached dev first) and > the deadlock still presents. > > I've narrowed down the problem to these two call stacks: > > > == The allocator thread is one half of the deadlock: > > [ 405.619895] INFO: task bcache_allocato:3494 blocked for more than 5 seconds. > [ 405.620897] Tainted: G W O 4.1.20+ #5 > [ 405.621732] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 405.623132] bcache_allocato D ffff88007b78fc98 0 3494 2 0x00000080 > [ 405.624406] ffff88007b78fc98 ffff88007b78fc68 0000000000000002 ffff88007c8ddb80 > [ 405.626241] ffff8800796e0000 ffff88007b78fc78 ffff88007b790000 0000000000000005 > [ 405.627890] ffff880079cf0028 0000000000000001 0000000000000001 ffff88007b78fcb8 > [ 405.629730] Call Trace: > [ 405.630362] [<ffffffff817a9f47>] schedule+0x37/0x90 > [ 405.631230] [<ffffffffa048e4f0>] bch_bucket_alloc+0x1b0/0x670 [bcache] > [ 405.632261] [<ffffffff81103860>] ? prepare_to_wait_event+0x110/0x110 > [ 405.633274] [<ffffffffa04a90d5>] bch_prio_write+0x1b5/0x390 [bcache] > [ 405.634362] [<ffffffffa048e19d>] bch_allocator_thread+0x31d/0x4c0 [bcache] > [ 405.635496] [<ffffffffa048de80>] ? invalidate_buckets+0x980/0x980 [bcache] > [ 405.636655] [<ffffffff810d734e>] kthread+0xfe/0x120 > [ 405.637604] [<ffffffff817b0440>] ? _raw_spin_unlock_irq+0x30/0x50 > [ 405.638569] [<ffffffff810d7250>] ? kthread_create_on_node+0x240/0x240 > [ 405.639628] [<ffffffff817b11a2>] ret_from_fork+0x42/0x70 > [ 405.640570] [<ffffffff810d7250>] ? kthread_create_on_node+0x240/0x240 > [ 405.641568] no locks held by bcache_allocato/3494. > > There is a comment in the code inside of bch_allocator_thread() before it > calls bch_prio_write(ca): > 360 /* > 361 * This could deadlock if an allocation with a btree > 362 * node locked ever blocked - having the btree node > 363 * locked would block garbage collection, but here we're > 364 * waiting on garbage collection before we invalidate > 365 * and free anything. > 366 * > 367 * But this should be safe since the btree code always > 368 * uses btree_check_reserve() before allocating now, and > 369 * if it fails it blocks without btree nodes locked. > 370 */ > 371 if (!fifo_full(&ca->free_inc)) > 372 goto retry_invalidate; > 373 > 374 bch_prio_write(ca); > > I think I'm hitting the deadlock which this comment speaks to. > > > == This is the other side of the deadlock caused by registering the cache device: > echo /dev/sdb > /sys/fs/bcache/register > > [ 405.578073] INFO: task bash:3490 blocked for more than 5 seconds: > [ 405.580255] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ ... ] > [ 405.588984] [<ffffffff817a9f47>] schedule+0x37/0x90 > [ 405.589986] [<ffffffffa048e4f0>] bch_bucket_alloc+0x1b0/0x670 [bcache] > [ 405.591084] [<ffffffff81103860>] ? prepare_to_wait_event+0x110/0x110 > [ 405.592078] [<ffffffffa048eb59>] __bch_bucket_alloc_set+0x109/0x1a0 [bcache] > [ 405.593113] [<ffffffffa048ec40>] bch_bucket_alloc_set+0x50/0x70 [bcache] > [ 405.594153] [<ffffffffa04a7927>] __uuid_write+0x67/0x160 [bcache] > [ 405.595223] [<ffffffffa04a8a06>] bch_uuid_write+0x16/0x40 [bcache] > [ 405.596273] [<ffffffffa04a9877>] bch_cached_dev_attach+0x157/0x490 [bcache] > [ 405.597384] [<ffffffffa04a6b68>] ? __write_super+0x148/0x180 [bcache] > [ 405.598432] [<ffffffffa04a8986>] ? bcache_write_super+0x1d6/0x240 [bcache] > [ 405.599464] [<ffffffffa04aa761>] run_cache_set+0x601/0x910 [bcache] > [ 405.600548] [<ffffffffa04ac0ce>] register_bcache+0xeae/0x1430 [bcache] > [...] > [ 405.611623] 4 locks held by bash/3490: > [ 405.612342] #0: (sb_writers#3){.+.+.+}, at: [<ffffffff8126f773>] vfs_write+0x183/0x1b0 > [ 405.614200] #1: (&of->mutex){+.+.+.}, at: [<ffffffff812faaa6>] kernfs_fop_write+0x66/0x1a0 > [ 405.615994] #2: (s_active#194){.+.+.+}, at: [<ffffffff812faaae>] kernfs_fop_write+0x6e/0x1a0 > [ 405.617925] #3: (&bch_register_lock){+.+.+.}, at: [<ffffffffa04abe70>] register_bcache+0xc50/0x1430 [bcache] > > > I'm rather perplexed as to why this is deadlocking because > bch_bucket_alloc_set() locks c->bucket_lock, and the allocator holds > ca->set->bucket_lock before calling bch_prio_write() . I checked, > this is the same lock (same memory position). > > This implies that the allocator waits on bch_bucket_alloc_set() > which was invoked by bash through register_bcache, and register_bcache > waits on the allocator's call to bch_prio_write(). > > Things that I've tried which don't work or make the problem worse: > * Adding a mutex inside bch_bucket_alloc so that only one may proceed at a time > * Holding bch_register_lock in the allocator thread before calling bch_prio_write. > > Does anyone else have insight here that might help solve the problem? > Also noteworthy, I get this trace immediately after registering the SSD cache device, though no deadlock at this point: [ 28.753548] random: nonblocking pool is initialized [ 72.792018] ------------[ cut here ]------------ [ 72.792607] WARNING: CPU: 0 PID: 2430 at kernel/locking/lockdep.c:728 __lock_acquire+0x129a/0x15d0() [ 72.793009] Modules linked in: bcache(O) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter netconsole bridge stp llc kvm_intel kvm nfsd sg ppdev virtio_balloon i2c_piix4 parport_pc auth_rpcgss parport pcspkr acpi_cpufreq nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi syscopyarea sysfillrect sysimgblt drm_kms_helper ttm crc32c_intel drm serio_raw i2c_core ata_piix libata virtio_net floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: bcache] [ 72.793009] CPU: 0 PID: 2430 Comm: bash Tainted: G O 4.1.20+ #5 [ 72.793009] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 72.793009] 0000000000000086 0000000096bd7654 ffff880078e178f8 ffffffff817a5f78 [ 72.793009] 0000000000000000 ffffffff81a1898b ffff880078e17938 ffffffff810af92a [ 72.793009] ffff88007b38c4a0 ffff88007b38c4a0 0000000000000000 ffff88007a7380c0 [ 72.793009] Call Trace: [ 72.793009] [<ffffffff817a5f78>] dump_stack+0x85/0xbe [ 72.793009] [<ffffffff810af92a>] warn_slowpath_common+0x8a/0xc0 [ 72.793009] [<ffffffff810afa5a>] warn_slowpath_null+0x1a/0x20 [ 72.793009] [<ffffffff8110f6ea>] __lock_acquire+0x129a/0x15d0 [ 72.793009] [<ffffffff8110fb82>] lock_acquire+0xd2/0x300 [ 72.793009] [<ffffffffa04fa441>] ? mca_reap+0x61/0x1a0 [bcache] [ 72.793009] [<ffffffff811090ff>] down_write_trylock+0x6f/0x90 [ 72.793009] [<ffffffffa04fa441>] ? mca_reap+0x61/0x1a0 [bcache] [ 72.793009] [<ffffffffa04fa441>] mca_reap+0x61/0x1a0 [bcache] [ 72.793009] [<ffffffffa04f8ab6>] ? mca_find+0xe6/0x190 [bcache] [ 72.793009] [<ffffffffa04f89d5>] ? mca_find+0x5/0x190 [bcache] [ 72.793009] [<ffffffffa04fa8e7>] mca_alloc+0xc7/0x640 [bcache] [ 72.793009] [<ffffffffa04f89d5>] ? mca_find+0x5/0x190 [bcache] [ 72.793009] [<ffffffffa04fb3e2>] bch_btree_node_get+0x152/0x2b0 [bcache] [ 72.793009] [<ffffffffa0517cee>] ? bch_crc64+0x3e/0x50 [bcache] [ 72.793009] [<ffffffffa050f402>] run_cache_set+0x272/0x910 [bcache] [ 72.793009] [<ffffffffa05110fe>] register_bcache+0xeae/0x1430 [bcache] [ 72.793009] [<ffffffff8121a0ff>] ? might_fault+0x5f/0xb0 [ 72.793009] [<ffffffff8110fe50>] ? lock_release_non_nested+0xa0/0x300 [ 72.793009] [<ffffffff813b214f>] kobj_attr_store+0xf/0x20 [ 72.793009] [<ffffffff812fb658>] sysfs_kf_write+0x48/0x60 [ 72.793009] [<ffffffff812fab83>] kernfs_fop_write+0x143/0x1a0 [ 72.793009] [<ffffffff8126ef67>] __vfs_write+0x37/0x110 [ 72.793009] [<ffffffff8126f773>] ? vfs_write+0x183/0x1b0 [ 72.793009] [<ffffffff8132bca3>] ? security_file_permission+0x23/0xa0 [ 72.793009] [<ffffffff8126f699>] vfs_write+0xa9/0x1b0 [ 72.793009] [<ffffffff81270558>] SyS_write+0x58/0xd0 [ 72.793009] [<ffffffff817b0d6e>] system_call_fastpath+0x12/0x76 [ 72.793009] ---[ end trace 43d86c2e9eb293e9 ]--- [ 72.845818] bcache: bch_journal_replay() journal replay done, 1240 keys in 2 entries, seq 5 [ 72.847503] bcache: register_cache() registered cache device sdb -- Eric Wheeler > > -- > Eric Wheeler > -- > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-bcache" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html