Re: bcache register deadlock in bcache_bucket_alloc() between bch_uuid_write and bch_prio_write

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, 11 Feb 2017, Rui Hua wrote:

> Hi Eric,
> Does this problem fixed?
> I'm hitting a deadlock which is similar to yours: I backport bcache
> code from 4.4.24 to kernel 3.10.0-327.el7.x86_64(CentOS7.2), It has
> been working fine many days (writeback mode, the caching device is a
> Intel DC S3700 200G, attached 5 HDD disks).

Maybe try this patch:

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=2d64cbc819b13ebd503780f39552827516f4ce4a

Its the only one I can think of.  However, it might already be in 4.4.24.  
You would need to check and see if you have this or not.


--
Eric Wheeler


> Today, I reboot the system  when running some XFS benchmarks over all
> cachedevs, After the system is startup, the task of register
> command(echo /dev/sdg > /sys/fs/bcache/register) hang, and the dmesg
> info as follow (/dev/sdg is my S3700 SSD):
> 
> [Feb10 20:31] INFO: task bash:2615 blocked for more than 120 seconds.
> [  +0.000059] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  +0.000059] bash            D ffff88022f86a060     0  2615   2611 0x00000080
> [  +0.000007]  ffff88022e833b28 0000000000000082 ffff88022f985c00
> ffff88022e833fd8
> [  +0.000006]  ffff88022e833fd8 ffff88022e833fd8 ffff88022f985c00
> ffff88022ddc0000
> [  +0.000004]  00000000fffffffc ffff88022f86a000 ffff88022e833bb0
> ffff88022f86a060
> [  +0.000005] Call Trace:
> [  +0.000013]  [<ffffffff8163a909>] schedule+0x29/0x70
> [  +0.000016]  [<ffffffffa045701d>] __bch_btree_map_nodes+0x12d/0x150 [bcache]
> [  +0.000010]  [<ffffffffa0456700>] ?
> bch_btree_insert_check_key+0x1d0/0x1d0 [bcache]
> [  +0.000010]  [<ffffffffa0457131>] bch_btree_insert+0xf1/0x170 [bcache]
> [  +0.000006]  [<ffffffff810a6ae0>] ? wake_up_atomic_t+0x30/0x30
> [  +0.000011]  [<ffffffffa045bbda>] bch_journal_replay+0x14a/0x260 [bcache]
> [  +0.000007]  [<ffffffff8101cd69>] ? sched_clock+0x9/0x10
> [  +0.000008]  [<ffffffff810b8a66>] ? try_to_wake_up+0x1b6/0x300
> [  +0.000013]  [<ffffffffa04673dd>] run_cache_set+0x56d/0x950 [bcache]
> [  +0.000012]  [<ffffffffa0468ec3>] register_bcache+0xde3/0x1370 [bcache]
> [  +0.000008]  [<ffffffff812f4d0f>] kobj_attr_store+0xf/0x20
> [  +0.000005]  [<ffffffff81259176>] sysfs_write_file+0xc6/0x140
> [  +0.000006]  [<ffffffff811de5cd>] vfs_write+0xbd/0x1e0
> [  +0.000003]  [<ffffffff811df06f>] SyS_write+0x7f/0xe0
> [  +0.000006]  [<ffffffff81645909>] system_call_fastpath+0x16/0x1b
> [  +0.000006] INFO: task bcache_allocato:2676 blocked for more than 120 seconds.
> [  +0.000054] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  +0.000057] bcache_allocato D 0000000000000001     0  2676      2 0x00000080
> [  +0.000004]  ffff880220edfc78 0000000000000046 ffff8800b8ed0b80
> ffff880220edffd8
> [  +0.000005]  ffff880220edffd8 ffff880220edffd8 ffff8800b8ed0b80
> ffff880220edfce0
> [  +0.000004]  ffff8800b8ed0b80 ffff88022ddd8870 ffff8800b9cae320
> 0000000000000001
> [  +0.000004] Call Trace:
> [  +0.000005]  [<ffffffff8163a909>] schedule+0x29/0x70
> [  +0.000010]  [<ffffffffa0457c05>] closure_sync+0x25/0x90 [bcache]
> [  +0.000011]  [<ffffffffa045c4b8>] bch_journal+0x128/0x340 [bcache]
> [  +0.000009]  [<ffffffffa045c717>] bch_journal_meta+0x47/0x70 [bcache]
> [  +0.000005]  [<ffffffff8163a909>] ? schedule+0x29/0x70
> [  +0.000009]  [<ffffffffa0457c05>] ? closure_sync+0x25/0x90 [bcache]
> [  +0.000011]  [<ffffffffa0462f5b>] ? prio_io+0xfb/0x170 [bcache]
> [  +0.000004]  [<ffffffff81638b42>] ? mutex_lock+0x12/0x2f
> [  +0.000011]  [<ffffffffa0465407>] bch_prio_write+0x237/0x340 [bcache]
> [  +0.000008]  [<ffffffffa044ccf0>] ? invalidate_buckets+0x9c0/0x9c0 [bcache]
> [  +0.000009]  [<ffffffffa044d0b8>] bch_allocator_thread+0x3c8/0x3d0 [bcache]
> [  +0.000008]  [<ffffffffa044ccf0>] ? invalidate_buckets+0x9c0/0x9c0 [bcache]
> [  +0.000007]  [<ffffffff810a5aef>] kthread+0xcf/0xe0
> [  +0.000005]  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
> [  +0.000005]  [<ffffffff81645858>] ret_from_fork+0x58/0x90
> [  +0.000006]  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
> 
> The trace info as fallow:
> [root@localhost ~]# perf record -e 'bcache:*' -o /tmp/perf.data -a
> ^C[ perf record: Woken up 1 times to write data ]
> [bcache] with build id 4ce18096008788e940370b93dcb21b5711b7bed6 not
> found, continuing without symbols
> [ perf record: Captured and wrote 0.207 MB /tmp/perf.data (675 samples) ]
> [root@localhost ~]#
> [root@localhost ~]# cd /tmp/
> [root@localhost tmp]# perf script
>             bash  2615 [004]    97.000138: bcache:bcache_btree_read:
> bucket 186634
>             .......
>             bash  2615 [004]    98.708044: bcache:bcache_btree_read:
> bucket 122988
>             bash  2615 [004]    98.711088: bcache:bcache_btree_read:
> bucket 270990
>             bash  2615 [004]    98.741592:
> bcache:bcache_journal_replay_key: 3:978301904 len 16 dirty 1
>         kthreadd  2676 [002]    98.756234: bcache:bcache_alloc:
> allocated 8,98 sector=1024
>         kthreadd  2676 [002]    98.760511: bcache:bcache_alloc:
> allocated 8,98 sector=1024
>         kthreadd  2676 [002]    98.762685: bcache:bcache_journal_full: (nil)U
> 
> ====
> Looks like deadlock between the SSD register command and
> bcache_allocator thread? The trace info tell us the journal is full,
> so the allocator thread waits on
> bch_prio_write()->prio_io()->bch_journal_meta(), but there is no
> RESERVE_BTREE buckets to use for journal replay at this time, so
> register waits on bch_journal_replay()->bch_btree_insert()
> 
> 2016-04-04 9:07 GMT+08:00 Eric Wheeler <bcache@xxxxxxxxxxxxxxxxxx>:
> > Hello all,
> >
> > I have a test system that won't register.  I can register either the
> > caching dev or the cached dev fine, but as soon as I register the
> > sencond dev, bash hangs when echoing into /sys/fs/bcache/register .
> > I can register in either order (cache first or cached dev first) and
> > the deadlock still presents.
> >
> > I've narrowed down the problem to these two call stacks:
> >
> >
> > == The allocator thread is one half of the deadlock:
> >
> > [  405.619895] INFO: task bcache_allocato:3494 blocked for more than 5 seconds.
> > [  405.620897]       Tainted: G        W  O    4.1.20+ #5
> > [  405.621732] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [  405.623132] bcache_allocato D ffff88007b78fc98     0  3494      2 0x00000080
> > [  405.624406]  ffff88007b78fc98 ffff88007b78fc68 0000000000000002 ffff88007c8ddb80
> > [  405.626241]  ffff8800796e0000 ffff88007b78fc78 ffff88007b790000 0000000000000005
> > [  405.627890]  ffff880079cf0028 0000000000000001 0000000000000001 ffff88007b78fcb8
> > [  405.629730] Call Trace:
> > [  405.630362]  [<ffffffff817a9f47>] schedule+0x37/0x90
> > [  405.631230]  [<ffffffffa048e4f0>] bch_bucket_alloc+0x1b0/0x670 [bcache]
> > [  405.632261]  [<ffffffff81103860>] ? prepare_to_wait_event+0x110/0x110
> > [  405.633274]  [<ffffffffa04a90d5>] bch_prio_write+0x1b5/0x390 [bcache]
> > [  405.634362]  [<ffffffffa048e19d>] bch_allocator_thread+0x31d/0x4c0 [bcache]
> > [  405.635496]  [<ffffffffa048de80>] ? invalidate_buckets+0x980/0x980 [bcache]
> > [  405.636655]  [<ffffffff810d734e>] kthread+0xfe/0x120
> > [  405.637604]  [<ffffffff817b0440>] ? _raw_spin_unlock_irq+0x30/0x50
> > [  405.638569]  [<ffffffff810d7250>] ? kthread_create_on_node+0x240/0x240
> > [  405.639628]  [<ffffffff817b11a2>] ret_from_fork+0x42/0x70
> > [  405.640570]  [<ffffffff810d7250>] ? kthread_create_on_node+0x240/0x240
> > [  405.641568] no locks held by bcache_allocato/3494.
> >
> > There is a comment in the code inside of bch_allocator_thread() before it
> > calls bch_prio_write(ca):
> >         360  /*
> >         361   * This could deadlock if an allocation with a btree
> >         362   * node locked ever blocked - having the btree node
> >         363   * locked would block garbage collection, but here we're
> >         364   * waiting on garbage collection before we invalidate
> >         365   * and free anything.
> >         366   *
> >         367   * But this should be safe since the btree code always
> >         368   * uses btree_check_reserve() before allocating now, and
> >         369   * if it fails it blocks without btree nodes locked.
> >         370   */
> >         371   if (!fifo_full(&ca->free_inc))
> >         372           goto retry_invalidate;
> >         373
> >         374   bch_prio_write(ca);
> >
> > I think I'm hitting the deadlock which this comment speaks to.
> >
> >
> > == This is the other side of the deadlock caused by registering the cache device:
> >   echo /dev/sdb > /sys/fs/bcache/register
> >
> > [  405.578073] INFO: task bash:3490 blocked for more than 5 seconds:
> > [  405.580255] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ ... ]
> > [  405.588984]  [<ffffffff817a9f47>] schedule+0x37/0x90
> > [  405.589986]  [<ffffffffa048e4f0>] bch_bucket_alloc+0x1b0/0x670 [bcache]
> > [  405.591084]  [<ffffffff81103860>] ? prepare_to_wait_event+0x110/0x110
> > [  405.592078]  [<ffffffffa048eb59>] __bch_bucket_alloc_set+0x109/0x1a0 [bcache]
> > [  405.593113]  [<ffffffffa048ec40>] bch_bucket_alloc_set+0x50/0x70 [bcache]
> > [  405.594153]  [<ffffffffa04a7927>] __uuid_write+0x67/0x160 [bcache]
> > [  405.595223]  [<ffffffffa04a8a06>] bch_uuid_write+0x16/0x40 [bcache]
> > [  405.596273]  [<ffffffffa04a9877>] bch_cached_dev_attach+0x157/0x490 [bcache]
> > [  405.597384]  [<ffffffffa04a6b68>] ? __write_super+0x148/0x180 [bcache]
> > [  405.598432]  [<ffffffffa04a8986>] ? bcache_write_super+0x1d6/0x240 [bcache]
> > [  405.599464]  [<ffffffffa04aa761>] run_cache_set+0x601/0x910 [bcache]
> > [  405.600548]  [<ffffffffa04ac0ce>] register_bcache+0xeae/0x1430 [bcache]
> > [...]
> > [  405.611623] 4 locks held by bash/3490:
> > [  405.612342]  #0:  (sb_writers#3){.+.+.+}, at: [<ffffffff8126f773>] vfs_write+0x183/0x1b0
> > [  405.614200]  #1:  (&of->mutex){+.+.+.}, at: [<ffffffff812faaa6>] kernfs_fop_write+0x66/0x1a0
> > [  405.615994]  #2:  (s_active#194){.+.+.+}, at: [<ffffffff812faaae>] kernfs_fop_write+0x6e/0x1a0
> > [  405.617925]  #3:  (&bch_register_lock){+.+.+.}, at: [<ffffffffa04abe70>] register_bcache+0xc50/0x1430 [bcache]
> >
> >
> > I'm rather perplexed as to why this is deadlocking because
> > bch_bucket_alloc_set() locks c->bucket_lock, and the allocator holds
> > ca->set->bucket_lock before calling bch_prio_write() .  I checked,
> > this is the same lock (same memory position).
> >
> > This implies that the allocator waits on bch_bucket_alloc_set()
> > which was invoked by bash through register_bcache, and register_bcache
> > waits on the allocator's call to bch_prio_write().
> >
> > Things that I've tried which don't work or make the problem worse:
> >   * Adding a mutex inside bch_bucket_alloc so that only one may proceed at a time
> >   * Holding bch_register_lock in the allocator thread before calling bch_prio_write.
> >
> > Does anyone else have insight here that might help solve the problem?
> >
> > -Eric
> >
> > --
> > Eric Wheeler
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux ARM Kernel]     [Linux Filesystem Development]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux