Re: bcache deadlock

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello,

Am 26.04.2018 um 14:47 schrieb Coly Li:
> On 2018/4/26 2:31 PM, Stefan Priebe - Profihost AG wrote:
>> Hello,
>>
>> today i had the following bcache deadlock - where in the end xfs failed.
>> Is this a known bug? any patch i can apply?
>>
>> INFO: task bcache:335 blocked for more than 120 seconds.
>> Not tainted 4.4.120+555-ph #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> bcache D ffff88103842fd18 0 335 2 0x00080000
>> Workqueue: bcache bch_data_insert_keys [bcache]
>> ffff88103842fd18 00ff88103842fde0 ffff88103b25a580 ffff8801812c0000
>> ffff881038430000 ffff88103842fd88 0000000000000003 ffff88103520cd00
>> ffff881035200000 ffff88103842fd30 ffffffff906e30b5 ffff88103b25a580
>> Call Trace:
>> [<ffffffff906e30b5>] schedule+0x35/0x80
>> [<ffffffffc05c2d6b>] __closure_sync+0x4b/0x2e0 [bcache]
>> [<ffffffffc05b2c5c>] bch_journal+0xec/0x360 [bcache]
>> [<ffffffffc05b422f>] bch_data_insert_keys+0x9f/0xe0 [bcache]
>> [<ffffffff9009eda1>] process_one_work+0x201/0x4a0
>> [<ffffffff9009f73b>] rescuer_thread+0x22b/0x3e0
>> [<ffffffff900a52f8>] kthread+0xd8/0xf0
>> [<ffffffff906e7ac5>] ret_from_fork+0x55/0x80
>> DWARF2 unwinder stuck at ret_from_fork+0x55/0x80
>> Leftover inexact backtrace: [2544108.748799]
>> [<ffffffff900a5220>] ? kthread_park+0x60/0x60
>> INFO: task bcache_writebac:382 blocked for more than 120 seconds.
>> Not tainted 4.4.120+555-ph #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> bcache_writebac D ffff88103440bd08 0 382 2 0x00080000
>> ffff88103440bd08 00ff88103440bd18 ffff88103823cb00 ffff880181548000
>> ffff88103440c000 ffff881035530ac8 ffff881035530ae0 ffff881035530ad0
>> ffff881035530ac8 ffff88103440bd20 ffffffff906e30b5 ffff88103823cb00
>> Call Trace:
>> [<ffffffff906e30b5>] schedule+0x35/0x80
>> [<ffffffff906e5b57>] rwsem_down_write_failed+0x1e7/0x390
>> [<ffffffff903e05d7>] call_rwsem_down_write_failed+0x17/0x30
>> DWARF2 unwinder stuck at call_rwsem_down_write_failed+0x17/0x30
>> Leftover inexact backtrace: [2544108.750407]
>> [<ffffffff906e5254>] down_write+0x24/0x40
>> [<ffffffffc05c2598>] bch_writeback_thread+0x58/0x480 [bcache]
>> [<ffffffffc05c2540>] ? read_dirty+0x500/0x500 [bcache]
>> [<ffffffff900a52f8>] kthread+0xd8/0xf0
>> [<ffffffff900a5220>] ? kthread_park+0x60/0x60
>> [<ffffffff906e7ac5>] ret_from_fork+0x55/0x80
>> [<ffffffff900a5220>] ? kthread_park+0x60/0x60
>> INFO: task bcache_writebac:385 blocked for more than 120 seconds.
>> Not tainted 4.4.120+555-ph #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> bcache_writebac D ffff88102b697d08 0 385 2 0x00080000
>> ffff88102b697d08 00ff88102b697d18 ffff8810384ca580 ffff880181562580
>> ffff88102b698000 ffff881038220ac8 ffff881038220ae0 ffff881038220ad0
>> ffff881038220ac8 ffff88102b697d20 ffffffff906e30b5 ffff8810384ca580
>> Call Trace:
>> [<ffffffff906e30b5>] schedule+0x35/0x80
>> [<ffffffff906e5b57>] rwsem_down_write_failed+0x1e7/0x390
>> [<ffffffff903e05d7>] call_rwsem_down_write_failed+0x17/0x30
>> DWARF2 unwinder stuck at call_rwsem_down_write_failed+0x17/0x30
>> Leftover inexact backtrace: [2544108.752030]
>> [<ffffffff906e5254>] down_write+0x24/0x40
>> [<ffffffffc05c2598>] bch_writeback_thread+0x58/0x480 [bcache]
>> [<ffffffffc05c2540>] ? read_dirty+0x500/0x500 [bcache]
>> [<ffffffff900a52f8>] kthread+0xd8/0xf0
>> [<ffffffff900a5220>] ? kthread_park+0x60/0x60
>> [<ffffffff906e7ac5>] ret_from_fork+0x55/0x80
>> [<ffffffff900a5220>] ? kthread_park+0x60/0x60
>> INFO: task tp_osd_tp:4126 blocked for more than 120 seconds.
>> Not tainted 4.4.120+555-ph #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> tp_osd_tp D ffff880f31eebe38 0 4126 1 0x00080000
>> ffff880f31eebe38 000000000047ffff ffff880f31ebcb00 ffff880181548000
>> ffff880f31eec000 ffffffff90e81564 ffff880f31ebcb00 00000000ffffffff
>> ffffffff90e81568 ffff880f31eebe50 ffffffff906e30b5 ffffffff90e81560
>> Call Trace:
>> [<ffffffff906e30b5>] schedule+0x35/0x80
>> [<ffffffff906e339e>] schedule_preempt_disabled+0xe/0x10
>> [<ffffffff906e4ce5>] __mutex_lock_slowpath+0x95/0x110
>> [<ffffffff906e4d77>] mutex_lock+0x17/0x30
>> [<ffffffff9016bb62>] lru_add_drain_all+0x32/0x180
>> [<ffffffff9019ec6a>] SyS_fadvise64+0x1ca/0x280
>> [<ffffffff906e7685>] entry_SYSCALL_64_fastpath+0x1e/0xb6
>> DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xb6
>> Leftover inexact backtrace: [2544108.753700]
>> INFO: task tp_osd_tp:4355 blocked for more than 120 seconds.
>> Not tainted 4.4.120+555-ph #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> tp_osd_tp D ffff881005f6fd38 0 4355 1 0x00080004
>> ffff881005f6fd38 0000000000000000 ffff880efb4acb00 ffff88018154a580
>> ffff881005f70000 ffff881005f6fe90 ffff881005f6fe88 ffff880efb4acb00
>> 000000000000000c ffff881005f6fd50 ffffffff906e30b5 7fffffffffffffff
>> Call Trace:
>> [<ffffffff906e30b5>] schedule+0x35/0x80
>> [<ffffffff906e6396>] schedule_timeout+0x1b6/0x270
>> [<ffffffff906e442f>] wait_for_completion+0x9f/0x110
>> [<ffffffff9009fcdf>] flush_work+0xff/0x1a0
>> [<ffffffff9016bc6c>] lru_add_drain_all+0x13c/0x180
>> [<ffffffff9019ec6a>] SyS_fadvise64+0x1ca/0x280
>> [<ffffffff906e7685>] entry_SYSCALL_64_fastpath+0x1e/0xb6
>> DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xb6
>> Leftover inexact backtrace: [2544108.755357]
>> INFO: task tp_fstore_op:4925 blocked for more than 120 seconds.
>> Not tainted 4.4.120+555-ph #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> tp_fstore_op D ffff880e26defdd0 0 4925 1 0x00080004
>> ffff880e26defdd0 00ff88103e354780 ffff880e383fa580 ffff880181548000
>> ffff880e26df0000 ffff88103d40af18 ffff880e26defe8c ffff880e383fa580
>> ffff880fc353b000 ffff880e26defde8 ffffffff906e30b5 00004742000518a8
>> Call Trace:
>> [<ffffffff906e30b5>] schedule+0x35/0x80
>> [<ffffffff903558b6>] _xfs_log_force_lsn+0x196/0x320
>> [<ffffffff903385ec>] __xfs_fsync+0xec/0x1c0
>> [<ffffffff903386d8>] xfs_file_fsync+0x18/0x20
>> [<ffffffff902086de>] vfs_fsync_range+0x4e/0xb0
>> [<ffffffff9020879d>] do_fsync+0x3d/0x70
>> [<ffffffff90208a20>] SyS_fsync+0x10/0x20
>> [<ffffffff906e7685>] entry_SYSCALL_64_fastpath+0x1e/0xb6
>> DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xb6
>> Leftover inexact backtrace: [2544108.757028]
>> INFO: task tp_fstore_op:4927 blocked for more than 120 seconds.
>> Not tainted 4.4.120+555-ph #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> tp_fstore_op D ffff880e37867bb8 0 4927 1 0x00080004
>> ffff880e37867bb8 00ff880e37867be0 ffff880e0f0e0000 ffff880181562580
>> ffff880e37868000 ffff88107fd75b80 7fffffffffffffff ffffffff906e3880
>> ffff880e37867d10 ffff880e37867bd0 ffffffff906e30b5 0000000000000000
>> Call Trace:
>> [<ffffffff906e30b5>] schedule+0x35/0x80
>> [<ffffffff906e6396>] schedule_timeout+0x1b6/0x270
>> [<ffffffff906e2854>] io_schedule_timeout+0xa4/0x110
>> [<ffffffff906e389b>] bit_wait_io+0x1b/0x70
>> [<ffffffff906e34cd>] __wait_on_bit+0x5d/0x90
>> [<ffffffff9015a23b>] wait_on_page_bit+0x7b/0x80
>> [<ffffffff9015a324>] __filemap_fdatawait_range+0xe4/0x140
>> [<ffffffff9015a394>] filemap_fdatawait_range+0x14/0x30
>> [<ffffffff9015c1af>] filemap_write_and_wait_range+0x3f/0x70
>> [<ffffffff9033854d>] __xfs_fsync+0x4d/0x1c0
>> [<ffffffff903386d8>] xfs_file_fsync+0x18/0x20
>> [<ffffffff902086de>] vfs_fsync_range+0x4e/0xb0
>> [<ffffffff9020879d>] do_fsync+0x3d/0x70
>> [<ffffffff90208a43>] SyS_fdatasync+0x13/0x20
>> [<ffffffff906e7685>] entry_SYSCALL_64_fastpath+0x1e/0xb6
>> DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xb6
>> Leftover inexact backtrace: [2544108.758684]
>> INFO: task tp_osd_tp:4998 blocked for more than 120 seconds.
>> Not tainted 4.4.120+555-ph #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> tp_osd_tp D ffff880dbbb2f7c8 0 4998 1 0x00080004
>> ffff880dbbb2f7c8 00ff880af4b96d80 ffff881007b1cb00 ffff88018151a580
>> ffff880dbbb30000 ffff88102dcf9b30 ffff881007b1cb00 0000000000000001
>> ffff880fc6255000 ffff880dbbb2f7e0 ffffffff906e30b5 7fffffffffffffff
>> Call Trace:
>> [<ffffffff906e30b5>] schedule+0x35/0x80
>> [<ffffffff906e6396>] schedule_timeout+0x1b6/0x270
>> [<ffffffff906e4ecf>] __down+0x6f/0xb0
>> [<ffffffff900cfce1>] down+0x41/0x50
>> [<ffffffff90332fdc>] xfs_buf_lock+0x3c/0x100
>> [<ffffffff90333215>] _xfs_buf_find+0x175/0x3b0
>> [<ffffffff9033347a>] xfs_buf_get_map+0x2a/0x270
>> [<ffffffff9033420d>] xfs_buf_read_map+0x2d/0x180
>> [<ffffffff903616ea>] xfs_trans_read_buf_map+0xfa/0x340
>> [<ffffffff90313260>] xfs_da_read_buf+0xb0/0xe0
>> [<ffffffff90319d20>] xfs_dir3_leaf_read.constprop.15+0x30/0x60
>> [<ffffffff9031b481>] xfs_dir2_leaf_lookup_int+0x51/0x290
>> [<ffffffff9031b6f0>] xfs_dir2_leaf_lookup+0x30/0x140
>> [<ffffffff90316607>] xfs_dir_lookup+0x177/0x190
>> [<ffffffff9034534a>] xfs_lookup+0x5a/0x110
>> [<ffffffff903420d8>] xfs_vn_lookup+0x58/0xa0
>> [<ffffffff901dfc93>] lookup_real+0x23/0x60
>> [<ffffffff901e16e3>] __lookup_hash+0x33/0x40
>> [<ffffffff901e2aca>] walk_component+0x20a/0x2e0
>> [<ffffffff901e3b87>] path_lookupat+0x67/0x120
>> [<ffffffff901e53ca>] filename_lookup+0x9a/0x110
>> [<ffffffff901e54f6>] user_path_at_empty+0x36/0x40
>> [<ffffffff901db533>] vfs_fstatat+0x53/0xa0
>> [<ffffffff901db9af>] SYSC_newstat+0x1f/0x40
>> [<ffffffff901dbaae>] SyS_newstat+0xe/0x10
>> [<ffffffff906e7685>] entry_SYSCALL_64_fastpath+0x1e/0xb6
>> DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xb6
>> Leftover inexact backtrace: [2544108.760372]
>> INFO: task xfsaild/bcache0:4209 blocked for more than 120 seconds.
>> Not tainted 4.4.120+555-ph #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> xfsaild/bcache0 D ffff880f21713d40 0 4209 2 0x00080000
>> ffff880f21713d40 00ff88101d607000 ffff881034bc0000 ffff880181530000
>> ffff880f21714000 ffff881034bc0000 ffff88101d607000 ffff88101d607128
>> ffff880ff00a3c00 ffff880f21713d58 ffffffff906e30b5 0000000000000000
>> Call Trace:
>> [<ffffffff906e30b5>] schedule+0x35/0x80
>> [<ffffffff903554f7>] _xfs_log_force+0x137/0x280
>> [<ffffffff90355666>] xfs_log_force+0x26/0x90
>> [<ffffffff90360445>] xfsaild+0x155/0x6b0
>> [<ffffffff900a52f8>] kthread+0xd8/0xf0
>> [<ffffffff906e7ac5>] ret_from_fork+0x55/0x80
>> DWARF2 unwinder stuck at ret_from_fork+0x55/0x80
>> Leftover inexact backtrace: [2544108.762015]
>> [<ffffffff900a5220>] ? kthread_park+0x60/0x60
>> INFO: task tp_fstore_op:4909 blocked for more than 120 seconds.
>> Not tainted 4.4.120+555-ph #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> tp_fstore_op D ffff880e10e17888 0 4909 1 0x00080004
>> ffff880e10e17888 00ff880e10e17878 ffff880eda1dcb00 ffffffff90e134c0
>> ffff880e10e18000 ffff88015609ac30 ffff880eda1dcb00 0000000000000001
>> ffff88102a45d300 ffff880e10e178a0 ffffffff906e30b5 7fffffffffffffff
>> Call Trace:
>> [<ffffffff906e30b5>] schedule+0x35/0x80
>> [<ffffffff906e6396>] schedule_timeout+0x1b6/0x270
>> [<ffffffff906e4ecf>] __down+0x6f/0xb0
>> [<ffffffff900cfce1>] down+0x41/0x50
>> [<ffffffff90332fdc>] xfs_buf_lock+0x3c/0x100
>> [<ffffffff90333215>] _xfs_buf_find+0x175/0x3b0
>> [<ffffffff9033347a>] xfs_buf_get_map+0x2a/0x270
>> [<ffffffff9033420d>] xfs_buf_read_map+0x2d/0x180
>> [<ffffffff903616ea>] xfs_trans_read_buf_map+0xfa/0x340
>> [<ffffffff90313260>] xfs_da_read_buf+0xb0/0xe0
>> [<ffffffff902fa379>] xfs_attr3_leaf_read+0x29/0x60
>> [<ffffffff902f8464>] xfs_attr_leaf_addname+0x44/0x3f0
>> [<ffffffff902f907c>] xfs_attr_set+0x1fc/0x4d0
>> [<ffffffff9035198c>] xfs_xattr_set+0x5c/0xa0
>> [<ffffffff901fa567>] generic_setxattr+0x67/0x80
>> [<ffffffff901fae52>] __vfs_setxattr_noperm+0x52/0xd0
>> [<ffffffff901faf77>] vfs_setxattr+0xa7/0xb0
>> [<ffffffff901fb0a8>] setxattr+0x128/0x200
>> [<ffffffff901fb47d>] SyS_fsetxattr+0x7d/0xa0
>> [<ffffffff906e7685>] entry_SYSCALL_64_fastpath+0x1e/0xb6
>> DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xb6
>> Leftover inexact backtrace: [2544108.763689]
> 
> Hi Stefan,
> 
> Is it 4.4 stable tree, or you also have your own patch ? Yes we do have
> some fixes on journal recently, but I am not sure whether your kernel
> has these patches.

sorry i missed the list of patches. The following patches are applied on
top of the vanilla 4.4.120 kernel:
Subject: [PATCH 04/11] mbcache: remove mbcache
Subject: [PATCH 09/11] mbcache: get rid of _e_hash_list_head
Subject: [PATCH 11/11] mbcache: add reusable flag to cache entries
Subject: bcache: switch to using blk_queue_write_cache()
Subject: bcache: use op_is_write instead of checking for REQ_WRITE
Subject: bcache: use bio op accessors
Subject: bcache: use op_is_sync to check for synchronous requests
Subject: bcache: debug: avoid accessing .bi_io_vec directly
Subject: [PATCH] bcache: Remove deprecated create_workqueue
Subject: [PATCH] bcache: Remove redundant parameter for cache_alloc()
Subject: [PATCH] bcache: update document info
Subject: [PATCH] bcache: Remove redundant block_size assignment
Subject: [PATCH] bcache: register_bcache(): call blkdev_put() when
Subject: [PATCH] bcache: pr_err: more meaningful error message when
nr_stripes
Subject: [PATCH] bcache: bch_writeback_thread() is not freezable
Subject: [PATCH] bcache: bch_allocator_thread() is not freezable
Subject: [PATCH] bcache: documentation updates and corrections
Subject: [PATCH] bcache: documentation formatting, edited for clarity,
stripe
Subject: [PATCH] bcache: partition support: add 16 minors per bcacheN device
Subject: [PATCH] bcache: use kmalloc to allocate bio in bch_data_verify()
Subject: [PATCH] bcache: fix sequential large write IO bypass
Subject: [PATCH] bcache: Don't reinvent the wheel but use existing llist API
Subject: [PATCH] bcache: gc does not work when triggering by manual command
Subject: [PATCH] bcache: increase the number of open buckets
Subject: [PATCH] bcache: silence static checker warning
Subject: [PATCH] bcache: Update continue_at() documentation
Subject: [PATCH] bcache: use llist_for_each_entry_safe() in
Subject: [PATCH] bcache: Avoid nested function definition
Subject: [PATCH] bcache: fix a comments typo in bch_alloc_sectors()
Subject: [PATCH] bcache: rewrite multiple partitions support
Subject: [PATCH] bcache: Remove redundant set_capacity
Subject: [PATCH] bcache: update bio->bi_opf bypass/writeback REQ_ flag hints
Subject: [PATCH] bcache: remove unused parameter
Subject: [PATCH] bcache: don't write back data if reading it failed
Subject: [PATCH] bcache: implement PI controller for writeback rate
Subject: [PATCH] bcache: smooth writeback rate control
Subject: [PATCH] bcache: writeback rate shouldn't artifically clamp
Subject: [PATCH] bcache: rearrange writeback main thread ratelimit
Subject: [PATCH] bcache: safeguard a dangerous addressing in closure_queue
Subject: [PATCH] bcache: writeback rate clamping: make 32 bit safe
Subject: [PATCH] bcache: update bucket_in_use in real time
Subject: [PATCH] bcache: add a comment in journal bucket reading
Subject: [PATCH 03/19] bcache: fix wrong return value in bch_debug_init()
Subject: [PATCH 04/19] bcache: writeback: properly order backing device IO
Subject: [PATCH 06/19] bcache: Fix, improve efficiency of closure_sync()
Subject: [PATCH 07/19] bcache: mark closure_sync() __sched
Subject: [PATCH 08/19] bcache: reduce cache_set devices iteration by
Subject: [PATCH 09/19] bcache: fix writeback target calc on large devices
Subject: [PATCH 10/19] bcache: closures: move control bits one bit right
Subject: [PATCH 11/19] bcache: add journal statistic
Subject: [PATCH 12/19] bcache: fix high CPU occupancy during journal
Subject: [PATCH 13/19] bcache: properly set task state in
Subject: [PATCH 14/19] bcache: set error_limit correctly
Subject: [PATCH 15/19] bcache: fix for allocator and register thread race
Subject: [PATCH 16/19] bcache: set writeback_rate_update_seconds in
range [1,
Subject: [PATCH 17/19] bcache: return attach error when no cache set exist
Subject: [PATCH 18/19] bcache: fix for data collapse after re-attaching an
Subject: [PATCH 01/19] bcache: stop writeback thread after detaching
Subject: [PATCH 02/19] bcache: segregate flash only volume write streams
Subject: [PATCH 05/19] bcache: allow quick writeback when backing idle

> Thanks.
> 
> Coly Li
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux ARM Kernel]     [Linux Filesystem Development]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux