Hello, Am 26.04.2018 um 14:47 schrieb Coly Li: > On 2018/4/26 2:31 PM, Stefan Priebe - Profihost AG wrote: >> Hello, >> >> today i had the following bcache deadlock - where in the end xfs failed. >> Is this a known bug? any patch i can apply? >> >> INFO: task bcache:335 blocked for more than 120 seconds. >> Not tainted 4.4.120+555-ph #1 >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> bcache D ffff88103842fd18 0 335 2 0x00080000 >> Workqueue: bcache bch_data_insert_keys [bcache] >> ffff88103842fd18 00ff88103842fde0 ffff88103b25a580 ffff8801812c0000 >> ffff881038430000 ffff88103842fd88 0000000000000003 ffff88103520cd00 >> ffff881035200000 ffff88103842fd30 ffffffff906e30b5 ffff88103b25a580 >> Call Trace: >> [<ffffffff906e30b5>] schedule+0x35/0x80 >> [<ffffffffc05c2d6b>] __closure_sync+0x4b/0x2e0 [bcache] >> [<ffffffffc05b2c5c>] bch_journal+0xec/0x360 [bcache] >> [<ffffffffc05b422f>] bch_data_insert_keys+0x9f/0xe0 [bcache] >> [<ffffffff9009eda1>] process_one_work+0x201/0x4a0 >> [<ffffffff9009f73b>] rescuer_thread+0x22b/0x3e0 >> [<ffffffff900a52f8>] kthread+0xd8/0xf0 >> [<ffffffff906e7ac5>] ret_from_fork+0x55/0x80 >> DWARF2 unwinder stuck at ret_from_fork+0x55/0x80 >> Leftover inexact backtrace: [2544108.748799] >> [<ffffffff900a5220>] ? kthread_park+0x60/0x60 >> INFO: task bcache_writebac:382 blocked for more than 120 seconds. >> Not tainted 4.4.120+555-ph #1 >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> bcache_writebac D ffff88103440bd08 0 382 2 0x00080000 >> ffff88103440bd08 00ff88103440bd18 ffff88103823cb00 ffff880181548000 >> ffff88103440c000 ffff881035530ac8 ffff881035530ae0 ffff881035530ad0 >> ffff881035530ac8 ffff88103440bd20 ffffffff906e30b5 ffff88103823cb00 >> Call Trace: >> [<ffffffff906e30b5>] schedule+0x35/0x80 >> [<ffffffff906e5b57>] rwsem_down_write_failed+0x1e7/0x390 >> [<ffffffff903e05d7>] call_rwsem_down_write_failed+0x17/0x30 >> DWARF2 unwinder stuck at call_rwsem_down_write_failed+0x17/0x30 >> Leftover inexact backtrace: [2544108.750407] >> [<ffffffff906e5254>] down_write+0x24/0x40 >> [<ffffffffc05c2598>] bch_writeback_thread+0x58/0x480 [bcache] >> [<ffffffffc05c2540>] ? read_dirty+0x500/0x500 [bcache] >> [<ffffffff900a52f8>] kthread+0xd8/0xf0 >> [<ffffffff900a5220>] ? kthread_park+0x60/0x60 >> [<ffffffff906e7ac5>] ret_from_fork+0x55/0x80 >> [<ffffffff900a5220>] ? kthread_park+0x60/0x60 >> INFO: task bcache_writebac:385 blocked for more than 120 seconds. >> Not tainted 4.4.120+555-ph #1 >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> bcache_writebac D ffff88102b697d08 0 385 2 0x00080000 >> ffff88102b697d08 00ff88102b697d18 ffff8810384ca580 ffff880181562580 >> ffff88102b698000 ffff881038220ac8 ffff881038220ae0 ffff881038220ad0 >> ffff881038220ac8 ffff88102b697d20 ffffffff906e30b5 ffff8810384ca580 >> Call Trace: >> [<ffffffff906e30b5>] schedule+0x35/0x80 >> [<ffffffff906e5b57>] rwsem_down_write_failed+0x1e7/0x390 >> [<ffffffff903e05d7>] call_rwsem_down_write_failed+0x17/0x30 >> DWARF2 unwinder stuck at call_rwsem_down_write_failed+0x17/0x30 >> Leftover inexact backtrace: [2544108.752030] >> [<ffffffff906e5254>] down_write+0x24/0x40 >> [<ffffffffc05c2598>] bch_writeback_thread+0x58/0x480 [bcache] >> [<ffffffffc05c2540>] ? read_dirty+0x500/0x500 [bcache] >> [<ffffffff900a52f8>] kthread+0xd8/0xf0 >> [<ffffffff900a5220>] ? kthread_park+0x60/0x60 >> [<ffffffff906e7ac5>] ret_from_fork+0x55/0x80 >> [<ffffffff900a5220>] ? kthread_park+0x60/0x60 >> INFO: task tp_osd_tp:4126 blocked for more than 120 seconds. >> Not tainted 4.4.120+555-ph #1 >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> tp_osd_tp D ffff880f31eebe38 0 4126 1 0x00080000 >> ffff880f31eebe38 000000000047ffff ffff880f31ebcb00 ffff880181548000 >> ffff880f31eec000 ffffffff90e81564 ffff880f31ebcb00 00000000ffffffff >> ffffffff90e81568 ffff880f31eebe50 ffffffff906e30b5 ffffffff90e81560 >> Call Trace: >> [<ffffffff906e30b5>] schedule+0x35/0x80 >> [<ffffffff906e339e>] schedule_preempt_disabled+0xe/0x10 >> [<ffffffff906e4ce5>] __mutex_lock_slowpath+0x95/0x110 >> [<ffffffff906e4d77>] mutex_lock+0x17/0x30 >> [<ffffffff9016bb62>] lru_add_drain_all+0x32/0x180 >> [<ffffffff9019ec6a>] SyS_fadvise64+0x1ca/0x280 >> [<ffffffff906e7685>] entry_SYSCALL_64_fastpath+0x1e/0xb6 >> DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xb6 >> Leftover inexact backtrace: [2544108.753700] >> INFO: task tp_osd_tp:4355 blocked for more than 120 seconds. >> Not tainted 4.4.120+555-ph #1 >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> tp_osd_tp D ffff881005f6fd38 0 4355 1 0x00080004 >> ffff881005f6fd38 0000000000000000 ffff880efb4acb00 ffff88018154a580 >> ffff881005f70000 ffff881005f6fe90 ffff881005f6fe88 ffff880efb4acb00 >> 000000000000000c ffff881005f6fd50 ffffffff906e30b5 7fffffffffffffff >> Call Trace: >> [<ffffffff906e30b5>] schedule+0x35/0x80 >> [<ffffffff906e6396>] schedule_timeout+0x1b6/0x270 >> [<ffffffff906e442f>] wait_for_completion+0x9f/0x110 >> [<ffffffff9009fcdf>] flush_work+0xff/0x1a0 >> [<ffffffff9016bc6c>] lru_add_drain_all+0x13c/0x180 >> [<ffffffff9019ec6a>] SyS_fadvise64+0x1ca/0x280 >> [<ffffffff906e7685>] entry_SYSCALL_64_fastpath+0x1e/0xb6 >> DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xb6 >> Leftover inexact backtrace: [2544108.755357] >> INFO: task tp_fstore_op:4925 blocked for more than 120 seconds. >> Not tainted 4.4.120+555-ph #1 >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> tp_fstore_op D ffff880e26defdd0 0 4925 1 0x00080004 >> ffff880e26defdd0 00ff88103e354780 ffff880e383fa580 ffff880181548000 >> ffff880e26df0000 ffff88103d40af18 ffff880e26defe8c ffff880e383fa580 >> ffff880fc353b000 ffff880e26defde8 ffffffff906e30b5 00004742000518a8 >> Call Trace: >> [<ffffffff906e30b5>] schedule+0x35/0x80 >> [<ffffffff903558b6>] _xfs_log_force_lsn+0x196/0x320 >> [<ffffffff903385ec>] __xfs_fsync+0xec/0x1c0 >> [<ffffffff903386d8>] xfs_file_fsync+0x18/0x20 >> [<ffffffff902086de>] vfs_fsync_range+0x4e/0xb0 >> [<ffffffff9020879d>] do_fsync+0x3d/0x70 >> [<ffffffff90208a20>] SyS_fsync+0x10/0x20 >> [<ffffffff906e7685>] entry_SYSCALL_64_fastpath+0x1e/0xb6 >> DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xb6 >> Leftover inexact backtrace: [2544108.757028] >> INFO: task tp_fstore_op:4927 blocked for more than 120 seconds. >> Not tainted 4.4.120+555-ph #1 >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> tp_fstore_op D ffff880e37867bb8 0 4927 1 0x00080004 >> ffff880e37867bb8 00ff880e37867be0 ffff880e0f0e0000 ffff880181562580 >> ffff880e37868000 ffff88107fd75b80 7fffffffffffffff ffffffff906e3880 >> ffff880e37867d10 ffff880e37867bd0 ffffffff906e30b5 0000000000000000 >> Call Trace: >> [<ffffffff906e30b5>] schedule+0x35/0x80 >> [<ffffffff906e6396>] schedule_timeout+0x1b6/0x270 >> [<ffffffff906e2854>] io_schedule_timeout+0xa4/0x110 >> [<ffffffff906e389b>] bit_wait_io+0x1b/0x70 >> [<ffffffff906e34cd>] __wait_on_bit+0x5d/0x90 >> [<ffffffff9015a23b>] wait_on_page_bit+0x7b/0x80 >> [<ffffffff9015a324>] __filemap_fdatawait_range+0xe4/0x140 >> [<ffffffff9015a394>] filemap_fdatawait_range+0x14/0x30 >> [<ffffffff9015c1af>] filemap_write_and_wait_range+0x3f/0x70 >> [<ffffffff9033854d>] __xfs_fsync+0x4d/0x1c0 >> [<ffffffff903386d8>] xfs_file_fsync+0x18/0x20 >> [<ffffffff902086de>] vfs_fsync_range+0x4e/0xb0 >> [<ffffffff9020879d>] do_fsync+0x3d/0x70 >> [<ffffffff90208a43>] SyS_fdatasync+0x13/0x20 >> [<ffffffff906e7685>] entry_SYSCALL_64_fastpath+0x1e/0xb6 >> DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xb6 >> Leftover inexact backtrace: [2544108.758684] >> INFO: task tp_osd_tp:4998 blocked for more than 120 seconds. >> Not tainted 4.4.120+555-ph #1 >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> tp_osd_tp D ffff880dbbb2f7c8 0 4998 1 0x00080004 >> ffff880dbbb2f7c8 00ff880af4b96d80 ffff881007b1cb00 ffff88018151a580 >> ffff880dbbb30000 ffff88102dcf9b30 ffff881007b1cb00 0000000000000001 >> ffff880fc6255000 ffff880dbbb2f7e0 ffffffff906e30b5 7fffffffffffffff >> Call Trace: >> [<ffffffff906e30b5>] schedule+0x35/0x80 >> [<ffffffff906e6396>] schedule_timeout+0x1b6/0x270 >> [<ffffffff906e4ecf>] __down+0x6f/0xb0 >> [<ffffffff900cfce1>] down+0x41/0x50 >> [<ffffffff90332fdc>] xfs_buf_lock+0x3c/0x100 >> [<ffffffff90333215>] _xfs_buf_find+0x175/0x3b0 >> [<ffffffff9033347a>] xfs_buf_get_map+0x2a/0x270 >> [<ffffffff9033420d>] xfs_buf_read_map+0x2d/0x180 >> [<ffffffff903616ea>] xfs_trans_read_buf_map+0xfa/0x340 >> [<ffffffff90313260>] xfs_da_read_buf+0xb0/0xe0 >> [<ffffffff90319d20>] xfs_dir3_leaf_read.constprop.15+0x30/0x60 >> [<ffffffff9031b481>] xfs_dir2_leaf_lookup_int+0x51/0x290 >> [<ffffffff9031b6f0>] xfs_dir2_leaf_lookup+0x30/0x140 >> [<ffffffff90316607>] xfs_dir_lookup+0x177/0x190 >> [<ffffffff9034534a>] xfs_lookup+0x5a/0x110 >> [<ffffffff903420d8>] xfs_vn_lookup+0x58/0xa0 >> [<ffffffff901dfc93>] lookup_real+0x23/0x60 >> [<ffffffff901e16e3>] __lookup_hash+0x33/0x40 >> [<ffffffff901e2aca>] walk_component+0x20a/0x2e0 >> [<ffffffff901e3b87>] path_lookupat+0x67/0x120 >> [<ffffffff901e53ca>] filename_lookup+0x9a/0x110 >> [<ffffffff901e54f6>] user_path_at_empty+0x36/0x40 >> [<ffffffff901db533>] vfs_fstatat+0x53/0xa0 >> [<ffffffff901db9af>] SYSC_newstat+0x1f/0x40 >> [<ffffffff901dbaae>] SyS_newstat+0xe/0x10 >> [<ffffffff906e7685>] entry_SYSCALL_64_fastpath+0x1e/0xb6 >> DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xb6 >> Leftover inexact backtrace: [2544108.760372] >> INFO: task xfsaild/bcache0:4209 blocked for more than 120 seconds. >> Not tainted 4.4.120+555-ph #1 >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> xfsaild/bcache0 D ffff880f21713d40 0 4209 2 0x00080000 >> ffff880f21713d40 00ff88101d607000 ffff881034bc0000 ffff880181530000 >> ffff880f21714000 ffff881034bc0000 ffff88101d607000 ffff88101d607128 >> ffff880ff00a3c00 ffff880f21713d58 ffffffff906e30b5 0000000000000000 >> Call Trace: >> [<ffffffff906e30b5>] schedule+0x35/0x80 >> [<ffffffff903554f7>] _xfs_log_force+0x137/0x280 >> [<ffffffff90355666>] xfs_log_force+0x26/0x90 >> [<ffffffff90360445>] xfsaild+0x155/0x6b0 >> [<ffffffff900a52f8>] kthread+0xd8/0xf0 >> [<ffffffff906e7ac5>] ret_from_fork+0x55/0x80 >> DWARF2 unwinder stuck at ret_from_fork+0x55/0x80 >> Leftover inexact backtrace: [2544108.762015] >> [<ffffffff900a5220>] ? kthread_park+0x60/0x60 >> INFO: task tp_fstore_op:4909 blocked for more than 120 seconds. >> Not tainted 4.4.120+555-ph #1 >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> tp_fstore_op D ffff880e10e17888 0 4909 1 0x00080004 >> ffff880e10e17888 00ff880e10e17878 ffff880eda1dcb00 ffffffff90e134c0 >> ffff880e10e18000 ffff88015609ac30 ffff880eda1dcb00 0000000000000001 >> ffff88102a45d300 ffff880e10e178a0 ffffffff906e30b5 7fffffffffffffff >> Call Trace: >> [<ffffffff906e30b5>] schedule+0x35/0x80 >> [<ffffffff906e6396>] schedule_timeout+0x1b6/0x270 >> [<ffffffff906e4ecf>] __down+0x6f/0xb0 >> [<ffffffff900cfce1>] down+0x41/0x50 >> [<ffffffff90332fdc>] xfs_buf_lock+0x3c/0x100 >> [<ffffffff90333215>] _xfs_buf_find+0x175/0x3b0 >> [<ffffffff9033347a>] xfs_buf_get_map+0x2a/0x270 >> [<ffffffff9033420d>] xfs_buf_read_map+0x2d/0x180 >> [<ffffffff903616ea>] xfs_trans_read_buf_map+0xfa/0x340 >> [<ffffffff90313260>] xfs_da_read_buf+0xb0/0xe0 >> [<ffffffff902fa379>] xfs_attr3_leaf_read+0x29/0x60 >> [<ffffffff902f8464>] xfs_attr_leaf_addname+0x44/0x3f0 >> [<ffffffff902f907c>] xfs_attr_set+0x1fc/0x4d0 >> [<ffffffff9035198c>] xfs_xattr_set+0x5c/0xa0 >> [<ffffffff901fa567>] generic_setxattr+0x67/0x80 >> [<ffffffff901fae52>] __vfs_setxattr_noperm+0x52/0xd0 >> [<ffffffff901faf77>] vfs_setxattr+0xa7/0xb0 >> [<ffffffff901fb0a8>] setxattr+0x128/0x200 >> [<ffffffff901fb47d>] SyS_fsetxattr+0x7d/0xa0 >> [<ffffffff906e7685>] entry_SYSCALL_64_fastpath+0x1e/0xb6 >> DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xb6 >> Leftover inexact backtrace: [2544108.763689] > > Hi Stefan, > > Is it 4.4 stable tree, or you also have your own patch ? Yes we do have > some fixes on journal recently, but I am not sure whether your kernel > has these patches. sorry i missed the list of patches. The following patches are applied on top of the vanilla 4.4.120 kernel: Subject: [PATCH 04/11] mbcache: remove mbcache Subject: [PATCH 09/11] mbcache: get rid of _e_hash_list_head Subject: [PATCH 11/11] mbcache: add reusable flag to cache entries Subject: bcache: switch to using blk_queue_write_cache() Subject: bcache: use op_is_write instead of checking for REQ_WRITE Subject: bcache: use bio op accessors Subject: bcache: use op_is_sync to check for synchronous requests Subject: bcache: debug: avoid accessing .bi_io_vec directly Subject: [PATCH] bcache: Remove deprecated create_workqueue Subject: [PATCH] bcache: Remove redundant parameter for cache_alloc() Subject: [PATCH] bcache: update document info Subject: [PATCH] bcache: Remove redundant block_size assignment Subject: [PATCH] bcache: register_bcache(): call blkdev_put() when Subject: [PATCH] bcache: pr_err: more meaningful error message when nr_stripes Subject: [PATCH] bcache: bch_writeback_thread() is not freezable Subject: [PATCH] bcache: bch_allocator_thread() is not freezable Subject: [PATCH] bcache: documentation updates and corrections Subject: [PATCH] bcache: documentation formatting, edited for clarity, stripe Subject: [PATCH] bcache: partition support: add 16 minors per bcacheN device Subject: [PATCH] bcache: use kmalloc to allocate bio in bch_data_verify() Subject: [PATCH] bcache: fix sequential large write IO bypass Subject: [PATCH] bcache: Don't reinvent the wheel but use existing llist API Subject: [PATCH] bcache: gc does not work when triggering by manual command Subject: [PATCH] bcache: increase the number of open buckets Subject: [PATCH] bcache: silence static checker warning Subject: [PATCH] bcache: Update continue_at() documentation Subject: [PATCH] bcache: use llist_for_each_entry_safe() in Subject: [PATCH] bcache: Avoid nested function definition Subject: [PATCH] bcache: fix a comments typo in bch_alloc_sectors() Subject: [PATCH] bcache: rewrite multiple partitions support Subject: [PATCH] bcache: Remove redundant set_capacity Subject: [PATCH] bcache: update bio->bi_opf bypass/writeback REQ_ flag hints Subject: [PATCH] bcache: remove unused parameter Subject: [PATCH] bcache: don't write back data if reading it failed Subject: [PATCH] bcache: implement PI controller for writeback rate Subject: [PATCH] bcache: smooth writeback rate control Subject: [PATCH] bcache: writeback rate shouldn't artifically clamp Subject: [PATCH] bcache: rearrange writeback main thread ratelimit Subject: [PATCH] bcache: safeguard a dangerous addressing in closure_queue Subject: [PATCH] bcache: writeback rate clamping: make 32 bit safe Subject: [PATCH] bcache: update bucket_in_use in real time Subject: [PATCH] bcache: add a comment in journal bucket reading Subject: [PATCH 03/19] bcache: fix wrong return value in bch_debug_init() Subject: [PATCH 04/19] bcache: writeback: properly order backing device IO Subject: [PATCH 06/19] bcache: Fix, improve efficiency of closure_sync() Subject: [PATCH 07/19] bcache: mark closure_sync() __sched Subject: [PATCH 08/19] bcache: reduce cache_set devices iteration by Subject: [PATCH 09/19] bcache: fix writeback target calc on large devices Subject: [PATCH 10/19] bcache: closures: move control bits one bit right Subject: [PATCH 11/19] bcache: add journal statistic Subject: [PATCH 12/19] bcache: fix high CPU occupancy during journal Subject: [PATCH 13/19] bcache: properly set task state in Subject: [PATCH 14/19] bcache: set error_limit correctly Subject: [PATCH 15/19] bcache: fix for allocator and register thread race Subject: [PATCH 16/19] bcache: set writeback_rate_update_seconds in range [1, Subject: [PATCH 17/19] bcache: return attach error when no cache set exist Subject: [PATCH 18/19] bcache: fix for data collapse after re-attaching an Subject: [PATCH 01/19] bcache: stop writeback thread after detaching Subject: [PATCH 02/19] bcache: segregate flash only volume write streams Subject: [PATCH 05/19] bcache: allow quick writeback when backing idle > Thanks. > > Coly Li > -- To unsubscribe from this list: send the line "unsubscribe linux-bcache" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html