On Sat, Oct 07, 2017 at 01:10:58PM +0500, Михаил Гаврилов wrote: > But seems now got another issue: > > [ 1966.953781] INFO: task tracker-store:8578 blocked for more than 120 seconds. > [ 1966.953797] Not tainted 4.13.4-301.fc27.x86_64+debug #1 > [ 1966.953800] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 1966.953804] tracker-store D12840 8578 1655 0x00000000 > [ 1966.953811] Call Trace: > [ 1966.953823] __schedule+0x2dc/0xbb0 > [ 1966.953830] ? wait_on_page_bit_common+0xfb/0x1a0 > [ 1966.953838] schedule+0x3d/0x90 > [ 1966.953843] io_schedule+0x16/0x40 > [ 1966.953847] wait_on_page_bit_common+0x10a/0x1a0 > [ 1966.953857] ? page_cache_tree_insert+0x170/0x170 > [ 1966.953865] __filemap_fdatawait_range+0x101/0x1a0 > [ 1966.953883] file_write_and_wait_range+0x63/0xc0 Ok, that's in wait_on_page_writeback(page) ...... > And yet another > > [41288.797026] INFO: task tracker-store:4535 blocked for more than 120 seconds. > [41288.797034] Not tainted 4.13.4-301.fc27.x86_64+debug #1 > [41288.797037] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [41288.797041] tracker-store D10616 4535 1655 0x00000000 > [41288.797049] Call Trace: > [41288.797061] __schedule+0x2dc/0xbb0 > [41288.797072] ? bit_wait+0x60/0x60 > [41288.797076] schedule+0x3d/0x90 > [41288.797082] io_schedule+0x16/0x40 > [41288.797086] bit_wait_io+0x11/0x60 > [41288.797091] __wait_on_bit+0x31/0x90 > [41288.797099] out_of_line_wait_on_bit+0x94/0xb0 > [41288.797106] ? bit_waitqueue+0x40/0x40 > [41288.797113] __block_write_begin_int+0x265/0x550 > [41288.797132] iomap_write_begin.constprop.14+0x7d/0x130 And that's in wait_on_buffer(). In both cases we are waiting on a bit lock for IO completion. In the first case it is on page, the second it's on sub-page read IO completion during a write. Triggeringa hung task timeouts like this doesn't usually indicate a filesystem problem. In general, it means that the IO subsystem is overloaded such that iowait times are blowing out to >120s, an IO has been lost and/or not completed correctly or you are getting single bit memory errors or corruption occurring. None of these things usually filesystem problems, and the trainsmash of blocked tasks on filesystem locks is typical for these types of "blocked indefinitely with locks held" type of situations. It does tend to indicate taht there is quite a bit of load on the filesystem, though... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html