On Mon, Oct 09, 2017 at 08:31:29PM +0200, Luis R. Rodriguez wrote: > On Mon, Oct 09, 2017 at 11:05:29AM +1100, Dave Chinner wrote: > > On Sat, Oct 07, 2017 at 01:10:58PM +0500, Михаил Гаврилов wrote: > > > But seems now got another issue: > > > > > > [ 1966.953781] INFO: task tracker-store:8578 blocked for more than 120 seconds. > > > [ 1966.953797] Not tainted 4.13.4-301.fc27.x86_64+debug #1 > > > [ 1966.953800] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > > > disables this message. > > > [ 1966.953804] tracker-store D12840 8578 1655 0x00000000 > > > [ 1966.953811] Call Trace: > > > [ 1966.953823] __schedule+0x2dc/0xbb0 > > > [ 1966.953830] ? wait_on_page_bit_common+0xfb/0x1a0 > > > [ 1966.953838] schedule+0x3d/0x90 > > > [ 1966.953843] io_schedule+0x16/0x40 > > > [ 1966.953847] wait_on_page_bit_common+0x10a/0x1a0 > > > [ 1966.953857] ? page_cache_tree_insert+0x170/0x170 > > > [ 1966.953865] __filemap_fdatawait_range+0x101/0x1a0 > > > [ 1966.953883] file_write_and_wait_range+0x63/0xc0 > > > > Ok, that's in wait_on_page_writeback(page) > > ...... > > > > > And yet another > > > > > > [41288.797026] INFO: task tracker-store:4535 blocked for more than 120 seconds. > > > [41288.797034] Not tainted 4.13.4-301.fc27.x86_64+debug #1 > > > [41288.797037] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > > > disables this message. > > > [41288.797041] tracker-store D10616 4535 1655 0x00000000 > > > [41288.797049] Call Trace: > > > [41288.797061] __schedule+0x2dc/0xbb0 > > > [41288.797072] ? bit_wait+0x60/0x60 > > > [41288.797076] schedule+0x3d/0x90 > > > [41288.797082] io_schedule+0x16/0x40 > > > [41288.797086] bit_wait_io+0x11/0x60 > > > [41288.797091] __wait_on_bit+0x31/0x90 > > > [41288.797099] out_of_line_wait_on_bit+0x94/0xb0 > > > [41288.797106] ? bit_waitqueue+0x40/0x40 > > > [41288.797113] __block_write_begin_int+0x265/0x550 > > > [41288.797132] iomap_write_begin.constprop.14+0x7d/0x130 > > > > And that's in wait_on_buffer(). > > > > In both cases we are waiting on a bit lock for IO completion. In the > > first case it is on page, the second it's on sub-page read IO > > completion during a write. > > > > Triggeringa hung task timeouts like this doesn't usually indicate a > > filesystem problem. > > <-- snip --> > > > None of these things usually filesystem problems, and the trainsmash > > of blocked tasks on filesystem locks is typical for these types of > > "blocked indefinitely with locks held" type of situations. It does > > tend to indicate taht there is quite a bit of load on the > > filesystem, though... > > As Jan Kara noted we've seen this also on customers SLE12-SP2 kernel (4.4 > based). Although we also were never able to root cause, since that bug > is now closed on our end I figured it would be worth mentioning two > theories we discussed, one more recent than the other. Sure, but stuff going on with docker mounts and namespaces has nothing to do with IO path locking and completions. There's something in the filesystem or the storage stack below going wrong here, not above it in the vfsmount layer.... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>