Re: XFS filesystem hang

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jan 17, 2019 at 12:14:11PM +0100, Daniel Aberger - Profihost AG wrote:
> Hi,
> 
> one of our servers crashed / hung several times in the past days with
> similar errors regarding XFS. Unfortunately we are unable to interprete
> the call trace of dmesg to pinpoint the exact problem and I hope you
> could help me to do so.
> 
> We already ran xfs_repair to ensure that there are no filesystem errors.
> 
> Here is an example dmesg output of recent crash of the mentioned server:
> 
> 2019-01-12 06:06:35     INFO: task mysqld:1171 blocked for more than 120
> seconds.

This is not a crash but rather an indication from the kernel that this
task has been sitting here for at least a couple minutes. This doesn't
tell us whether the task will eventually recover or remain blocked
indefinitely.

> 2019-01-12 06:06:35     Tainted: G 4.12.0+139-ph #1
> 2019-01-12 06:06:35     "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> 2019-01-12 06:06:35     mysqld D 0 1171 1 0x00080000
> 2019-01-12 06:06:35     Call Trace:
> 2019-01-12 06:06:35     ? __schedule+0x3bc/0x820
> 2019-01-12 06:06:35     ? strlcpy+0x31/0x40
> 2019-01-12 06:06:35     ?
> kernfs_path_from_node_locked+0x238/0x320schedule+0x32/0x80xlog_grant_head_wait+0xca/0x200xlog_grant_head_check+0x86/0xe0xfs_log_reserve+0xc7/0x1c0xfs_trans_reserve+0x169/0x1c0xfs_trans_alloc+0xb9/0x130xfs_vn_update_time+0x4e/0x130file_update_time+0xa7/0x100xfs_file_aio_write_checks+0x178/0x1a0
> 2019-01-12 06:06:35     ?

I'm not sure how reliable the stack is, but fwiw it looks like this is
an aio that is waiting on transaction reservation to update a file time.
Transaction reservation is a normal blocking path because log space is a
finite resource. We certainly don't expect to block here for minutes,
but the fact that we don't know anything about your filesystem geometry,
mount options, underlying storage, workload, etc. will make it difficult
for anybody to surmise what the problem could be.

Please see the following url for what information to include when
reporting a problem:

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

I'd also suggest to include broader dmesg output and/or xfs_repair
output if there are multiple error reports that aren't exactly the same.

Brian

> ktime_get+0x3e/0xa0xfs_file_dio_aio_write+0xb1/0x230xfs_file_write_iter+0xff/0x150aio_write+0xf6/0x150
> 2019-01-12 06:06:35     ? queue_unplugged+0x25/0xa0
> 2019-01-12 06:06:35     ? kmem_cache_alloc+0xf7/0x570
> 2019-01-12 06:06:35     ? do_io_submit+0x35c/0x690do_io_submit+0x35c/0x690
> 2019-01-12 06:06:35     ?
> do_syscall_64+0x74/0x150do_syscall_64+0x74/0x150entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> 2019-01-12 06:06:35     RIP: 0033:0x7f37aa6b6717
> 2019-01-12 06:06:35     RSP: 002b:00007f32953f4228 EFLAGS: 00000206
> ORIG_RAX: 00000000000000d1
> 2019-01-12 06:06:35     RAX: ffffffffffffffda RBX: 00000000000040f6 RCX:
> 00007f37aa6b6717
> 2019-01-12 06:06:35     RDX: 00007f32953f4230 RSI: 0000000000000001 RDI:
> 00007f37aaef9000
> 2019-01-12 06:06:35     RBP: 00007f32953f4250 R08: 00007f32953f4230 R09:
> 0000000000000000
> 2019-01-12 06:06:35     R10: 0000000000010000 R11: 0000000000000206 R12:
> 00007f32953f4370
> 2019-01-12 06:06:35     R13: 00007f37a806d158 R14: 00007f32a5829b98 R15:
> 00007f379df26100
> 
> Thanks,
> 
> Daniel



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux