Re: XFS filesystem hang

Am 17.01.19 um 13:34 schrieb Brian Foster:
> On Thu, Jan 17, 2019 at 12:14:11PM +0100, Daniel Aberger - Profihost AG wrote:
>> Hi,
>>
>> one of our servers crashed / hung several times in the past few days
>> with similar XFS-related errors. Unfortunately we are unable to
>> interpret the call trace in dmesg to pinpoint the exact problem, and I
>> hope you can help me do so.
>>
>> We already ran xfs_repair to ensure that there are no filesystem errors.
>>
>> Here is an example dmesg output of recent crash of the mentioned server:
>>
>> 2019-01-12 06:06:35     INFO: task mysqld:1171 blocked for more than 120
>> seconds.
> 
> This is not a crash but rather an indication from the kernel that this
> task has been sitting here for at least a couple minutes. This doesn't
> tell us whether the task will eventually recover or remain blocked
> indefinitely.
> 

Yes, you're right. I was referring to it as a crash because the system
no longer responds to any input and we have to perform a REISUB (Magic
SysRq) sequence to reboot it.
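
For reference, that 120-second message comes from the kernel's hung task
detector, whose threshold is tunable at runtime. A small sketch (the path is
the standard sysctl interface; adjusting it needs root):

```shell
# The warning is emitted by the kernel's hung task detector; the threshold
# (120 s by default) can be read and changed at runtime.
cat /proc/sys/kernel/hung_task_timeout_secs 2>/dev/null \
    || echo "hung task detector not built into this kernel"

# e.g. raise it to 300 s (needs root); note this only changes the reporting
# threshold, not the behaviour of the blocked task itself:
# echo 300 > /proc/sys/kernel/hung_task_timeout_secs
```

Setting it to 0, as the dmesg hint says, silences the warnings entirely.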

>> 2019-01-12 06:06:35     Tainted: G 4.12.0+139-ph #1
>> 2019-01-12 06:06:35     "echo 0 >
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> 2019-01-12 06:06:35     mysqld D 0 1171 1 0x00080000
>> 2019-01-12 06:06:35     Call Trace:
>> 2019-01-12 06:06:35     ? __schedule+0x3bc/0x820
>> 2019-01-12 06:06:35     ? strlcpy+0x31/0x40
>> 2019-01-12 06:06:35     ? kernfs_path_from_node_locked+0x238/0x320
>> 2019-01-12 06:06:35     schedule+0x32/0x80
>> 2019-01-12 06:06:35     xlog_grant_head_wait+0xca/0x200
>> 2019-01-12 06:06:35     xlog_grant_head_check+0x86/0xe0
>> 2019-01-12 06:06:35     xfs_log_reserve+0xc7/0x1c0
>> 2019-01-12 06:06:35     xfs_trans_reserve+0x169/0x1c0
>> 2019-01-12 06:06:35     xfs_trans_alloc+0xb9/0x130
>> 2019-01-12 06:06:35     xfs_vn_update_time+0x4e/0x130
>> 2019-01-12 06:06:35     file_update_time+0xa7/0x100
>> 2019-01-12 06:06:35     xfs_file_aio_write_checks+0x178/0x1a0
>> 2019-01-12 06:06:35     ?
>> 2019-01-12 06:06:35     ?
> 
> I'm not sure how reliable the stack is, but fwiw it looks like this is
> an aio that is waiting on transaction reservation to update a file time.
> Transaction reservation is a normal blocking path because log space is a
> finite resource. We certainly don't expect to block here for minutes,
> but the fact that we don't know anything about your filesystem geometry,
> mount options, underlying storage, workload, etc. will make it difficult
> for anybody to surmise what the problem could be.
> 
> Please see the following url for what information to include when
> reporting a problem:
> 
> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
> 
> I'd also suggest including broader dmesg output and/or xfs_repair
> output if there are multiple error reports that aren't exactly the same.
> 
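
For completeness, the items that FAQ asks for can be gathered with a few
standard commands. A sketch (the mount point `/` is a placeholder; substitute
the actual XFS mount point):

```shell
# Collect the information the XFS FAQ asks for when reporting a problem.
uname -a                     # kernel version
xfs_repair -V 2>/dev/null || echo "xfsprogs not installed"
xfs_info / 2>/dev/null    || echo "adjust the mount point for xfs_info"
cat /proc/mounts             # mount options in effect
cat /proc/partitions         # block devices
cat /proc/meminfo            # memory state
```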

I'm sorry, I somehow missed this page while searching for an FAQ.


* Kernel Version: Linux 4.12.0+139-ph #1 SMP Tue Jan 1 21:46:16 UTC 2019
x86_64 GNU/Linux

* xfsprogs version: 4.9.0

* Single CPU, Xeon E5-1650 v3, 6 cores, 12 threads

* /proc/meminfo, /proc/mounts, /proc/partitions and xfs_info can be
found here: https://pastebin.com/cZiTrUDL

* full dmesg output of problem mentioned in the first mail:
https://pastebin.com/pLaz18L1

* a couple more dmesg outputs from the same system showing similar
behaviour:
 * https://pastebin.com/hWDbwcCr
 * https://pastebin.com/HAqs4yQc

* Raid Controller: Adaptec ASR8805
 * Logical Device 0 is a RAID6 with 5 SSDs (Samsung MZ7KM480), /dev/sda
 * Logical Device 1 is a single backup HDD (Seagate ST2000NX0253) for
internal backups, /dev/sdb

* Output of xfs_repair: https://pastebin.com/JzVMqrLh

The problem seems to occur at random; it does not coincide with high
load on our storage devices, which makes it pretty hard to find an
exact cause.
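
Since we cannot reproduce the hang on demand, one thing we could do the next
time it occurs is dump the stacks of all blocked tasks via SysRq, so the full
picture lands in the kernel log. A sketch (assumes CONFIG_MAGIC_SYSRQ is
enabled and requires root; the output path is just an example):

```shell
# Dump stacks of all uninterruptible (D-state) tasks into the kernel log;
# needs the 'w' function enabled in the kernel.sysrq bitmask.
echo w > /proc/sysrq-trigger

# Capture the resulting trace before the next forced reboot.
dmesg | tail -n 200 > /tmp/hang-trace.txt
```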


- Daniel

> Brian
> 
>> ktime_get+0x3e/0xa0
>> 2019-01-12 06:06:35     xfs_file_dio_aio_write+0xb1/0x230
>> 2019-01-12 06:06:35     xfs_file_write_iter+0xff/0x150
>> 2019-01-12 06:06:35     aio_write+0xf6/0x150
>> 2019-01-12 06:06:35     ? queue_unplugged+0x25/0xa0
>> 2019-01-12 06:06:35     ? kmem_cache_alloc+0xf7/0x570
>> 2019-01-12 06:06:35     ? do_io_submit+0x35c/0x690
>> 2019-01-12 06:06:35     do_io_submit+0x35c/0x690
>> 2019-01-12 06:06:35     ? do_syscall_64+0x74/0x150
>> 2019-01-12 06:06:35     do_syscall_64+0x74/0x150
>> 2019-01-12 06:06:35     entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>> 2019-01-12 06:06:35     RIP: 0033:0x7f37aa6b6717
>> 2019-01-12 06:06:35     RSP: 002b:00007f32953f4228 EFLAGS: 00000206
>> ORIG_RAX: 00000000000000d1
>> 2019-01-12 06:06:35     RAX: ffffffffffffffda RBX: 00000000000040f6 RCX:
>> 00007f37aa6b6717
>> 2019-01-12 06:06:35     RDX: 00007f32953f4230 RSI: 0000000000000001 RDI:
>> 00007f37aaef9000
>> 2019-01-12 06:06:35     RBP: 00007f32953f4250 R08: 00007f32953f4230 R09:
>> 0000000000000000
>> 2019-01-12 06:06:35     R10: 0000000000010000 R11: 0000000000000206 R12:
>> 00007f32953f4370
>> 2019-01-12 06:06:35     R13: 00007f37a806d158 R14: 00007f32a5829b98 R15:
>> 00007f379df26100
>>
>> Thanks,
>>
>> Daniel


