I have created an anonymized crash log at https://pastebin.ubuntu.com/p/YsVXQQTBCM/ in the hopes that it can help someone understand what's leading to our MDS outage.
Thanks in advance for any assistance.

From: Pickett, Neale T
Sent: Thursday, October 10, 2019 21:46
To: ceph-users@xxxxxxxxxxxxxx
Subject: mds servers in endless segfault loop

Hello, ceph-users.
Our MDS servers keep segfaulting on a failed assertion, and for the first time I can't find anyone else who has posted about this problem. None of the MDS daemons can stay up, so our CephFS is down.
We recently had to truncate the journal after an upgrade to Nautilus, and now we see lots of "dup inodes", "failed to open ino", and "badness: got (but i already had)" messages in the recent event dump, if that's relevant. I don't know which parts of the dump matter most, but here are the last ten entries:
-10> 2019-10-11 03:30:35.258 7fd080a69700 0 mds.0.cache failed to open ino 0x10000a1843c err -22/0
-9> 2019-10-11 03:30:35.260 7fd080a69700 0 mds.0.cache failed to open ino 0x10000a1843c err -22/0
-8> 2019-10-11 03:30:35.260 7fd080a69700 0 mds.0.cache failed to open ino 0x10000a1843d err -22/-22
-7> 2019-10-11 03:30:35.260 7fd080a69700 0 mds.0.cache failed to open ino 0x10000a1843e err -22/-22
-6> 2019-10-11 03:30:35.261 7fd080a69700 0 mds.0.cache failed to open ino 0x10000a1843f err -22/-22
-5> 2019-10-11 03:30:35.261 7fd080a69700 0 mds.0.cache failed to open ino 0x10000a1845a err -22/-22
-4> 2019-10-11 03:30:35.262 7fd080a69700 0 mds.0.cache failed to open ino 0x10000a1845e err -22/-22
-3> 2019-10-11 03:30:35.262 7fd080a69700 0 mds.0.cache failed to open ino 0x10000a1846f err -22/-22
-2> 2019-10-11 03:30:35.263 7fd080a69700 0 mds.0.cache failed to open ino 0x10000a18470 err -22/-22
-1> 2019-10-11 03:30:35.273 7fd080a69700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.4/rpm/el7/BUILD/ceph-14.2.4/src/mds/CInode.cc: In function
'CDir* CInode::get_or_open_dirfrag(MDCache*, frag_t)' thread 7fd080a69700 time 2019-10-11 03:30:35.273849
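For anyone triaging a dump like the one above, here is a minimal Python sketch (not part of the original report) that tallies the "failed to open ino" events by inode and by error pair. The regular expression is an assumption based only on the lines shown in this message, and the name summarize_failed_opens is hypothetical. The sketch does not interpret what the two error numbers mean.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this message (an assumption,
# not an official description of the MDS log format).
FAILED_OPEN = re.compile(
    r"failed to open ino (0x[0-9a-fA-F]+) err (-?\d+)/(-?\d+)"
)


def summarize_failed_opens(lines):
    """Count 'failed to open ino' events per inode and per error pair."""
    inodes = Counter()
    errors = Counter()
    for line in lines:
        match = FAILED_OPEN.search(line)
        if match is None:
            continue  # skip lines that are not 'failed to open ino' events
        ino = match.group(1)
        err_pair = (int(match.group(2)), int(match.group(3)))
        inodes[ino] += 1
        errors[err_pair] += 1
    return inodes, errors


# Three lines copied from the dump above, used as sample input.
SAMPLE = [
    "-10> 2019-10-11 03:30:35.258 7fd080a69700 0 mds.0.cache failed to open ino 0x10000a1843c err -22/0",
    "-9> 2019-10-11 03:30:35.260 7fd080a69700 0 mds.0.cache failed to open ino 0x10000a1843c err -22/0",
    "-8> 2019-10-11 03:30:35.260 7fd080a69700 0 mds.0.cache failed to open ino 0x10000a1843d err -22/-22",
]

inodes, errors = summarize_failed_opens(SAMPLE)
for ino, count in inodes.most_common():
    print(ino, count)
for pair, count in errors.most_common():
    print(pair, count)
```

To run it over a full log, pass an open file object instead of SAMPLE, since the function accepts any iterable of lines.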
I'm happy to provide any other information that would help diagnose the issue. I don't have any guesses about what else would be helpful, though.
Thanks in advance for any help!
Neale Pickett <neale@xxxxxxxx>
A-4: Advanced Research in Cyber Systems
Los Alamos National Laboratory
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com