On Wed, May 2, 2018 at 7:19 AM, Sean Sullivan <lookcrabs@xxxxxxxxx> wrote:
> Forgot to reply to all:
>
> Sure thing!
>
> I couldn't install the ceph-mds-dbg packages without upgrading, so I just
> finished upgrading the cluster to 12.2.5. The issue still persists in
> 12.2.5.
>
> From here I'm not really sure how to generate the backtrace, so I hope I
> did it right. For others on Ubuntu, this is what I did:
>
> * First, raise debug_mds to 20 and debug_ms to 1:
>   ceph tell mds.* injectargs '--debug-mds 20 --debug-ms 1'
>
> * Install the debug packages (ceph-mds-dbg in my case).
>
> * I also added these debug options to /etc/ceph/ceph.conf in case the
>   daemons restart.
>
> * Now allow the pids to dump core (stolen partly from the Red Hat docs and
>   partly from Ubuntu's):
>   echo -e 'DefaultLimitCORE=infinity\nPrivateTmp=true' | tee -a /etc/systemd/system.conf
>   sysctl fs.suid_dumpable=2
>   sysctl kernel.core_pattern=/tmp/core
>   systemctl daemon-reload
>   systemctl restart ceph-mds@$(hostname -s)
>
> * A crash was created in /var/crash by apport, but gdb can't read it
>   directly, so I used apport-unpack and then ran gdb on what is inside
>   (the core dump should be in /tmp/core):
>   apport-unpack /var/crash/$(ls /var/crash/*mds*) /root/crash_dump/
>   cd /root/crash_dump/
>   gdb $(cat ExecutablePath) CoreDump -ex 'thr a a bt' | tee /root/ceph_mds_$(hostname -s)_backtrace
>
> * This left me with the attached backtraces, which I think are wrong as I
>   see a lot of ?? even though gdb says
>   /usr/lib/debug/.build-id/1d/23dc5ef4fec1dacebba2c6445f05c8fe6b8a7c.debug
>   was loaded.
>
> kh10-8 mds backtrace -- https://pastebin.com/bwqZGcfD
> kh09-8 mds backtrace -- https://pastebin.com/vvGiXYVY

Try running ceph-mds inside gdb. It should be easy to locate the bug once
we have a correct coredump file.

Regards
Yan, Zheng

> The log files are pretty large (one 4.1 GB and the other 200 MB):
>
> kh10-8 (200 MB) mds log --
> https://griffin-objstore.opensciencedatacloud.org/logs/ceph-mds.kh10-8.log
> kh09-8 (4.1 GB) mds log --
> https://griffin-objstore.opensciencedatacloud.org/logs/ceph-mds.kh09-8.log
>
> On Tue, May 1, 2018 at 12:09 AM, Patrick Donnelly <pdonnell@xxxxxxxxxx>
> wrote:
>>
>> Hello Sean,
>>
>> On Mon, Apr 30, 2018 at 2:32 PM, Sean Sullivan <lookcrabs@xxxxxxxxx>
>> wrote:
>> > I was creating a new user and mount point. On another hardware node I
>> > mounted CephFS as admin to mount as root. I created /aufstest and then
>> > unmounted. From there it seems that both of my mds nodes crashed for
>> > some reason and I can't start them any more.
>> >
>> > https://pastebin.com/1ZgkL9fa -- my mds log
>> >
>> > I have never had this happen in my tests, so now I have live data here.
>> > If anyone can lend a hand or point me in the right direction while
>> > troubleshooting, that would be a godsend!
>>
>> Thanks for keeping the list apprised of your efforts. Since this is so
>> easily reproduced for you, I would suggest that you next get higher
>> debug logs (debug_mds=20/debug_ms=1) from the MDS. And, since this is
>> a segmentation fault, a backtrace with debug symbols from gdb would
>> also be helpful.
>>
>> --
>> Patrick Donnelly
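
For readers following Sean's steps: a minimal sketch of what the persistent
debug settings in /etc/ceph/ceph.conf could look like. The [mds] section and
the option syntax below are assumptions based on the injectargs line in the
thread, not a copy of Sean's actual config:

  # /etc/ceph/ceph.conf (assumed snippet; keeps the debug levels across MDS restarts)
  [mds]
      debug mds = 20
      debug ms = 1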
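
And a minimal sketch of Yan's suggestion to run ceph-mds directly inside gdb
on an Ubuntu/systemd host. The daemon flags mirror the stock ceph-mds@.service
unit, but the cluster name, daemon id, and binary path are assumptions; adjust
them to match your deployment:

  # stop the systemd-managed MDS first (assumed unit name)
  systemctl stop ceph-mds@$(hostname -s)

  # run the daemon in the foreground under gdb
  gdb --args /usr/bin/ceph-mds -f --cluster ceph --id $(hostname -s) \
      --setuser ceph --setgroup ceph
  (gdb) run
  ... wait for the segfault ...
  (gdb) thread apply all bt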