Re: 12.2.4 Both Ceph MDS nodes crashed. Please help.

On Thu, May 24, 2018 at 12:00 AM, Sean Sullivan <lookcrabs@xxxxxxxxx> wrote:
> Thanks Yan! I did this for the bug ticket and missed these replies. I hope I
> did it correctly. Here are the pastes of the dumps:
>
> https://pastebin.com/kw4bZVZT -- primary
> https://pastebin.com/sYZQx0ER -- secondary
>
>
> They are not that long; here is the output of one:
>
> Thread 17 "mds_rank_progr" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fe3b100a700 (LWP 120481)]
> 0x00005617aacc48c2 in Server::handle_client_getattr
> (this=this@entry=0x5617b5acbcd0, mdr=..., is_lookup=is_lookup@entry=true) at
> /build/ceph-12.2.5/src/mds/Server.cc:3065
> 3065    /build/ceph-12.2.5/src/mds/Server.cc: No such file or directory.
> (gdb) t
> [Current thread is 17 (Thread 0x7fe3b100a700 (LWP 120481))]
> (gdb) bt
> #0  0x00005617aacc48c2 in Server::handle_client_getattr
> (this=this@entry=0x5617b5acbcd0, mdr=..., is_lookup=is_lookup@entry=true) at
> /build/ceph-12.2.5/src/mds/Server.cc:3065
> #1  0x00005617aacfc98b in Server::dispatch_client_request
> (this=this@entry=0x5617b5acbcd0, mdr=...) at
> /build/ceph-12.2.5/src/mds/Server.cc:1802
> #2  0x00005617aacfce9b in Server::handle_client_request
> (this=this@entry=0x5617b5acbcd0, req=req@entry=0x5617bdfa8700) at
> /build/ceph-12.2.5/src/mds/Server.cc:1716
> #3  0x00005617aad017b6 in Server::dispatch (this=0x5617b5acbcd0,
> m=m@entry=0x5617bdfa8700) at /build/ceph-12.2.5/src/mds/Server.cc:258
> #4  0x00005617aac6afac in MDSRank::handle_deferrable_message
> (this=this@entry=0x5617b5d22000, m=m@entry=0x5617bdfa8700) at
> /build/ceph-12.2.5/src/mds/MDSRank.cc:716
> #5  0x00005617aac795cb in MDSRank::_dispatch
> (this=this@entry=0x5617b5d22000, m=0x5617bdfa8700,
> new_msg=new_msg@entry=false) at /build/ceph-12.2.5/src/mds/MDSRank.cc:551
> #6  0x00005617aac7a472 in MDSRank::retry_dispatch (this=0x5617b5d22000,
> m=<optimized out>) at /build/ceph-12.2.5/src/mds/MDSRank.cc:998
> #7  0x00005617aaf0207b in Context::complete (r=0, this=0x5617bd568080) at
> /build/ceph-12.2.5/src/include/Context.h:70
> #8  MDSInternalContextBase::complete (this=0x5617bd568080, r=0) at
> /build/ceph-12.2.5/src/mds/MDSContext.cc:30
> #9  0x00005617aac78bf7 in MDSRank::_advance_queues (this=0x5617b5d22000) at
> /build/ceph-12.2.5/src/mds/MDSRank.cc:776
> #10 0x00005617aac7921a in MDSRank::ProgressThread::entry
> (this=0x5617b5d22d40) at /build/ceph-12.2.5/src/mds/MDSRank.cc:502
> #11 0x00007fe3bb3066ba in start_thread (arg=0x7fe3b100a700) at
> pthread_create.c:333
> #12 0x00007fe3ba37241d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
>
>
> Here is what I did:
> * set the debug level to mds=20 mon=1,
> * attached gdb before trying to mount aufs from a separate client,
> * typed continue and attempted the mount,
> * then took a backtrace after it segfaulted.
>
> I hope this is more helpful. Is there something else I should try to get
> more info? I was hoping for something closer to a python trace where it says
> a variable is a different type or a missing delimiter. womp. I am definitely
> out of my depth but now is a great time to learn! Can anyone shed some more
> light on what may be wrong?
>

I updated https://tracker.ceph.com/issues/23972.  It's a kernel bug:
the kernel sends a malformed request to the MDS.
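
On reading the backtrace: frame #0 is where the crash happened, and each
higher-numbered frame is its caller. If a condensed, Python-traceback-like
view helps, here is a minimal sketch that reduces gdb `bt` output to one
tuple per frame. It is only an illustration: `summarize_backtrace` and
`SAMPLE` are hypothetical names made up here, not part of gdb or Ceph, and
the regex assumes frames of the form `#N [addr in] func (args) at file:line`,
as in the quoted trace.

```python
import re

# Assumed frame shape: "#N [0xADDR in] func (args) at path/file:line".
# The space before "at" is optional because mail wrapping can drop it.
FRAME_RE = re.compile(
    r"#(?P<num>\d+)\s+"
    r"(?:0x[0-9a-fA-F]+\s+in\s+)?"
    r"(?P<func>\S+)\s*"
    r"\(.*\)\s*at\s+"
    r"(?P<file>\S+):(?P<line>\d+)"
)


def summarize_backtrace(text):
    """Return (frame number, function, source file, line) for each frame."""
    # Drop mail quote markers and rejoin frames wrapped over several lines.
    lines = [re.sub(r"^\s*>\s?", "", ln).strip() for ln in text.splitlines()]
    joined = " ".join(ln for ln in lines if ln)

    frames = []
    # Split just before each "#N " marker so each chunk holds one frame.
    for chunk in re.split(r"(?=#\d+\s)", joined):
        m = FRAME_RE.match(chunk.strip())
        if m is None:
            continue  # not a frame, or a frame without "at file:line"
        source_file = m["file"].rsplit("/", 1)[-1]  # keep only the file name
        frames.append((int(m["num"]), m["func"], source_file, int(m["line"])))
    return frames


# First three frames copied from the quoted backtrace.
SAMPLE = """\
> #0  0x00005617aacc48c2 in Server::handle_client_getattr
> (this=this@entry=0x5617b5acbcd0, mdr=..., is_lookup=is_lookup@entry=true) at
> /build/ceph-12.2.5/src/mds/Server.cc:3065
> #1  0x00005617aacfc98b in Server::dispatch_client_request
> (this=this@entry=0x5617b5acbcd0, mdr=...) at
> /build/ceph-12.2.5/src/mds/Server.cc:1802
> #2  0x00005617aacfce9b in Server::handle_client_request
> (this=this@entry=0x5617b5acbcd0, req=req@entry=0x5617bdfa8700) at
> /build/ceph-12.2.5/src/mds/Server.cc:1716
"""

if __name__ == "__main__":
    for num, func, source_file, line in summarize_backtrace(SAMPLE):
        print(f"#{num} {func} ({source_file}:{line})")
```

Frames that lack an `at file:line` part (for example, frames gdb reports
as coming `from` a shared library) are skipped by this sketch.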

Regards
Yan, Zheng
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


