Hi, I'm very much hoping someone can unblock me on this. We recently ran into a very odd issue; I sent an earlier email to the list:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-March/033579.html

After unsuccessfully trying to repair it, we decided to abandon the filesystem: I marked the cluster down, failed the MDSs, removed the FS and the metadata and data pools, then created a new filesystem from scratch. However, I am still seeing the MDS segfault when a client tries to connect. This is quite urgent for me, as we don't have a functioning filesystem. If someone can advise how I can remove any and all state, please do; I just want to start fresh. I am very puzzled that a brand-new FS doesn't work.

Here is the MDS log at level 20. One odd thing I notice is that the client id starts showing as "?" well before the segfault. In any case, I'm just asking what needs to be done to remove all state
from the MDS nodes:

2019-03-08 19:30:12.024535 7f25ec184700 20 mds.0.server get_session have 0x5477e00 client.2160819875 <client_ip>:0/945029522 state open
2019-03-08 19:30:12.024537 7f25ec184700 15 mds.0.server oldest_client_tid=1
2019-03-08 19:30:12.024564 7f25ec184700  7 mds.0.cache request_start request(client.?:1 cr=0x54a8680)
2019-03-08 19:30:12.024566 7f25ec184700  7 mds.0.server dispatch_client_request client_request(client.?:1 getattr pAsLsXsFs #1 2019-03-08 19:29:15.425510 RETRY=2) v2
2019-03-08 19:30:12.024576 7f25ec184700 10 mds.0.server rdlock_path_pin_ref request(client.?:1 cr=0x54a8680) #1
2019-03-08 19:30:12.024577 7f25ec184700  7 mds.0.cache traverse: opening base ino 1 snap head
2019-03-08 19:30:12.024579 7f25ec184700 10 mds.0.cache path_traverse finish on snapid head
2019-03-08 19:30:12.024580 7f25ec184700 10 mds.0.server ref is [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | dirfrag=1 0x53ca968]
2019-03-08 19:30:12.024589 7f25ec184700 10 mds.0.locker acquire_locks request(client.?:1 cr=0x54a8680)
2019-03-08 19:30:12.024591 7f25ec184700 20 mds.0.locker  must rdlock (iauth sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]
2019-03-08 19:30:12.024594 7f25ec184700 20 mds.0.locker  must rdlock (ilink sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]
2019-03-08 19:30:12.024597 7f25ec184700 20 mds.0.locker  must rdlock (ifile sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]
2019-03-08 19:30:12.024600 7f25ec184700 20 mds.0.locker  must rdlock (ixattr sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]
2019-03-08 19:30:12.024602 7f25ec184700 20 mds.0.locker  must rdlock (isnap sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]
2019-03-08 19:30:12.024605 7f25ec184700 10 mds.0.locker  must authpin [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]
2019-03-08 19:30:12.024607 7f25ec184700 10 mds.0.locker  auth_pinning [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]
2019-03-08 19:30:12.024610 7f25ec184700 10 mds.0.cache.ino(1) auth_pin by 0x51e5e00 on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 authpin=1 0x53ca968] now 1+0
2019-03-08 19:30:12.024614 7f25ec184700  7 mds.0.locker rdlock_start  on (isnap sync) on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 authpin=1 0x53ca968]
2019-03-08 19:30:12.024618 7f25ec184700 10 mds.0.locker  got rdlock on (isnap sync r=1) [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (isnap sync r=1) (iversion lock) | request=1 lock=1 dirfrag=1 authpin=1 0x53ca968]
2019-03-08 19:30:12.024621 7f25ec184700  7 mds.0.locker rdlock_start  on (ifile sync) on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (isnap sync r=1) (iversion lock) | request=1 lock=1 dirfrag=1 authpin=1 0x53ca968]
2019-03-08 19:30:12.024625 7f25ec184700 10 mds.0.locker  got rdlock on (ifile sync r=1) [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (isnap sync r=1) (ifile sync r=1) (iversion lock) | request=1 lock=2 dirfrag=1 authpin=1 0x53ca968]
2019-03-08 19:30:12.024628 7f25ec184700  7 mds.0.locker rdlock_start  on (iauth sync) on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (isnap sync r=1) (ifile sync r=1) (iversion lock) | request=1 lock=2 dirfrag=1 authpin=1 0x53ca968]
2019-03-08 19:30:12.024631 7f25ec184700 10 mds.0.locker  got rdlock on (iauth sync r=1) [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (iauth sync r=1) (isnap sync r=1) (ifile sync r=1) (iversion lock) | request=1 lock=3 dirfrag=1 authpin=1 0x53ca968]
2019-03-08 19:30:12.024635 7f25ec184700  7 mds.0.locker rdlock_start  on (ilink sync) on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (iauth sync r=1) (isnap sync r=1) (ifile sync r=1) (iversion lock) | request=1 lock=3 dirfrag=1 authpin=1 0x53ca968]
2019-03-08 19:30:12.024638 7f25ec184700 10 mds.0.locker  got rdlock on (ilink sync r=1) [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (iauth sync r=1) (ilink sync r=1) (isnap sync r=1) (ifile sync r=1) (iversion lock) | request=1 lock=4 dirfrag=1 authpin=1 0x53ca968]
2019-03-08 19:30:12.024642 7f25ec184700  7 mds.0.locker rdlock_start  on (ixattr sync) on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (iauth sync r=1) (ilink sync r=1) (isnap sync r=1) (ifile sync r=1) (iversion lock) | request=1 lock=4 dirfrag=1 authpin=1 0x53ca968]
2019-03-08 19:30:12.024646 7f25ec184700 10 mds.0.locker  got rdlock on (ixattr sync r=1) [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (iauth sync r=1) (ilink sync r=1) (isnap sync r=1) (ifile sync r=1) (ixattr sync r=1) (iversion lock) | request=1 lock=5 dirfrag=1 authpin=1 0x53ca968]
2019-03-08 19:30:12.024658 7f25ec184700 10 mds.0.server reply to stat on client_request(client.?:1 getattr pAsLsXsFs #1 2019-03-08 19:29:15.425510 RETRY=2) v2
2019-03-08 19:30:12.024661 7f25ec184700 10 mds.0.server reply_client_request 0 ((0) Success) client_request(client.?:1 getattr pAsLsXsFs #1 2019-03-08 19:29:15.425510 RETRY=2) v2
2019-03-08 19:30:12.024673 7f25ec184700 10 mds.0.server apply_allocated_inos 0 / [] / 0
2019-03-08 19:30:12.024674 7f25ec184700 20 mds.0.server lat 0.060895
2019-03-08 19:30:12.024677 7f25ec184700 20 mds.0.server set_trace_dist snapid head
2019-03-08 19:30:12.024679 7f25ec184700 10 mds.0.server set_trace_dist snaprealm snaprealm(1 seq 1 lc 0 cr 0 cps 1 snaps={} 0x53b8480) len=48
2019-03-08 19:30:12.024683 7f25ec184700 20 mds.0.cache.ino(1) pfile 0 pauth 0 plink 0 pxattr 0 plocal 0 ctime 2019-03-07 21:12:21.476328 valid=1
2019-03-08 19:30:12.024688 7f25ec184700 10 mds.0.cache.ino(1) add_client_cap first cap, joining realm snaprealm(1 seq 1 lc 0 cr 0 cps 1 snaps={} 0x53b8480)
2019-03-08 19:30:12.026741 7f25ec184700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f25ec184700

 ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
 1: ceph_mds() [0x89982a]
 2: (()+0x10350) [0x7f25f4647350]
 3: (CInode::get_caps_allowed_for_client(client_t) const+0x130) [0x7a19f0]
 4: (CInode::encode_inodestat(ceph::buffer::list&, Session*, SnapRealm*, snapid_t, unsigned int, int)+0x132d) [0x7b383d]
 5: (Server::set_trace_dist(Session*, MClientReply*, CInode*, CDentry*, snapid_t, int, std::tr1::shared_ptr<MDRequestImpl>&)+0x471) [0x5f26e1]
 6: (Server::reply_client_request(std::tr1::shared_ptr<MDRequestImpl>&, MClientReply*)+0x846) [0x611056]
 7: (Server::respond_to_request(std::tr1::shared_ptr<MDRequestImpl>&, int)+0x4d9) [0x611759]
 8: (Server::handle_client_getattr(std::tr1::shared_ptr<MDRequestImpl>&, bool)+0x47b) [0x613eab]
 9: (Server::dispatch_client_request(std::tr1::shared_ptr<MDRequestImpl>&)+0xa38) [0x633da8]
 10: (Server::handle_client_request(MClientRequest*)+0x3df) [0x63435f]
 11: (Server::dispatch(Message*)+0x3f3) [0x63b8b3]
 12: (MDS::handle_deferrable_message(Message*)+0x847) [0x5b6c27]
 13: (MDS::_dispatch(Message*)+0x6d) [0x5d2bed]
 14: (C_MDS_RetryMessage::finish(int)+0x1b) [0x63d24b]
 15: (MDSInternalContextBase::complete(int)+0x163) [0x7e3363]
 16: (MDS::_advance_queues()+0x48d) [0x5c9e4d]
 17: (MDS::ProgressThread::entry()+0x4a) [0x5ca1aa]
 18: (()+0x8192) [0x7f25f463f192]
 19: (clone()+0x6d) [0x7f25f3b4c26d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
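For reference, this is roughly the teardown/recreate sequence I described above, written out from memory. The filesystem and pool names and the pg counts here are placeholders, not our actual values, and the commands are hammer-era (0.94) syntax:

```shell
# Take the MDS cluster down and fail the active rank
ceph mds cluster_down
ceph mds fail 0

# Remove the filesystem and its pools
# (names and pg counts below are examples, not our real ones)
ceph fs rm cephfs --yes-i-really-mean-it
ceph osd pool delete cephfs_metadata cephfs_metadata --yes-i-really-really-mean-it
ceph osd pool delete cephfs_data cephfs_data --yes-i-really-really-mean-it

# Recreate the pools and a brand-new filesystem
ceph osd pool create cephfs_metadata 64
ceph osd pool create cephfs_data 64
ceph fs new cephfs cephfs_metadata cephfs_data
```

After that I restarted the MDS daemons, and the segfault above still occurs as soon as a client connects.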
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com