MDS segfaults on client connection -- brand new FS


Hi,

 

I’m very much hoping someone can unblock me on this. We recently ran into a very odd issue, and I sent an earlier email to the list about it:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-March/033579.html

 

After unsuccessfully trying to repair it, we decided to abandon the filesystem.

 

I marked the MDS cluster down, failed the MDSs, and removed the filesystem along with its metadata and data pools.

 

Then I created a new filesystem from scratch.
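For reference, the teardown/recreate sequence was roughly the following. This is a sketch, not a verbatim transcript: the pool names (cephfs_metadata, cephfs_data) and PG counts are placeholders, and the exact command names may differ on other releases than Hammer (0.94.x):

```shell
# Take the MDS cluster down and fail the active MDS (Hammer-era CLI)
ceph mds cluster_down
ceph mds fail 0

# Remove the filesystem and its backing pools
# (pool names below are placeholders for the real ones)
ceph fs rm cephfs --yes-i-really-mean-it
ceph osd pool delete cephfs_metadata cephfs_metadata --yes-i-really-really-mean-it
ceph osd pool delete cephfs_data cephfs_data --yes-i-really-really-mean-it

# Recreate the pools and a fresh filesystem (PG count is illustrative)
ceph osd pool create cephfs_metadata 64
ceph osd pool create cephfs_data 64
ceph fs new cephfs cephfs_metadata cephfs_data
```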

 

However, I am still observing the MDS segfaulting when a client tries to connect. This is quite urgent for me, as we don’t have a functioning filesystem. If someone can advise how I can remove any and all state, please do so; I just want to start fresh. I am very puzzled that a brand-new FS doesn’t work.

 

Here is the MDS log at debug level 20. One odd thing I notice is that the client id starts showing as ? well before the segfault. In any case, I’m just asking what needs to be done to remove all state from the MDS nodes:

 

2019-03-08 19:30:12.024535 7f25ec184700 20 mds.0.server get_session have 0x5477e00 client.2160819875 <client_ip>:0/945029522 state open

2019-03-08 19:30:12.024537 7f25ec184700 15 mds.0.server  oldest_client_tid=1

2019-03-08 19:30:12.024564 7f25ec184700  7 mds.0.cache request_start request(client.?:1 cr=0x54a8680)

2019-03-08 19:30:12.024566 7f25ec184700  7 mds.0.server dispatch_client_request client_request(client.?:1 getattr pAsLsXsFs #1 2019-03-08 19:29:15.425510 RETRY=2) v2

2019-03-08 19:30:12.024576 7f25ec184700 10 mds.0.server rdlock_path_pin_ref request(client.?:1 cr=0x54a8680) #1

2019-03-08 19:30:12.024577 7f25ec184700  7 mds.0.cache traverse: opening base ino 1 snap head

2019-03-08 19:30:12.024579 7f25ec184700 10 mds.0.cache path_traverse finish on snapid head

2019-03-08 19:30:12.024580 7f25ec184700 10 mds.0.server ref is [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | dirfrag=1 0x53ca968]

2019-03-08 19:30:12.024589 7f25ec184700 10 mds.0.locker acquire_locks request(client.?:1 cr=0x54a8680)

2019-03-08 19:30:12.024591 7f25ec184700 20 mds.0.locker  must rdlock (iauth sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]

2019-03-08 19:30:12.024594 7f25ec184700 20 mds.0.locker  must rdlock (ilink sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]

2019-03-08 19:30:12.024597 7f25ec184700 20 mds.0.locker  must rdlock (ifile sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]

2019-03-08 19:30:12.024600 7f25ec184700 20 mds.0.locker  must rdlock (ixattr sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]

2019-03-08 19:30:12.024602 7f25ec184700 20 mds.0.locker  must rdlock (isnap sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]

2019-03-08 19:30:12.024605 7f25ec184700 10 mds.0.locker  must authpin [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]

2019-03-08 19:30:12.024607 7f25ec184700 10 mds.0.locker  auth_pinning [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]

2019-03-08 19:30:12.024610 7f25ec184700 10 mds.0.cache.ino(1) auth_pin by 0x51e5e00 on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 authpin=1 0x53ca968] now 1+0

2019-03-08 19:30:12.024614 7f25ec184700  7 mds.0.locker rdlock_start  on (isnap sync) on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 authpin=1 0x53ca968]

2019-03-08 19:30:12.024618 7f25ec184700 10 mds.0.locker  got rdlock on (isnap sync r=1) [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (isnap sync r=1) (iversion lock) | request=1 lock=1 dirfrag=1 authpin=1 0x53ca968]

2019-03-08 19:30:12.024621 7f25ec184700  7 mds.0.locker rdlock_start  on (ifile sync) on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (isnap sync r=1) (iversion lock) | request=1 lock=1 dirfrag=1 authpin=1 0x53ca968]

2019-03-08 19:30:12.024625 7f25ec184700 10 mds.0.locker  got rdlock on (ifile sync r=1) [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (isnap sync r=1) (ifile sync r=1) (iversion lock) | request=1 lock=2 dirfrag=1 authpin=1 0x53ca968]

2019-03-08 19:30:12.024628 7f25ec184700  7 mds.0.locker rdlock_start  on (iauth sync) on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (isnap sync r=1) (ifile sync r=1) (iversion lock) | request=1 lock=2 dirfrag=1 authpin=1 0x53ca968]

2019-03-08 19:30:12.024631 7f25ec184700 10 mds.0.locker  got rdlock on (iauth sync r=1) [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (iauth sync r=1) (isnap sync r=1) (ifile sync r=1) (iversion lock) | request=1 lock=3 dirfrag=1 authpin=1 0x53ca968]

2019-03-08 19:30:12.024635 7f25ec184700  7 mds.0.locker rdlock_start  on (ilink sync) on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (iauth sync r=1) (isnap sync r=1) (ifile sync r=1) (iversion lock) | request=1 lock=3 dirfrag=1 authpin=1 0x53ca968]

2019-03-08 19:30:12.024638 7f25ec184700 10 mds.0.locker  got rdlock on (ilink sync r=1) [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (iauth sync r=1) (ilink sync r=1) (isnap sync r=1) (ifile sync r=1) (iversion lock) | request=1 lock=4 dirfrag=1 authpin=1 0x53ca968]

2019-03-08 19:30:12.024642 7f25ec184700  7 mds.0.locker rdlock_start  on (ixattr sync) on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (iauth sync r=1) (ilink sync r=1) (isnap sync r=1) (ifile sync r=1) (iversion lock) | request=1 lock=4 dirfrag=1 authpin=1 0x53ca968]

2019-03-08 19:30:12.024646 7f25ec184700 10 mds.0.locker  got rdlock on (ixattr sync r=1) [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (iauth sync r=1) (ilink sync r=1) (isnap sync r=1) (ifile sync r=1) (ixattr sync r=1) (iversion lock) | request=1 lock=5 dirfrag=1 authpin=1 0x53ca968]

2019-03-08 19:30:12.024658 7f25ec184700 10 mds.0.server reply to stat on client_request(client.?:1 getattr pAsLsXsFs #1 2019-03-08 19:29:15.425510 RETRY=2) v2

2019-03-08 19:30:12.024661 7f25ec184700 10 mds.0.server reply_client_request 0 ((0) Success) client_request(client.?:1 getattr pAsLsXsFs #1 2019-03-08 19:29:15.425510 RETRY=2) v2

2019-03-08 19:30:12.024673 7f25ec184700 10 mds.0.server apply_allocated_inos 0 / [] / 0

2019-03-08 19:30:12.024674 7f25ec184700 20 mds.0.server lat 0.060895

2019-03-08 19:30:12.024677 7f25ec184700 20 mds.0.server set_trace_dist snapid head

2019-03-08 19:30:12.024679 7f25ec184700 10 mds.0.server set_trace_dist snaprealm snaprealm(1 seq 1 lc 0 cr 0 cps 1 snaps={} 0x53b8480) len=48

2019-03-08 19:30:12.024683 7f25ec184700 20 mds.0.cache.ino(1)  pfile 0 pauth 0 plink 0 pxattr 0 plocal 0 ctime 2019-03-07 21:12:21.476328 valid=1

2019-03-08 19:30:12.024688 7f25ec184700 10 mds.0.cache.ino(1) add_client_cap first cap, joining realm snaprealm(1 seq 1 lc 0 cr 0 cps 1 snaps={} 0x53b8480)

2019-03-08 19:30:12.026741 7f25ec184700 -1 *** Caught signal (Segmentation fault) **

 in thread 7f25ec184700

 

 ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)

 1: ceph_mds() [0x89982a]

 2: (()+0x10350) [0x7f25f4647350]

 3: (CInode::get_caps_allowed_for_client(client_t) const+0x130) [0x7a19f0]

 4: (CInode::encode_inodestat(ceph::buffer::list&, Session*, SnapRealm*, snapid_t, unsigned int, int)+0x132d) [0x7b383d]

 5: (Server::set_trace_dist(Session*, MClientReply*, CInode*, CDentry*, snapid_t, int, std::tr1::shared_ptr<MDRequestImpl>&)+0x471) [0x5f26e1]

 6: (Server::reply_client_request(std::tr1::shared_ptr<MDRequestImpl>&, MClientReply*)+0x846) [0x611056]

 7: (Server::respond_to_request(std::tr1::shared_ptr<MDRequestImpl>&, int)+0x4d9) [0x611759]

 8: (Server::handle_client_getattr(std::tr1::shared_ptr<MDRequestImpl>&, bool)+0x47b) [0x613eab]

 9: (Server::dispatch_client_request(std::tr1::shared_ptr<MDRequestImpl>&)+0xa38) [0x633da8]

 10: (Server::handle_client_request(MClientRequest*)+0x3df) [0x63435f]

 11: (Server::dispatch(Message*)+0x3f3) [0x63b8b3]

 12: (MDS::handle_deferrable_message(Message*)+0x847) [0x5b6c27]

 13: (MDS::_dispatch(Message*)+0x6d) [0x5d2bed]

 14: (C_MDS_RetryMessage::finish(int)+0x1b) [0x63d24b]

 15: (MDSInternalContextBase::complete(int)+0x163) [0x7e3363]

 16: (MDS::_advance_queues()+0x48d) [0x5c9e4d]

 17: (MDS::ProgressThread::entry()+0x4a) [0x5ca1aa]

 18: (()+0x8192) [0x7f25f463f192]

 19: (clone()+0x6d) [0x7f25f3b4c26d]

 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
