Re: MDS segfaults on client connection -- brand new FS

I don’t have any idea what’s going on here or why it’s not working, but you are using v0.94.7. That release is:
1) behind the latest Hammer point releases (the series reached at least 0.94.10)
2) older than the release in which we declared CephFS stable (Jewel, v10.2.0)
3) well past its end of support.

You will have a much better time deploying Luminous or Mimic, especially since you want to use CephFS. :)
-Greg
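
For anyone checking their own cluster against the version numbers above, here is a minimal Python sketch of the comparison. The helper name `parse_version` is made up for illustration. Versions are compared as integer tuples because a plain string comparison would wrongly rank "0.94.7" after "0.94.10".

```python
def parse_version(text):
    """Turn 'v0.94.7' or '10.2.0' into a tuple of ints such as (0, 94, 7)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


running = parse_version("v0.94.7")              # version reported in this thread
later_hammer = parse_version("0.94.10")         # Hammer point release named above
first_stable_cephfs = parse_version("v10.2.0")  # Jewel

# Tuples compare element by element, so 7 < 10 is evaluated numerically.
behind_hammer = running < later_hammer
predates_stable_cephfs = running < first_stable_cephfs

print("behind later Hammer releases:", behind_hammer)
print("older than first stable CephFS release:", predates_stable_cephfs)
```

Both checks come out true for 0.94.7, which matches points 1 and 2.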

On Fri, Mar 8, 2019 at 5:02 PM Kadiyska, Yana <ykadiysk@xxxxxxxxxx> wrote:

Hi,

 

I’m very much hoping someone can unblock me on this. We recently ran into a very odd issue, which I described in an earlier email to the list:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-March/033579.html

 

After unsuccessfully trying to repair it, we decided to abandon the filesystem.

 

I marked the cluster down, failed the MDSs, and removed the FS along with its metadata and data pools.

 

I then created a new filesystem from scratch.
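
The message does not list the exact commands that were run, so the following is only a sketch of what that sequence might look like with the Hammer-era `ceph` CLI. The mapping of each step to a command is an assumption, and the filesystem name, pool names, MDS ranks and PG count are hypothetical placeholders. The function only builds the argument lists and runs nothing.

```python
def teardown_and_recreate(fs_name, metadata_pool, data_pool, mds_ranks, pg_num):
    """Return the assumed ceph CLI invocations, in order, as argument lists.

    Nothing is executed here; the caller decides whether to run them.
    """
    # Step 1: mark the MDS cluster down, then fail each active MDS rank.
    cmds = [["ceph", "mds", "cluster_down"]]
    cmds += [["ceph", "mds", "fail", str(rank)] for rank in mds_ranks]

    # Step 2: remove the filesystem, then delete both of its pools.
    cmds.append(["ceph", "fs", "rm", fs_name, "--yes-i-really-mean-it"])
    for pool in (metadata_pool, data_pool):
        cmds.append(["ceph", "osd", "pool", "delete", pool, pool,
                     "--yes-i-really-really-mean-it"])

    # Step 3: recreate the pools and build a new filesystem on top of them.
    for pool in (metadata_pool, data_pool):
        cmds.append(["ceph", "osd", "pool", "create", pool, str(pg_num)])
    cmds.append(["ceph", "fs", "new", fs_name, metadata_pool, data_pool])
    return cmds


# Hypothetical names, for illustration only.
commands = teardown_and_recreate("cephfs", "cephfs_metadata", "cephfs_data", [0], 64)
for argv in commands:
    print(" ".join(argv))
```

Printing the list first makes it easy to review the destructive steps before any of them is run by hand.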

 

However, I am still seeing the MDS segfault when a client tries to connect. This is quite urgent for me, as we don’t have a functioning filesystem. If someone can advise how to remove any and all state, please do so; I just want to start fresh. I am very puzzled that a brand new FS doesn’t work.

 

Here is the MDS log at debug level 20. One odd thing I notice is that the client id starts showing as ? well before the segfault. In any case, I’m just asking what needs to be done to remove all state from the MDS nodes:

 

2019-03-08 19:30:12.024535 7f25ec184700 20 mds.0.server get_session have 0x5477e00 client.2160819875 <client_ip>:0/945029522 state open

2019-03-08 19:30:12.024537 7f25ec184700 15 mds.0.server  oldest_client_tid=1

2019-03-08 19:30:12.024564 7f25ec184700  7 mds.0.cache request_start request(client.?:1 cr=0x54a8680)

2019-03-08 19:30:12.024566 7f25ec184700  7 mds.0.server dispatch_client_request client_request(client.?:1 getattr pAsLsXsFs #1 2019-03-08 19:29:15.425510 RETRY=2) v2

2019-03-08 19:30:12.024576 7f25ec184700 10 mds.0.server rdlock_path_pin_ref request(client.?:1 cr=0x54a8680) #1

2019-03-08 19:30:12.024577 7f25ec184700  7 mds.0.cache traverse: opening base ino 1 snap head

2019-03-08 19:30:12.024579 7f25ec184700 10 mds.0.cache path_traverse finish on snapid head

2019-03-08 19:30:12.024580 7f25ec184700 10 mds.0.server ref is [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | dirfrag=1 0x53ca968]

2019-03-08 19:30:12.024589 7f25ec184700 10 mds.0.locker acquire_locks request(client.?:1 cr=0x54a8680)

2019-03-08 19:30:12.024591 7f25ec184700 20 mds.0.locker  must rdlock (iauth sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]

2019-03-08 19:30:12.024594 7f25ec184700 20 mds.0.locker  must rdlock (ilink sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]

2019-03-08 19:30:12.024597 7f25ec184700 20 mds.0.locker  must rdlock (ifile sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]

2019-03-08 19:30:12.024600 7f25ec184700 20 mds.0.locker  must rdlock (ixattr sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]

2019-03-08 19:30:12.024602 7f25ec184700 20 mds.0.locker  must rdlock (isnap sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]

2019-03-08 19:30:12.024605 7f25ec184700 10 mds.0.locker  must authpin [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]

2019-03-08 19:30:12.024607 7f25ec184700 10 mds.0.locker  auth_pinning [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]

2019-03-08 19:30:12.024610 7f25ec184700 10 mds.0.cache.ino(1) auth_pin by 0x51e5e00 on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 authpin=1 0x53ca968] now 1+0

2019-03-08 19:30:12.024614 7f25ec184700  7 mds.0.locker rdlock_start  on (isnap sync) on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 authpin=1 0x53ca968]

2019-03-08 19:30:12.024618 7f25ec184700 10 mds.0.locker  got rdlock on (isnap sync r=1) [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (isnap sync r=1) (iversion lock) | request=1 lock=1 dirfrag=1 authpin=1 0x53ca968]

2019-03-08 19:30:12.024621 7f25ec184700  7 mds.0.locker rdlock_start  on (ifile sync) on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (isnap sync r=1) (iversion lock) | request=1 lock=1 dirfrag=1 authpin=1 0x53ca968]

2019-03-08 19:30:12.024625 7f25ec184700 10 mds.0.locker  got rdlock on (ifile sync r=1) [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (isnap sync r=1) (ifile sync r=1) (iversion lock) | request=1 lock=2 dirfrag=1 authpin=1 0x53ca968]

2019-03-08 19:30:12.024628 7f25ec184700  7 mds.0.locker rdlock_start  on (iauth sync) on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (isnap sync r=1) (ifile sync r=1) (iversion lock) | request=1 lock=2 dirfrag=1 authpin=1 0x53ca968]

2019-03-08 19:30:12.024631 7f25ec184700 10 mds.0.locker  got rdlock on (iauth sync r=1) [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (iauth sync r=1) (isnap sync r=1) (ifile sync r=1) (iversion lock) | request=1 lock=3 dirfrag=1 authpin=1 0x53ca968]

2019-03-08 19:30:12.024635 7f25ec184700  7 mds.0.locker rdlock_start  on (ilink sync) on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (iauth sync r=1) (isnap sync r=1) (ifile sync r=1) (iversion lock) | request=1 lock=3 dirfrag=1 authpin=1 0x53ca968]

2019-03-08 19:30:12.024638 7f25ec184700 10 mds.0.locker  got rdlock on (ilink sync r=1) [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (iauth sync r=1) (ilink sync r=1) (isnap sync r=1) (ifile sync r=1) (iversion lock) | request=1 lock=4 dirfrag=1 authpin=1 0x53ca968]

2019-03-08 19:30:12.024642 7f25ec184700  7 mds.0.locker rdlock_start  on (ixattr sync) on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (iauth sync r=1) (ilink sync r=1) (isnap sync r=1) (ifile sync r=1) (iversion lock) | request=1 lock=4 dirfrag=1 authpin=1 0x53ca968]

2019-03-08 19:30:12.024646 7f25ec184700 10 mds.0.locker  got rdlock on (ixattr sync r=1) [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 1=0+1) (iauth sync r=1) (ilink sync r=1) (isnap sync r=1) (ifile sync r=1) (ixattr sync r=1) (iversion lock) | request=1 lock=5 dirfrag=1 authpin=1 0x53ca968]

2019-03-08 19:30:12.024658 7f25ec184700 10 mds.0.server reply to stat on client_request(client.?:1 getattr pAsLsXsFs #1 2019-03-08 19:29:15.425510 RETRY=2) v2

2019-03-08 19:30:12.024661 7f25ec184700 10 mds.0.server reply_client_request 0 ((0) Success) client_request(client.?:1 getattr pAsLsXsFs #1 2019-03-08 19:29:15.425510 RETRY=2) v2

2019-03-08 19:30:12.024673 7f25ec184700 10 mds.0.server apply_allocated_inos 0 / [] / 0

2019-03-08 19:30:12.024674 7f25ec184700 20 mds.0.server lat 0.060895

2019-03-08 19:30:12.024677 7f25ec184700 20 mds.0.server set_trace_dist snapid head

2019-03-08 19:30:12.024679 7f25ec184700 10 mds.0.server set_trace_dist snaprealm snaprealm(1 seq 1 lc 0 cr 0 cps 1 snaps={} 0x53b8480) len=48

2019-03-08 19:30:12.024683 7f25ec184700 20 mds.0.cache.ino(1)  pfile 0 pauth 0 plink 0 pxattr 0 plocal 0 ctime 2019-03-07 21:12:21.476328 valid=1

2019-03-08 19:30:12.024688 7f25ec184700 10 mds.0.cache.ino(1) add_client_cap first cap, joining realm snaprealm(1 seq 1 lc 0 cr 0 cps 1 snaps={} 0x53b8480)

2019-03-08 19:30:12.026741 7f25ec184700 -1 *** Caught signal (Segmentation fault) **

 in thread 7f25ec184700

 

 ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)

 1: ceph_mds() [0x89982a]

 2: (()+0x10350) [0x7f25f4647350]

 3: (CInode::get_caps_allowed_for_client(client_t) const+0x130) [0x7a19f0]

 4: (CInode::encode_inodestat(ceph::buffer::list&, Session*, SnapRealm*, snapid_t, unsigned int, int)+0x132d) [0x7b383d]

 5: (Server::set_trace_dist(Session*, MClientReply*, CInode*, CDentry*, snapid_t, int, std::tr1::shared_ptr<MDRequestImpl>&)+0x471) [0x5f26e1]

 6: (Server::reply_client_request(std::tr1::shared_ptr<MDRequestImpl>&, MClientReply*)+0x846) [0x611056]

 7: (Server::respond_to_request(std::tr1::shared_ptr<MDRequestImpl>&, int)+0x4d9) [0x611759]

 8: (Server::handle_client_getattr(std::tr1::shared_ptr<MDRequestImpl>&, bool)+0x47b) [0x613eab]

 9: (Server::dispatch_client_request(std::tr1::shared_ptr<MDRequestImpl>&)+0xa38) [0x633da8]

 10: (Server::handle_client_request(MClientRequest*)+0x3df) [0x63435f]

 11: (Server::dispatch(Message*)+0x3f3) [0x63b8b3]

 12: (MDS::handle_deferrable_message(Message*)+0x847) [0x5b6c27]

 13: (MDS::_dispatch(Message*)+0x6d) [0x5d2bed]

 14: (C_MDS_RetryMessage::finish(int)+0x1b) [0x63d24b]

 15: (MDSInternalContextBase::complete(int)+0x163) [0x7e3363]

 16: (MDS::_advance_queues()+0x48d) [0x5c9e4d]

 17: (MDS::ProgressThread::entry()+0x4a) [0x5ca1aa]

 18: (()+0x8192) [0x7f25f463f192]

 19: (clone()+0x6d) [0x7f25f3b4c26d]

 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
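
To pin down where in a log like the one above the client id first shows up as `?` and where the crash lands, a small Python sketch such as the following may help. The regular expressions are inferred from the lines in this excerpt only and are an assumption, not a documented Ceph log format. The sample lines are copied from the log above.

```python
import re

# Timestamp, thread id, debug level, then the message (layout inferred from this log).
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<thread>[0-9a-f]+)\s+(?P<level>-?\d+) (?P<msg>.*)$"
)
# Backtrace frame such as " 3: (Symbol+0x130) [0x7a19f0]".
FRAME = re.compile(r"^\s*(?P<n>\d+): \((?P<symbol>.*)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]$")


def summarize(lines):
    """Return (first 'client.?' timestamp, crash timestamp, first C++ frame)."""
    first_unknown = crash = top_frame = None
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            if first_unknown is None and "client.?" in m["msg"]:
                first_unknown = m["ts"]
            if crash is None and "Caught signal" in m["msg"]:
                crash = m["ts"]
            continue
        f = FRAME.match(line)
        # Skip unnamed frames like "()" and keep the first namespaced symbol.
        if f and top_frame is None and "::" in f["symbol"]:
            top_frame = f["symbol"]
    return first_unknown, crash, top_frame


sample = [
    "2019-03-08 19:30:12.024535 7f25ec184700 20 mds.0.server get_session have 0x5477e00 client.2160819875 <client_ip>:0/945029522 state open",
    "2019-03-08 19:30:12.024564 7f25ec184700  7 mds.0.cache request_start request(client.?:1 cr=0x54a8680)",
    "2019-03-08 19:30:12.026741 7f25ec184700 -1 *** Caught signal (Segmentation fault) **",
    " 2: (()+0x10350) [0x7f25f4647350]",
    " 3: (CInode::get_caps_allowed_for_client(client_t) const+0x130) [0x7a19f0]",
]
first_unknown, crash, top_frame = summarize(sample)
print(first_unknown, crash, top_frame, sep="\n")
```

Running the same function over the full log file (one line per list element) would give the same three data points for the complete run rather than just this excerpt.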

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com