Re: CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3


Hey Gregory,

On di, 2013-09-10 at 14:48 -0700, Gregory Farnum wrote:
> On Tue, Sep 10, 2013 at 2:36 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
> > Hey Gregory,
> >
> > My cluster consists of 3 nodes, each running 1 mon, 1 osd and 1 mds.  I
> > upgraded from 0.67, but was still running 0.61.7 OSDs at the time of the
> > upgrade, because of performance-issues that have just recently been
> > fixed.  These have now been upgraded to 0.67.3, along with the rest of
> > Ceph.  My OSDs are using XFS as the underlying FS.  I have been
> > switching one OSD in my cluster back and forth between 0.61.7 and some
> > test-versions, which were based on 0.67.x, to debug the aforementioned
> > performance-issues with Samuel, but that was before I newfs'ed and
> > started using this instance of CephFS.  Furthermore, I don't seem to
> > have lost any other data during these tests.
> 
> Sam, any idea how we could have lost an object? I checked into how we
> touch this one, and all we ever do is read_full and write_full.
> 
> >
> > BTW: CephFS has never been very stable for me during stress-tests.  If
> > some components are brought down and back up again during operations,
> > like stopping and restarting all components on one node while generating
> > some load with a cp of a big CephFS directory-tree on another, then,
> > once things settle again, doing the same on another node, it always
> > quickly ends up like what I see now.
> 
> Do you have multiple active MDSes? Or do you just mean when you do a
> reboot while generating load it migrates?

I use 1 active/2 standby.  If I stop the active MDS, access to CephFS
hangs for a bit, then it fails over to a standby MDS and access resumes.
At that point I bring the stopped node back up, wait for things to
settle, and then stop services on another node.  I haven't used
configurations with multiple active MDSs much, because they were
considered less well tested.
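
For what it's worth, this is how I watch the failover from another node
while testing.  Nothing exotic, just the standard status commands:

```shell
# Show which MDS rank is active and how many standbys are available.
ceph mds stat

# Fuller dump of the MDS map, including the standby daemons.
ceph mds dump

# Watch the standby take over after stopping the active MDS.
watch -n 2 ceph mds stat
```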

> 
> > MDSs crashing on start or on
> > attempts to mount the CephFS, and the only way out being to stop the
> > MDSs, wipe the contents of the "data" and "metadata" pools and do the
> > newfs-thing.  I can only assume you guys are putting it through similar
> > stress-tests, but if not, try it.
> 
> Yeah, we have a bunch of these. I'm not sure that we coordinate
> killing an entire node at a time, but I can't think of any way that
> would matter. :/

Now that I know what to look for, the next time CephFS fails in this
manner I'll take a closer look at the objects themselves and send a
detailed report to the list.
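
Roughly, I plan to check for the table objects like this.  The object
names are the ones from the log above; "metadata" is the default pool
name for the MDS metadata, so adjust if your layout differs:

```shell
# Check whether the MDS table objects still exist in the metadata pool.
# A missing mds_anchortable is what triggered the assert in my case.
rados -p metadata stat mds_anchortable
rados -p metadata stat mds_snaptable

# Broader look at what the MDS has stored in the metadata pool.
rados -p metadata ls | head -n 20
```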

> 
> > PS: Is there a way to get back at the data after something like this?
> > Do you still want me to keep the current situation to debug it further,
> > or can I zap everything, restore my backups and move on?  Thanks!
> 
> You could figure out how to build a fake anchortable (just generate an
> empty one with ceph-dencoder) and that would let you do most stuff,
> although if you have any hard links then those would be lost and I'm
> not sure exactly what that would mean at this point — it's possible
> with the new lookup-by-ino stuff that it wouldn't matter at all, or it
> might make them inaccessible from one link and un-deletable when
> removed from the other. (via the FS, that is.) If restoring from
> backups is feasible I'd probably just shoot for that after doing a
> scrub. (If the scrub turns up something dirty then probably it can be
> recovered via a RADOS repair.)

Thanks for your explanation!  I'll zap the whole thing and restore from
backup.
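
For the archives, here is my reading of the fake-anchortable route, in
case someone without backups lands here.  This is entirely untested, and
the ceph-dencoder type name is an assumption on my part, so verify it
against your version before trying anything like this:

```shell
# UNTESTED sketch.  The dencoder type name ("AnchorTable") and the use of
# a test-generated instance as a stand-in for an empty table are both
# assumptions; check ceph-dencoder's usage output for your version first.
ceph-dencoder type AnchorTable select_test 1 encode export /tmp/anchortable.bin

# Inject it as the missing object so the MDS can load it on replay.
rados -p metadata put mds_anchortable /tmp/anchortable.bin

# The scrub/repair route Gregory mentioned, before restoring from backup:
ceph osd scrub 0         # repeat for each OSD, or scrub a specific PG
ceph pg repair <pgid>    # only if scrub flags an inconsistent PG
```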


   Regards,

      Oliver

> > On di, 2013-09-10 at 13:59 -0700, Gregory Farnum wrote:
> >> It's not an upgrade issue. There's an MDS object that is somehow
> >> missing. If it exists, then on restart you'll be fine.
> >>
> >> Oliver, what is your general cluster config? What filesystem are your
> >> OSDs running on? What version of Ceph were you upgrading from? There's
> >> really no way for this file to not exist once created unless the
> >> underlying FS ate it or the last write both was interrupted and hit
> >> some kind of bug in our transaction code (of which none are known)
> >> during replay.
> >> -Greg
> >> Software Engineer #42 @ http://inktank.com | http://ceph.com
> >>
> >>
> >> On Tue, Sep 10, 2013 at 1:44 PM, Liu, Larry <Larry.Liu@xxxxxxxxxx> wrote:
> >> > This is scary. Should I hold on upgrade?
> >> >
> >> > On 9/10/13 11:33 AM, "Oliver Daudey" <oliver@xxxxxxxxx> wrote:
> >> >
> >> >>Hey Gregory,
> >> >>
> >> >>On 10-09-13 20:21, Gregory Farnum wrote:
> >> >>> On Tue, Sep 10, 2013 at 10:54 AM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
> >> >>>> Hey list,
> >> >>>>
> >> >>>> I just upgraded to Ceph 0.67.3.  What I did on every node of my 3-node
> >> >>>> cluster was:
> >> >>>> - Unmount CephFS everywhere.
> >> >>>> - Upgrade the Ceph-packages.
> >> >>>> - Restart MON.
> >> >>>> - Restart OSD.
> >> >>>> - Restart MDS.
> >> >>>>
> >> >>>> As soon as I got to the second node, the MDS crashed right after startup.
> >> >>>>
> >> >>>> Part of the logs (more on request):
> >> >>>>
> >> >>>> -> 194.109.43.12:6802/53419 -- osd_op(mds.0.58:4 mds_snaptable [read 0~0] 1.d90270ad e37647) v4 -- ?+0 0x1e48d80 con 0x1e5d9a0
> >> >>>>    -11> 2013-09-10 19:35:02.798962 7fd1ba81f700  2 mds.0.58 boot_start 1: opening mds log
> >> >>>>    -10> 2013-09-10 19:35:02.798968 7fd1ba81f700  5 mds.0.log open discovering log bounds
> >> >>>>     -9> 2013-09-10 19:35:02.798988 7fd1ba81f700  1 mds.0.journaler(ro) recover start
> >> >>>>     -8> 2013-09-10 19:35:02.798990 7fd1ba81f700  1 mds.0.journaler(ro) read_head
> >> >>>>     -7> 2013-09-10 19:35:02.799028 7fd1ba81f700  1 -- 194.109.43.12:6800/67277 --> 194.109.43.11:6800/16562 -- osd_op(mds.0.58:5 200.00000000 [read 0~0] 1.844f3494 e37647) v4 -- ?+0 0x1e48b40 con 0x1e5db00
> >> >>>>     -6> 2013-09-10 19:35:02.799053 7fd1ba81f700  1 -- 194.109.43.12:6800/67277 <== mon.2 194.109.43.13:6789/0 16 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (4235168662 0 0) 0x1e93380 con 0x1e5d580
> >> >>>>     -5> 2013-09-10 19:35:02.799099 7fd1ba81f700 10 monclient: handle_subscribe_ack sent 2013-09-10 19:35:02.796448 renew after 2013-09-10 19:37:32.796448
> >> >>>>     -4> 2013-09-10 19:35:02.800907 7fd1ba81f700  5 mds.0.58 ms_handle_connect on 194.109.43.12:6802/53419
> >> >>>>     -3> 2013-09-10 19:35:02.800927 7fd1ba81f700  5 mds.0.58 ms_handle_connect on 194.109.43.13:6802/45791
> >> >>>>     -2> 2013-09-10 19:35:02.801176 7fd1ba81f700  5 mds.0.58 ms_handle_connect on 194.109.43.11:6800/16562
> >> >>>>     -1> 2013-09-10 19:35:02.803546 7fd1ba81f700  1 -- 194.109.43.12:6800/67277 <== osd.2 194.109.43.13:6802/45791 1 ==== osd_op_reply(3 mds_anchortable [read 0~0] ack = -2 (No such file or directory)) v4 ==== 114+0+0 (3107677671 0 0) 0x1e4de00 con 0x1e5ddc0
> >> >>>>      0> 2013-09-10 19:35:02.805611 7fd1ba81f700 -1 mds/MDSTable.cc: In function 'void MDSTable::load_2(int, ceph::bufferlist&, Context*)' thread 7fd1ba81f700 time 2013-09-10 19:35:02.803673
> >> >>>> mds/MDSTable.cc: 152: FAILED assert(r >= 0)
> >> >>>>
> >> >>>>  ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a)
> >> >>>>  1: (MDSTable::load_2(int, ceph::buffer::list&, Context*)+0x44f) [0x77ce7f]
> >> >>>>  2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe3b) [0x7d891b]
> >> >>>>  3: (MDS::handle_core_message(Message*)+0x987) [0x56f527]
> >> >>>>  4: (MDS::_dispatch(Message*)+0x2f) [0x56f5ef]
> >> >>>>  5: (MDS::ms_dispatch(Message*)+0x19b) [0x5710bb]
> >> >>>>  6: (DispatchQueue::entry()+0x592) [0x92e432]
> >> >>>>  7: (DispatchQueue::DispatchThread::entry()+0xd) [0x8a59bd]
> >> >>>>  8: (()+0x68ca) [0x7fd1bed298ca]
> >> >>>>  9: (clone()+0x6d) [0x7fd1bda5cb6d]
> >> >>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> >> >>>>
> >> >>>> When trying to mount CephFS, it just hangs now.  Sometimes, an MDS
> >> >>>> stays up for a while, but will eventually crash again.  This CephFS
> >> >>>> was created on 0.67 and I haven't done anything but mount and use it
> >> >>>> under very light load in the meantime.
> >> >>>>
> >> >>>> Any ideas, or if you need more info, let me know.  It would be nice to
> >> >>>> get my data back, but I have backups too.
> >> >>>
> >> >>> Does the filesystem have any data in it? Every time we've seen this
> >> >>> error it's been on an empty cluster which had some weird issue with
> >> >>> startup.
> >> >>
> >> >>This one certainly had some data on it, yes.  A couple of hundred GBs
> >> >>of disk-images and a couple of trees of smaller files, most of them
> >> >>accessed very rarely since being copied on.
> >> >>
> >> >>
> >> >>   Regards,
> >> >>
> >> >>      Oliver
> >> >>_______________________________________________
> >> >>ceph-users mailing list
> >> >>ceph-users@xxxxxxxxxxxxxx
> >> >>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >
> >>
> >
> >
> 

