Re: CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

Gregory Farnum <greg@xxxxxxxxxxx> · Tue, 10 Sep 2013 15:01:47 -0700



If the problem is somewhere in RADOS/xfs/whatever, then there's a good
chance that the "mds_anchortable" object exists in its replica OSDs,
but when listing objects those aren't queried, so they won't show up
in a listing. You can use the osdmaptool to map from an object name to
the PG it would show up in, or if you look at your log you should see
a line something like
1 -- <LOCAL IP> --> <OTHER IP> -- osd_op(mds.0.31:3 mds_anchortable
[read 0~0] 1.a977f6a7 e165) v4 -- ?+0 0x1e88d80 con 0x1f189a0
In this example, metadata is pool 1 and a977f6a7 is the hash of the
mds_anchortable object. The PG is taken from the low bits of that hash,
so depending on how many PGs are in the pool it will be in pg 1.a7
(256 PGs), or 1.6a7 (4096 PGs), or 1.f6a7 (65536 PGs)...
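
To make the hash-to-PG step concrete, here is a rough Python sketch. It is
not part of Ceph's tooling. It follows the "stable mod" rule Ceph uses to
fold an object hash into a PG seed, as I understand it. The helper names
(parse_osd_op, pg_for_hash) and the regular expression are made up for
illustration and assume the osd_op log format shown above. The pg_num value
has to come from your own pool settings.

```python
import re

# Hypothetical pattern for the osd_op log line shown above:
#   osd_op(<client>:<tid> <object> [<ops>] <pool>.<hash> e<epoch>)
OSD_OP_RE = re.compile(
    r"osd_op\(\S+\s+(?P<name>\S+)\s+\[[^\]]*\]\s+"
    r"(?P<pool>\d+)\.(?P<hash>[0-9a-f]+)\s+e\d+\)"
)


def parse_osd_op(line):
    """Return (object_name, pool_id, object_hash) or None if no match."""
    m = OSD_OP_RE.search(line)
    if m is None:
        return None
    return m.group("name"), int(m.group("pool")), int(m.group("hash"), 16)


def stable_mod(x, b, bmask):
    """Fold x into [0, b) so that most values stay put when b grows."""
    if (x & bmask) < b:
        return x & bmask
    return x & (bmask >> 1)


def pg_for_hash(pool_id, object_hash, pg_num):
    """Name the PG (as 'pool.seed' in hex) that a hash maps to."""
    # Mask covering the smallest power of two that is >= pg_num.
    bmask = (1 << (pg_num - 1).bit_length()) - 1
    seed = stable_mod(object_hash, pg_num, bmask)
    return f"{pool_id}.{seed:x}"


if __name__ == "__main__":
    line = ("1 -- <LOCAL IP> --> <OTHER IP> -- osd_op(mds.0.31:3 "
            "mds_anchortable [read 0~0] 1.a977f6a7 e165) v4")
    name, pool, obj_hash = parse_osd_op(line)
    for pg_num in (256, 4096, 65536):  # assumed pool sizes, for illustration
        print(name, pg_num, pg_for_hash(pool, obj_hash, pg_num))
```

With the PG id in hand, "ceph pg scrub <pgid>" should request a scrub of just
that PG. "ceph osd map <pool> <object>" is the quicker way to get the same
mapping from a live cluster, and is worth using to cross-check the sketch.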
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Tue, Sep 10, 2013 at 2:51 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
> Hey Gregory,
>
> The only objects containing "table" I can find at all, are in the
> "metadata"-pool:
> # rados --pool=metadata ls | grep -i table
> mds0_inotable
>
> Looking at another cluster where I use CephFS, there is indeed an object
> named "mds_anchortable", but the broken cluster is missing it.  I don't
> see how I can scrub the PG for an object that doesn't appear to exist.
> Please elaborate.
>
>
>    Regards,
>
>      Oliver
>
> On Tue, 2013-09-10 at 14:06 -0700, Gregory Farnum wrote:
>> Also, can you scrub the PG which contains the "mds_anchortable" object
>> and see if anything comes up? You should be able to find the key from
>> the logs (in the osd_op line that contains "mds_anchortable") and
>> convert that into the PG. Or you can just scrub all of osd 2.
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>>
>> On Tue, Sep 10, 2013 at 1:59 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> > It's not an upgrade issue. There's an MDS object that is somehow
>> > missing. If it exists, then on restart you'll be fine.
>> >
>> > Oliver, what is your general cluster config? What filesystem are your
>> > OSDs running on? What version of Ceph were you upgrading from? There's
>> > really no way for this file to not exist once created unless the
>> > underlying FS ate it or the last write both was interrupted and hit
>> > some kind of bug in our transaction code (of which none are known)
>> > during replay.
>> > -Greg
>> > Software Engineer #42 @ http://inktank.com | http://ceph.com
>> >
>> >
>> > On Tue, Sep 10, 2013 at 1:44 PM, Liu, Larry <Larry.Liu@xxxxxxxxxx> wrote:
>> >> This is scary. Should I hold off on upgrading?
>> >>
>> >> On 9/10/13 11:33 AM, "Oliver Daudey" <oliver@xxxxxxxxx> wrote:
>> >>
>> >>>Hey Gregory,
>> >>>
>> >>>On 10-09-13 20:21, Gregory Farnum wrote:
>> >>>> On Tue, Sep 10, 2013 at 10:54 AM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>> >>>>> Hey list,
>> >>>>>
>> >>>>> I just upgraded to Ceph 0.67.3.  What I did on every node of my 3-node
>> >>>>> cluster was:
>> >>>>> - Unmount CephFS everywhere.
>> >>>>> - Upgrade the Ceph-packages.
>> >>>>> - Restart MON.
>> >>>>> - Restart OSD.
>> >>>>> - Restart MDS.
>> >>>>>
>> >>>>> As soon as I got to the second node, the MDS crashed right after startup.
>> >>>>>
>> >>>>> Part of the logs (more on request):
>> >>>>>
>> >>>>> -> 194.109.43.12:6802/53419 -- osd_op(mds.0.58:4 mds_snaptable [read 0~0] 1.d90270ad e37647) v4 -- ?+0 0x1e48d80 con 0x1e5d9a0
>> >>>>>    -11> 2013-09-10 19:35:02.798962 7fd1ba81f700  2 mds.0.58 boot_start 1: opening mds log
>> >>>>>    -10> 2013-09-10 19:35:02.798968 7fd1ba81f700  5 mds.0.log open discovering log bounds
>> >>>>>     -9> 2013-09-10 19:35:02.798988 7fd1ba81f700  1 mds.0.journaler(ro) recover start
>> >>>>>     -8> 2013-09-10 19:35:02.798990 7fd1ba81f700  1 mds.0.journaler(ro) read_head
>> >>>>>     -7> 2013-09-10 19:35:02.799028 7fd1ba81f700  1 -- 194.109.43.12:6800/67277 --> 194.109.43.11:6800/16562 -- osd_op(mds.0.58:5 200.00000000 [read 0~0] 1.844f3494 e37647) v4 -- ?+0 0x1e48b40 con 0x1e5db00
>> >>>>>     -6> 2013-09-10 19:35:02.799053 7fd1ba81f700  1 -- 194.109.43.12:6800/67277 <== mon.2 194.109.43.13:6789/0 16 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (4235168662 0 0) 0x1e93380 con 0x1e5d580
>> >>>>>     -5> 2013-09-10 19:35:02.799099 7fd1ba81f700 10 monclient: handle_subscribe_ack sent 2013-09-10 19:35:02.796448 renew after 2013-09-10 19:37:32.796448
>> >>>>>     -4> 2013-09-10 19:35:02.800907 7fd1ba81f700  5 mds.0.58 ms_handle_connect on 194.109.43.12:6802/53419
>> >>>>>     -3> 2013-09-10 19:35:02.800927 7fd1ba81f700  5 mds.0.58 ms_handle_connect on 194.109.43.13:6802/45791
>> >>>>>     -2> 2013-09-10 19:35:02.801176 7fd1ba81f700  5 mds.0.58 ms_handle_connect on 194.109.43.11:6800/16562
>> >>>>>     -1> 2013-09-10 19:35:02.803546 7fd1ba81f700  1 -- 194.109.43.12:6800/67277 <== osd.2 194.109.43.13:6802/45791 1 ==== osd_op_reply(3 mds_anchortable [read 0~0] ack = -2 (No such file or directory)) v4 ==== 114+0+0 (3107677671 0 0) 0x1e4de00 con 0x1e5ddc0
>> >>>>>      0> 2013-09-10 19:35:02.805611 7fd1ba81f700 -1 mds/MDSTable.cc: In function 'void MDSTable::load_2(int, ceph::bufferlist&, Context*)' thread 7fd1ba81f700 time 2013-09-10 19:35:02.803673
>> >>>>> mds/MDSTable.cc: 152: FAILED assert(r >= 0)
>> >>>>>
>> >>>>>  ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a)
>> >>>>>  1: (MDSTable::load_2(int, ceph::buffer::list&, Context*)+0x44f) [0x77ce7f]
>> >>>>>  2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe3b) [0x7d891b]
>> >>>>>  3: (MDS::handle_core_message(Message*)+0x987) [0x56f527]
>> >>>>>  4: (MDS::_dispatch(Message*)+0x2f) [0x56f5ef]
>> >>>>>  5: (MDS::ms_dispatch(Message*)+0x19b) [0x5710bb]
>> >>>>>  6: (DispatchQueue::entry()+0x592) [0x92e432]
>> >>>>>  7: (DispatchQueue::DispatchThread::entry()+0xd) [0x8a59bd]
>> >>>>>  8: (()+0x68ca) [0x7fd1bed298ca]
>> >>>>>  9: (clone()+0x6d) [0x7fd1bda5cb6d]
>> >>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> >>>>> needed to interpret this.
>> >>>>>
>> >>>>> When trying to mount CephFS, it just hangs now.  Sometimes, an MDS stays
>> >>>>> up for a while, but will eventually crash again.  This CephFS was
>> >>>>> created on 0.67 and I haven't done anything but mount and use it under
>> >>>>> very light load in the meantime.
>> >>>>>
>> >>>>> Any ideas, or if you need more info, let me know.  It would be nice to
>> >>>>> get my data back, but I have backups too.
>> >>>>
>> >>>> Does the filesystem have any data in it? Every time we've seen this
>> >>>> error it's been on an empty cluster which had some weird issue with
>> >>>> startup.
>> >>>
>> >>>This one certainly had some data on it, yes.  A couple of 100's of GBs
>> >>>of disk-images and a couple of trees of smaller files.  Most of them
>> >>>accessed very rarely since being copied on.
>> >>>
>> >>>
>> >>>   Regards,
>> >>>
>> >>>      Oliver
>> >>>_______________________________________________
>> >>>ceph-users mailing list
>> >>>ceph-users@xxxxxxxxxxxxxx
>> >>>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >>
>>
>
>