CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

Hey list,

I just upgraded to Ceph 0.67.3.  What I did on every node of my 3-node
cluster was:
- Unmount CephFS everywhere.
- Upgrade the Ceph packages.
- Restart MON.
- Restart OSD.
- Restart MDS.
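For reference, the sequence above on each node looked roughly like this (the mount point and Debian-style package commands are illustrative, not exact; the sysvinit `service ceph restart <type>` form restarts all daemons of that type on the local node):

```shell
# Hypothetical per-node rolling upgrade; paths and package names are examples.
umount /mnt/cephfs                        # 1. unmount CephFS on this node
apt-get update && apt-get install ceph    # 2. upgrade the Ceph packages
service ceph restart mon                  # 3. restart the monitor
service ceph restart osd                  # 4. restart the OSD(s)
service ceph restart mds                  # 5. restart the MDS last
```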

As soon as I got to the second node, the MDS crashed right after startup.

Part of the logs (more on request):

-> 194.109.43.12:6802/53419 -- osd_op(mds.0.58:4 mds_snaptable [read 0~0] 1.d90270ad e37647) v4 -- ?+0 0x1e48d80 con 0x1e5d9a0
   -11> 2013-09-10 19:35:02.798962 7fd1ba81f700  2 mds.0.58 boot_start 1: opening mds log
   -10> 2013-09-10 19:35:02.798968 7fd1ba81f700  5 mds.0.log open discovering log bounds
    -9> 2013-09-10 19:35:02.798988 7fd1ba81f700  1 mds.0.journaler(ro) recover start
    -8> 2013-09-10 19:35:02.798990 7fd1ba81f700  1 mds.0.journaler(ro) read_head
    -7> 2013-09-10 19:35:02.799028 7fd1ba81f700  1 -- 194.109.43.12:6800/67277 --> 194.109.43.11:6800/16562 -- osd_op(mds.0.58:5 200.00000000 [read 0~0] 1.844f3494 e37647) v4 -- ?+0 0x1e48b40 con 0x1e5db00
    -6> 2013-09-10 19:35:02.799053 7fd1ba81f700  1 -- 194.109.43.12:6800/67277 <== mon.2 194.109.43.13:6789/0 16 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (4235168662 0 0) 0x1e93380 con 0x1e5d580
    -5> 2013-09-10 19:35:02.799099 7fd1ba81f700 10 monclient: handle_subscribe_ack sent 2013-09-10 19:35:02.796448 renew after 2013-09-10 19:37:32.796448
    -4> 2013-09-10 19:35:02.800907 7fd1ba81f700  5 mds.0.58 ms_handle_connect on 194.109.43.12:6802/53419
    -3> 2013-09-10 19:35:02.800927 7fd1ba81f700  5 mds.0.58 ms_handle_connect on 194.109.43.13:6802/45791
    -2> 2013-09-10 19:35:02.801176 7fd1ba81f700  5 mds.0.58 ms_handle_connect on 194.109.43.11:6800/16562
    -1> 2013-09-10 19:35:02.803546 7fd1ba81f700  1 -- 194.109.43.12:6800/67277 <== osd.2 194.109.43.13:6802/45791 1 ==== osd_op_reply(3 mds_anchortable [read 0~0] ack = -2 (No such file or directory)) v4 ==== 114+0+0 (3107677671 0 0) 0x1e4de00 con 0x1e5ddc0
     0> 2013-09-10 19:35:02.805611 7fd1ba81f700 -1 mds/MDSTable.cc: In function 'void MDSTable::load_2(int, ceph::bufferlist&, Context*)' thread 7fd1ba81f700 time 2013-09-10 19:35:02.803673
mds/MDSTable.cc: 152: FAILED assert(r >= 0)

 ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a)
 1: (MDSTable::load_2(int, ceph::buffer::list&, Context*)+0x44f) [0x77ce7f]
 2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe3b) [0x7d891b]
 3: (MDS::handle_core_message(Message*)+0x987) [0x56f527]
 4: (MDS::_dispatch(Message*)+0x2f) [0x56f5ef]
 5: (MDS::ms_dispatch(Message*)+0x19b) [0x5710bb]
 6: (DispatchQueue::entry()+0x592) [0x92e432]
 7: (DispatchQueue::DispatchThread::entry()+0xd) [0x8a59bd]
 8: (()+0x68ca) [0x7fd1bed298ca]
 9: (clone()+0x6d) [0x7fd1bda5cb6d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

When trying to mount CephFS, it just hangs now.  Sometimes an MDS stays
up for a while, but it will eventually crash again.  This CephFS was
created on 0.67 and I haven't done anything but mount it and use it under
very light load in the meantime.

Any ideas, or if you need more info, let me know.  It would be nice to
get my data back, but I have backups too.

PS: Note the "No such file or directory" in the above logs.
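That -2 (ENOENT) reply is for the read of the `mds_anchortable` object, so one thing worth checking is whether that object exists in the metadata pool at all. A hedged sketch, assuming the default metadata pool name `metadata` (adjust if your pool is named differently):

```shell
# Hypothetical check: is the anchortable object present in the metadata pool?
rados -p metadata ls | grep table        # list the MDS table objects (anchortable, snaptable, ...)
rados -p metadata stat mds_anchortable   # an ENOENT here would match the -2 in the MDS log
```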


   Regards,

      Oliver
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



