Hey list,

I just upgraded to Ceph 0.67.3. What I did on every node of my 3-node cluster was:

- Unmount CephFS everywhere.
- Upgrade the Ceph packages.
- Restart the MON.
- Restart the OSD.
- Restart the MDS.

As soon as I got to the second node, the MDS crashed right after startup. Part of the logs (more on request):

 -> 194.109.43.12:6802/53419 -- osd_op(mds.0.58:4 mds_snaptable [read 0~0] 1.d90270ad e37647) v4 -- ?+0 0x1e48d80 con 0x1e5d9a0
-11> 2013-09-10 19:35:02.798962 7fd1ba81f700  2 mds.0.58 boot_start 1: opening mds log
-10> 2013-09-10 19:35:02.798968 7fd1ba81f700  5 mds.0.log open discovering log bounds
 -9> 2013-09-10 19:35:02.798988 7fd1ba81f700  1 mds.0.journaler(ro) recover start
 -8> 2013-09-10 19:35:02.798990 7fd1ba81f700  1 mds.0.journaler(ro) read_head
 -7> 2013-09-10 19:35:02.799028 7fd1ba81f700  1 -- 194.109.43.12:6800/67277 --> 194.109.43.11:6800/16562 -- osd_op(mds.0.58:5 200.00000000 [read 0~0] 1.844f3494 e37647) v4 -- ?+0 0x1e48b40 con 0x1e5db00
 -6> 2013-09-10 19:35:02.799053 7fd1ba81f700  1 -- 194.109.43.12:6800/67277 <== mon.2 194.109.43.13:6789/0 16 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (4235168662 0 0) 0x1e93380 con 0x1e5d580
 -5> 2013-09-10 19:35:02.799099 7fd1ba81f700 10 monclient: handle_subscribe_ack sent 2013-09-10 19:35:02.796448 renew after 2013-09-10 19:37:32.796448
 -4> 2013-09-10 19:35:02.800907 7fd1ba81f700  5 mds.0.58 ms_handle_connect on 194.109.43.12:6802/53419
 -3> 2013-09-10 19:35:02.800927 7fd1ba81f700  5 mds.0.58 ms_handle_connect on 194.109.43.13:6802/45791
 -2> 2013-09-10 19:35:02.801176 7fd1ba81f700  5 mds.0.58 ms_handle_connect on 194.109.43.11:6800/16562
 -1> 2013-09-10 19:35:02.803546 7fd1ba81f700  1 -- 194.109.43.12:6800/67277 <== osd.2 194.109.43.13:6802/45791 1 ==== osd_op_reply(3 mds_anchortable [read 0~0] ack = -2 (No such file or directory)) v4 ==== 114+0+0 (3107677671 0 0) 0x1e4de00 con 0x1e5ddc0
  0> 2013-09-10 19:35:02.805611 7fd1ba81f700 -1 mds/MDSTable.cc: In function 'void MDSTable::load_2(int, ceph::bufferlist&, Context*)' thread 7fd1ba81f700 time 2013-09-10 19:35:02.803673
mds/MDSTable.cc: 152: FAILED assert(r >= 0)

 ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a)
 1: (MDSTable::load_2(int, ceph::buffer::list&, Context*)+0x44f) [0x77ce7f]
 2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe3b) [0x7d891b]
 3: (MDS::handle_core_message(Message*)+0x987) [0x56f527]
 4: (MDS::_dispatch(Message*)+0x2f) [0x56f5ef]
 5: (MDS::ms_dispatch(Message*)+0x19b) [0x5710bb]
 6: (DispatchQueue::entry()+0x592) [0x92e432]
 7: (DispatchQueue::DispatchThread::entry()+0xd) [0x8a59bd]
 8: (()+0x68ca) [0x7fd1bed298ca]
 9: (clone()+0x6d) [0x7fd1bda5cb6d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

When trying to mount CephFS, it now just hangs. Sometimes an MDS stays up for a while, but it will eventually crash again. This CephFS was created on 0.67 and I haven't done anything but mount it and use it under very light load in the meantime.

Any ideas? If you need more info, let me know. It would be nice to get my data back, but I have backups too.

PS: Note the "No such file or directory" in the logs above.

Regards,

Oliver

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com