This is scary.  Should I hold off on upgrading?

On 9/10/13 11:33 AM, "Oliver Daudey" <oliver@xxxxxxxxx> wrote:

>Hey Gregory,
>
>On 10-09-13 20:21, Gregory Farnum wrote:
>> On Tue, Sep 10, 2013 at 10:54 AM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>>> Hey list,
>>>
>>> I just upgraded to Ceph 0.67.3.  What I did on every node of my 3-node
>>> cluster was:
>>> - Unmount CephFS everywhere.
>>> - Upgrade the Ceph packages.
>>> - Restart MON.
>>> - Restart OSD.
>>> - Restart MDS.
>>>
>>> As soon as I got to the second node, the MDS crashed right after
>>> startup.
>>>
>>> Part of the logs (more on request):
>>>
>>> -> 194.109.43.12:6802/53419 -- osd_op(mds.0.58:4 mds_snaptable [read 0~0] 1.d90270ad e37647) v4 -- ?+0 0x1e48d80 con 0x1e5d9a0
>>> -11> 2013-09-10 19:35:02.798962 7fd1ba81f700  2 mds.0.58 boot_start 1: opening mds log
>>> -10> 2013-09-10 19:35:02.798968 7fd1ba81f700  5 mds.0.log open discovering log bounds
>>>  -9> 2013-09-10 19:35:02.798988 7fd1ba81f700  1 mds.0.journaler(ro) recover start
>>>  -8> 2013-09-10 19:35:02.798990 7fd1ba81f700  1 mds.0.journaler(ro) read_head
>>>  -7> 2013-09-10 19:35:02.799028 7fd1ba81f700  1 -- 194.109.43.12:6800/67277 --> 194.109.43.11:6800/16562 -- osd_op(mds.0.58:5 200.00000000 [read 0~0] 1.844f3494 e37647) v4 -- ?+0 0x1e48b40 con 0x1e5db00
>>>  -6> 2013-09-10 19:35:02.799053 7fd1ba81f700  1 -- 194.109.43.12:6800/67277 <== mon.2 194.109.43.13:6789/0 16 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (4235168662 0 0) 0x1e93380 con 0x1e5d580
>>>  -5> 2013-09-10 19:35:02.799099 7fd1ba81f700 10 monclient: handle_subscribe_ack sent 2013-09-10 19:35:02.796448 renew after 2013-09-10 19:37:32.796448
>>>  -4> 2013-09-10 19:35:02.800907 7fd1ba81f700  5 mds.0.58 ms_handle_connect on 194.109.43.12:6802/53419
>>>  -3> 2013-09-10 19:35:02.800927 7fd1ba81f700  5 mds.0.58 ms_handle_connect on 194.109.43.13:6802/45791
>>>  -2> 2013-09-10 19:35:02.801176 7fd1ba81f700  5 mds.0.58 ms_handle_connect on 194.109.43.11:6800/16562
>>>  -1> 2013-09-10 19:35:02.803546 7fd1ba81f700  1 -- 194.109.43.12:6800/67277 <== osd.2 194.109.43.13:6802/45791 1 ==== osd_op_reply(3 mds_anchortable [read 0~0] ack = -2 (No such file or directory)) v4 ==== 114+0+0 (3107677671 0 0) 0x1e4de00 con 0x1e5ddc0
>>>   0> 2013-09-10 19:35:02.805611 7fd1ba81f700 -1 mds/MDSTable.cc: In function 'void MDSTable::load_2(int, ceph::bufferlist&, Context*)' thread 7fd1ba81f700 time 2013-09-10 19:35:02.803673
>>> mds/MDSTable.cc: 152: FAILED assert(r >= 0)
>>>
>>>  ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a)
>>>  1: (MDSTable::load_2(int, ceph::buffer::list&, Context*)+0x44f) [0x77ce7f]
>>>  2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe3b) [0x7d891b]
>>>  3: (MDS::handle_core_message(Message*)+0x987) [0x56f527]
>>>  4: (MDS::_dispatch(Message*)+0x2f) [0x56f5ef]
>>>  5: (MDS::ms_dispatch(Message*)+0x19b) [0x5710bb]
>>>  6: (DispatchQueue::entry()+0x592) [0x92e432]
>>>  7: (DispatchQueue::DispatchThread::entry()+0xd) [0x8a59bd]
>>>  8: (()+0x68ca) [0x7fd1bed298ca]
>>>  9: (clone()+0x6d) [0x7fd1bda5cb6d]
>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>  needed to interpret this.
>>>
>>> When trying to mount CephFS, it just hangs now.  Sometimes, an MDS stays
>>> up for a while, but will eventually crash again.  This CephFS was
>>> created on 0.67 and I haven't done anything but mount and use it under
>>> very light load in the meantime.
>>>
>>> Any ideas, or if you need more info, let me know.  It would be nice to
>>> get my data back, but I have backups too.
>>
>> Does the filesystem have any data in it?  Every time we've seen this
>> error it's been on an empty cluster which had some weird issue with
>> startup.
>
>This one certainly had some data on it, yes.  A couple of hundred GBs of
>disk images and a couple of trees of smaller files.  Most of them have
>been accessed very rarely since being copied on.
>
>
>   Regards,
>
>     Oliver

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
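
For anyone landing on the same trace: the abort at mds/MDSTable.cc:152 fires in the completion path of the table read, right after the OSD replies "ack = -2 (No such file or directory)" for the mds_anchortable object, i.e. the MDS asserts because a table object it expected to read back does not exist.  Below is a minimal, self-contained C++ sketch of that failure pattern only; it is not the Ceph source, and the names TableLoader, on_load_finished and simulate_object_read are made up for illustration.

// Illustrative only -- NOT the Ceph source.  Shows the pattern visible in the
// trace above: a table-load completion callback that asserts the read result
// is non-negative, so a missing backing object (read returns -ENOENT, i.e. -2)
// aborts the process instead of being handled as "table not found".
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <string>
#include <vector>

using bufferlist = std::vector<char>;  // stand-in for ceph::bufferlist

// Pretend to read an object from the object store; return -ENOENT when the
// object does not exist, mirroring the "ack = -2" reply in the log.
static int simulate_object_read(const std::string& object, bufferlist* out) {
  if (object == "mds_anchortable") {
    return -ENOENT;  // object missing
  }
  out->assign({'v', '1'});
  return static_cast<int>(out->size());
}

struct TableLoader {
  // Completion callback in the style of MDSTable::load_2(int r, ...):
  // any negative r trips the assert and the daemon dies.
  void on_load_finished(int r, bufferlist& bl) {
    assert(r >= 0);  // the pattern behind "FAILED assert(r >= 0)"
    std::printf("loaded table, %d bytes\n", r);
    (void)bl;
  }
};

int main() {
  TableLoader loader;
  bufferlist bl;

  // A table whose backing object exists loads fine...
  int r = simulate_object_read("mds_snaptable", &bl);
  loader.on_load_finished(r, bl);

  // ...but a missing object (r == -2) trips the assert and aborts,
  // just like the MDS did on startup.
  r = simulate_object_read("mds_anchortable", &bl);
  loader.on_load_finished(r, bl);
  return 0;
}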