Hey Gregory,

The only objects containing "table" I can find at all are in the "metadata" pool:

# rados --pool=metadata ls | grep -i table
mds0_inotable

Looking at another cluster where I use CephFS, there is indeed an object named "mds_anchortable", but the broken cluster is missing it.  I don't see how I can scrub the PG for an object that doesn't appear to exist.  Please elaborate.


   Regards,

      Oliver

On di, 2013-09-10 at 14:06 -0700, Gregory Farnum wrote:
> Also, can you scrub the PG which contains the "mds_anchortable" object
> and see if anything comes up? You should be able to find the key from
> the logs (in the osd_op line that contains "mds_anchortable") and
> convert that into the PG. Or you can just scrub all of osd 2.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Tue, Sep 10, 2013 at 1:59 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> > It's not an upgrade issue. There's an MDS object that is somehow
> > missing. If it exists, then on restart you'll be fine.
> >
> > Oliver, what is your general cluster config? What filesystem are your
> > OSDs running on? What version of Ceph were you upgrading from? There's
> > really no way for this file to not exist once created unless the
> > underlying FS ate it, or the last write was both interrupted and hit
> > some kind of bug in our transaction code (of which none are known)
> > during replay.
> > -Greg
> > Software Engineer #42 @ http://inktank.com | http://ceph.com
> >
> >
> > On Tue, Sep 10, 2013 at 1:44 PM, Liu, Larry <Larry.Liu@xxxxxxxxxx> wrote:
> >> This is scary. Should I hold off on the upgrade?
> >>
> >> On 9/10/13 11:33 AM, "Oliver Daudey" <oliver@xxxxxxxxx> wrote:
> >>
> >>> Hey Gregory,
> >>>
> >>> On 10-09-13 20:21, Gregory Farnum wrote:
> >>>> On Tue, Sep 10, 2013 at 10:54 AM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
> >>>>> Hey list,
> >>>>>
> >>>>> I just upgraded to Ceph 0.67.3. What I did on every node of my
> >>>>> 3-node cluster was:
> >>>>> - Unmount CephFS everywhere.
> >>>>> - Upgrade the Ceph packages.
> >>>>> - Restart MON.
> >>>>> - Restart OSD.
> >>>>> - Restart MDS.
> >>>>>
> >>>>> As soon as I got to the second node, the MDS crashed right after
> >>>>> startup.
> >>>>>
> >>>>> Part of the logs (more on request):
> >>>>>
> >>>>>      -> 194.109.43.12:6802/53419 -- osd_op(mds.0.58:4 mds_snaptable [read 0~0] 1.d90270ad e37647) v4 -- ?+0 0x1e48d80 con 0x1e5d9a0
> >>>>>    -11> 2013-09-10 19:35:02.798962 7fd1ba81f700  2 mds.0.58 boot_start 1: opening mds log
> >>>>>    -10> 2013-09-10 19:35:02.798968 7fd1ba81f700  5 mds.0.log open discovering log bounds
> >>>>>     -9> 2013-09-10 19:35:02.798988 7fd1ba81f700  1 mds.0.journaler(ro) recover start
> >>>>>     -8> 2013-09-10 19:35:02.798990 7fd1ba81f700  1 mds.0.journaler(ro) read_head
> >>>>>     -7> 2013-09-10 19:35:02.799028 7fd1ba81f700  1 -- 194.109.43.12:6800/67277 --> 194.109.43.11:6800/16562 -- osd_op(mds.0.58:5 200.00000000 [read 0~0] 1.844f3494 e37647) v4 -- ?+0 0x1e48b40 con 0x1e5db00
> >>>>>     -6> 2013-09-10 19:35:02.799053 7fd1ba81f700  1 -- 194.109.43.12:6800/67277 <== mon.2 194.109.43.13:6789/0 16 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (4235168662 0 0) 0x1e93380 con 0x1e5d580
> >>>>>     -5> 2013-09-10 19:35:02.799099 7fd1ba81f700 10 monclient: handle_subscribe_ack sent 2013-09-10 19:35:02.796448 renew after 2013-09-10 19:37:32.796448
> >>>>>     -4> 2013-09-10 19:35:02.800907 7fd1ba81f700  5 mds.0.58 ms_handle_connect on 194.109.43.12:6802/53419
> >>>>>     -3> 2013-09-10 19:35:02.800927 7fd1ba81f700  5 mds.0.58 ms_handle_connect on 194.109.43.13:6802/45791
> >>>>>     -2> 2013-09-10 19:35:02.801176 7fd1ba81f700  5 mds.0.58 ms_handle_connect on 194.109.43.11:6800/16562
> >>>>>     -1> 2013-09-10 19:35:02.803546 7fd1ba81f700  1 -- 194.109.43.12:6800/67277 <== osd.2 194.109.43.13:6802/45791 1 ==== osd_op_reply(3 mds_anchortable [read 0~0] ack = -2 (No such file or directory)) v4 ==== 114+0+0 (3107677671 0 0) 0x1e4de00 con 0x1e5ddc0
> >>>>>      0> 2013-09-10 19:35:02.805611 7fd1ba81f700 -1 mds/MDSTable.cc: In function 'void MDSTable::load_2(int, ceph::bufferlist&, Context*)' thread 7fd1ba81f700 time 2013-09-10 19:35:02.803673
> >>>>> mds/MDSTable.cc: 152: FAILED assert(r >= 0)
> >>>>>
> >>>>>  ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a)
> >>>>>  1: (MDSTable::load_2(int, ceph::buffer::list&, Context*)+0x44f) [0x77ce7f]
> >>>>>  2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe3b) [0x7d891b]
> >>>>>  3: (MDS::handle_core_message(Message*)+0x987) [0x56f527]
> >>>>>  4: (MDS::_dispatch(Message*)+0x2f) [0x56f5ef]
> >>>>>  5: (MDS::ms_dispatch(Message*)+0x19b) [0x5710bb]
> >>>>>  6: (DispatchQueue::entry()+0x592) [0x92e432]
> >>>>>  7: (DispatchQueue::DispatchThread::entry()+0xd) [0x8a59bd]
> >>>>>  8: (()+0x68ca) [0x7fd1bed298ca]
> >>>>>  9: (clone()+0x6d) [0x7fd1bda5cb6d]
> >>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> >>>>>
> >>>>> When trying to mount CephFS, it just hangs now. Sometimes, an MDS stays
> >>>>> up for a while, but will eventually crash again. This CephFS was
> >>>>> created on 0.67 and I haven't done anything but mount and use it under
> >>>>> very light load in the mean time.
> >>>>>
> >>>>> Any ideas, or if you need more info, let me know. It would be nice to
> >>>>> get my data back, but I have backups too.
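The important line in the trace above is the osd_op_reply for "mds_anchortable" coming back with -2 (ENOENT): at startup the MDS reads its anchor table from the metadata pool, the object isn't there, and MDSTable::load_2() trips the assert. A quick read-only way to confirm that from any admin node, a sketch only, assuming the pool is named "metadata" as elsewhere in this thread and the 0.67-era rados CLI:

# Pool name "metadata" assumed, as used earlier in this thread.
# "stat" asks the primary OSD for the object's size and mtime; on the broken
# cluster this should fail with "No such file or directory", matching the -2
# in the MDS log above. On a healthy CephFS cluster it reports size and mtime.
rados --pool=metadata stat mds_anchortable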
> >>>> Does the filesystem have any data in it? Every time we've seen this
> >>>> error it's been on an empty cluster which had some weird issue with
> >>>> startup.
> >>>
> >>> This one certainly had some data on it, yes: a couple of hundred GBs of
> >>> disk-images and a couple of trees of smaller files. Most of them have been
> >>> accessed very rarely since being copied on.
> >>>
> >>>
> >>>    Regards,
> >>>
> >>>       Oliver
> >>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
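For readers hitting the same assert: Gregory's suggestion near the top of this thread, finding the PG that holds "mds_anchortable" and scrubbing it, does not actually require the object to exist. The PG an object maps to is computed from the object's name, so the mapping can be looked up and the PG scrubbed even when the object itself is gone. A rough sketch, assuming the 0.67-era ceph CLI and the "metadata" pool as in the thread:

# Compute which PG (and which OSDs) the object name maps to; the output ends
# with something like "-> pg 1.<hash> (<pgid>) -> up [...] acting [...]".
ceph osd map metadata mds_anchortable

# Scrub just that PG (substitute the pgid printed above)...
ceph pg scrub <pgid>

# ...or, as Greg suggested, simply scrub everything on osd.2.
ceph osd scrub 2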