Hey Gregory,

I wiped and re-created the MDS-cluster I just mailed about, starting out
by making sure CephFS is not mounted anywhere, stopping all MDSs,
completely cleaning the "data" and "metadata"-pools using "rados
--pool=<pool> cleanup <prefix>", then creating a new cluster using "ceph
mds newfs 1 0 --yes-i-really-mean-it" and starting all MDSs again.
Directly afterwards, I saw this:
# rados --pool=metadata ls
1.00000000
2.00000000
200.00000000
200.00000001
600.00000000
601.00000000
602.00000000
603.00000000
605.00000000
606.00000000
608.00000000
609.00000000
mds0_inotable
mds0_sessionmap

Note the missing objects, right from the start.  I was able to mount the
CephFS at this point, but after unmounting it and restarting the
MDS-cluster, it failed to come up, with the same symptoms as before.  I
didn't place any files on CephFS at any point between newfs and failure.
Naturally, I tried initializing it again, but now, even after more than
5 tries, the "mds*"-objects simply no longer show up in the
"metadata"-pool at all.  In fact, it remains empty.  I can mount CephFS
after the first start of the MDS-cluster after a newfs, but on restart,
it fails because of the missing objects.  Am I doing anything wrong
while initializing the cluster, maybe?  Is cleaning the pools and doing
the newfs enough?  I did the same on the other cluster yesterday and it
seems to have all objects.


   Regards,

      Oliver

On Tue, 2013-09-10 at 16:24 -0700, Gregory Farnum wrote:
> Nope, a repair won't change anything if scrub doesn't detect any
> inconsistencies. There must be something else going on, but I can't
> fathom what...I'll try and look through it a bit more tomorrow. :/
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Tue, Sep 10, 2013 at 3:49 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
> > Hey Gregory,
> >
> > Thanks for your explanation.  Turns out to be 1.a7 and it seems to scrub
> > OK.
> >
> > # ceph osd getmap -o osdmap
> > # osdmaptool --test-map-object mds_anchortable --pool 1 osdmap
> > osdmaptool: osdmap file 'osdmap'
> >  object 'mds_anchortable' -> 1.a7 -> [2,0]
> > # ceph pg scrub 1.a7
> >
> > osd.2 logs:
> > 2013-09-11 00:41:15.843302 7faf56b1b700 0 log [INF] : 1.a7 scrub ok
> >
> > osd.0 didn't show anything in its logs, though.  Should I try a repair
> > next?
> >
> >
> >    Regards,
> >
> >       Oliver
> >
> > On Tue, 2013-09-10 at 15:01 -0700, Gregory Farnum wrote:
> >> If the problem is somewhere in RADOS/xfs/whatever, then there's a good
> >> chance that the "mds_anchortable" object exists in its replica OSDs,
> >> but when listing objects those aren't queried, so they won't show up
> >> in a listing. You can use the osdmaptool to map from an object name to
> >> the PG it would show up in, or if you look at your log you should see
> >> a line something like
> >> 1 -- <LOCAL IP> --> <OTHER IP> -- osd_op(mds.0.31:3 mds_anchortable
> >> [read 0~0] 1.a977f6a7 e165) v4 -- ?+0 0x1e88d80 con 0x1f189a0
> >> In this example, metadata is pool 1 and 1.a977f6a7 is the hash of the
> >> mds_anchortable object, and depending on how many PGs are in the pool
> >> it will be in pg 1.a7, or 1.6a7, or 1.f6a7...
> >> -Greg
> >> Software Engineer #42 @ http://inktank.com | http://ceph.com
> >>
> >> On Tue, Sep 10, 2013 at 2:51 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
> >> > Hey Gregory,
> >> >
> >> > The only objects containing "table" I can find at all, are in the
> >> > "metadata"-pool:
> >> > # rados --pool=metadata ls | grep -i table
> >> > mds0_inotable
> >> >
> >> > Looking at another cluster where I use CephFS, there is indeed an object
> >> > named "mds_anchortable", but the broken cluster is missing it.  I don't
> >> > see how I can scrub the PG for an object that doesn't appear to exist.
> >> > Please elaborate.
> >> >
> >> >
> >> >    Regards,
> >> >
> >> >       Oliver
> >> >
> >> > On Tue, 2013-09-10 at 14:06 -0700, Gregory Farnum wrote:
> >> >> Also, can you scrub the PG which contains the "mds_anchortable" object
> >> >> and see if anything comes up? You should be able to find the key from
> >> >> the logs (in the osd_op line that contains "mds_anchortable") and
> >> >> convert that into the PG. Or you can just scrub all of osd 2.
> >> >> -Greg
> >> >> Software Engineer #42 @ http://inktank.com | http://ceph.com
> >> >>
> >> >>
> >> >> On Tue, Sep 10, 2013 at 1:59 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> >> >> > It's not an upgrade issue. There's an MDS object that is somehow
> >> >> > missing. If it exists, then on restart you'll be fine.
> >> >> >
> >> >> > Oliver, what is your general cluster config? What filesystem are your
> >> >> > OSDs running on? What version of Ceph were you upgrading from? There's
> >> >> > really no way for this file to not exist once created unless the
> >> >> > underlying FS ate it or the last write both was interrupted and hit
> >> >> > some kind of bug in our transaction code (of which none are known)
> >> >> > during replay.
> >> >> > -Greg
> >> >> > Software Engineer #42 @ http://inktank.com | http://ceph.com
> >> >> >
> >> >> >
> >> >> > On Tue, Sep 10, 2013 at 1:44 PM, Liu, Larry <Larry.Liu@xxxxxxxxxx> wrote:
> >> >> >> This is scary. Should I hold on upgrade?
> >> >> >>
> >> >> >> On 9/10/13 11:33 AM, "Oliver Daudey" <oliver@xxxxxxxxx> wrote:
> >> >> >>
> >> >> >>>Hey Gregory,
> >> >> >>>
> >> >> >>>On 10-09-13 20:21, Gregory Farnum wrote:
> >> >> >>>> On Tue, Sep 10, 2013 at 10:54 AM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
> >> >> >>>>> Hey list,
> >> >> >>>>>
> >> >> >>>>> I just upgraded to Ceph 0.67.3.  What I did on every node of my 3-node
> >> >> >>>>> cluster was:
> >> >> >>>>> - Unmount CephFS everywhere.
> >> >> >>>>> - Upgrade the Ceph-packages.
> >> >> >>>>> - Restart MON.
> >> >> >>>>> - Restart OSD.
> >> >> >>>>> - Restart MDS.
> >> >> >>>>>
> >> >> >>>>> As soon as I got to the second node, the MDS crashed right after
> >> >> >>>>> startup.
> >> >> >>>>>
> >> >> >>>>> Part of the logs (more on request):
> >> >> >>>>>
> >> >> >>>>> -> 194.109.43.12:6802/53419 -- osd_op(mds.0.58:4 mds_snaptable [read 0~0] 1.d90270ad e37647) v4 -- ?+0 0x1e48d80 con 0x1e5d9a0
> >> >> >>>>> -11> 2013-09-10 19:35:02.798962 7fd1ba81f700 2 mds.0.58 boot_start 1: opening mds log
> >> >> >>>>> -10> 2013-09-10 19:35:02.798968 7fd1ba81f700 5 mds.0.log open discovering log bounds
> >> >> >>>>> -9> 2013-09-10 19:35:02.798988 7fd1ba81f700 1 mds.0.journaler(ro) recover start
> >> >> >>>>> -8> 2013-09-10 19:35:02.798990 7fd1ba81f700 1 mds.0.journaler(ro) read_head
> >> >> >>>>> -7> 2013-09-10 19:35:02.799028 7fd1ba81f700 1 -- 194.109.43.12:6800/67277 --> 194.109.43.11:6800/16562 -- osd_op(mds.0.58:5 200.00000000 [read 0~0] 1.844f3494 e37647) v4 -- ?+0 0x1e48b40 con 0x1e5db00
> >> >> >>>>> -6> 2013-09-10 19:35:02.799053 7fd1ba81f700 1 -- 194.109.43.12:6800/67277 <== mon.2 194.109.43.13:6789/0 16 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (4235168662 0 0) 0x1e93380 con 0x1e5d580
> >> >> >>>>> -5> 2013-09-10 19:35:02.799099 7fd1ba81f700 10 monclient: handle_subscribe_ack sent 2013-09-10 19:35:02.796448 renew after 2013-09-10 19:37:32.796448
> >> >> >>>>> -4> 2013-09-10 19:35:02.800907 7fd1ba81f700 5 mds.0.58 ms_handle_connect on 194.109.43.12:6802/53419
> >> >> >>>>> -3> 2013-09-10 19:35:02.800927 7fd1ba81f700 5 mds.0.58 ms_handle_connect on 194.109.43.13:6802/45791
> >> >> >>>>> -2> 2013-09-10 19:35:02.801176 7fd1ba81f700 5 mds.0.58 ms_handle_connect on 194.109.43.11:6800/16562
> >> >> >>>>> -1> 2013-09-10 19:35:02.803546 7fd1ba81f700 1 -- 194.109.43.12:6800/67277 <== osd.2 194.109.43.13:6802/45791 1 ==== osd_op_reply(3 mds_anchortable [read 0~0] ack = -2 (No such file or directory)) v4 ==== 114+0+0 (3107677671 0 0) 0x1e4de00 con 0x1e5ddc0
> >> >> >>>>> 0> 2013-09-10 19:35:02.805611 7fd1ba81f700 -1 mds/MDSTable.cc: In function 'void MDSTable::load_2(int, ceph::bufferlist&, Context*)' thread 7fd1ba81f700 time 2013-09-10 19:35:02.803673
> >> >> >>>>> mds/MDSTable.cc: 152: FAILED assert(r >= 0)
> >> >> >>>>>
> >> >> >>>>> ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a)
> >> >> >>>>> 1: (MDSTable::load_2(int, ceph::buffer::list&, Context*)+0x44f) [0x77ce7f]
> >> >> >>>>> 2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe3b) [0x7d891b]
> >> >> >>>>> 3: (MDS::handle_core_message(Message*)+0x987) [0x56f527]
> >> >> >>>>> 4: (MDS::_dispatch(Message*)+0x2f) [0x56f5ef]
> >> >> >>>>> 5: (MDS::ms_dispatch(Message*)+0x19b) [0x5710bb]
> >> >> >>>>> 6: (DispatchQueue::entry()+0x592) [0x92e432]
> >> >> >>>>> 7: (DispatchQueue::DispatchThread::entry()+0xd) [0x8a59bd]
> >> >> >>>>> 8: (()+0x68ca) [0x7fd1bed298ca]
> >> >> >>>>> 9: (clone()+0x6d) [0x7fd1bda5cb6d]
> >> >> >>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >> >> >>>>> needed to interpret this.
> >> >> >>>>>
> >> >> >>>>> When trying to mount CephFS, it just hangs now.  Sometimes, an MDS
> >> >> >>>>> stays up for a while, but will eventually crash again.  This CephFS was
> >> >> >>>>> created on 0.67 and I haven't done anything but mount and use it under
> >> >> >>>>> very light load in the mean time.
> >> >> >>>>>
> >> >> >>>>> Any ideas, or if you need more info, let me know.  It would be nice to
> >> >> >>>>> get my data back, but I have backups too.
> >> >> >>>>
> >> >> >>>> Does the filesystem have any data in it? Every time we've seen this
> >> >> >>>> error it's been on an empty cluster which had some weird issue with
> >> >> >>>> startup.
> >> >> >>>
> >> >> >>>This one certainly had some data on it, yes.  A couple of 100's of GBs
> >> >> >>>of disk-images and a couple of trees of smaller files.  Most of them
> >> >> >>>accessed very rarely since being copied on.
> >> >> >>>
> >> >> >>>
> >> >> >>>   Regards,
> >> >> >>>
> >> >> >>>     Oliver
> >> >> >>>_______________________________________________
> >> >> >>>ceph-users mailing list
> >> >> >>>ceph-users@xxxxxxxxxxxxxx
> >> >> >>>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
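
For reference, the step Greg describes in the quoted thread (turning the hash from an osd_op log line into a PG id) can be sketched in a few lines of Python. This is only a rough sketch, assuming the usual Ceph "stable mod" rule: mask the hash with the next power of two minus one, and fall back to the half-size mask when the result is not a valid PG number. The function names stable_mod and pg_for_hash and the pg_num values are illustrative assumptions, not something from the thread or from any Ceph tool. On a real cluster, "osdmaptool --test-map-object" as used in the thread remains the authoritative way to get the mapping.

```python
def stable_mod(x: int, b: int, bmask: int) -> int:
    """Map hash x onto [0, b) so that few objects move when b grows."""
    if (x & bmask) < b:
        return x & bmask
    return x & (bmask >> 1)


def pg_for_hash(pool: int, obj_hash: int, pg_num: int) -> str:
    """Return a PG id such as '1.a7' for an object hash (hypothetical helper)."""
    if pg_num < 1:
        raise ValueError("pg_num must be positive")
    # Smallest (power of two) - 1 that covers every PG number below pg_num.
    bmask = (1 << (pg_num - 1).bit_length()) - 1
    return f"{pool}.{stable_mod(obj_hash, pg_num, bmask):x}"


if __name__ == "__main__":
    # Hash taken from the example osd_op line quoted in the thread.
    example_hash = 0xA977F6A7
    # The pg_num values are assumptions chosen to match Greg's three examples.
    for pg_num in (256, 4096, 65536):
        print(pg_num, pg_for_hash(1, example_hash, pg_num))
```

With the example hash 1.a977f6a7 this gives 1.a7, 1.6a7 and 1.f6a7 for pools of 256, 4096 and 65536 PGs, matching the three possibilities Greg lists.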