On Wed, Sep 11, 2013 at 7:51 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
> Hey Gregory,
>
> I wiped and re-created the MDS-cluster I just mailed about, starting out
> by making sure CephFS is not mounted anywhere, stopping all MDSs,
> completely cleaning the "data" and "metadata"-pools using "rados
> --pool=<pool> cleanup <prefix>", then creating a new cluster using
> `ceph mds newfs 1 0 --yes-i-really-mean-it' and starting all MDSs again.
> Directly afterwards, I saw this:
>
> # rados --pool=metadata ls
> 1.00000000
> 2.00000000
> 200.00000000
> 200.00000001
> 600.00000000
> 601.00000000
> 602.00000000
> 603.00000000
> 605.00000000
> 606.00000000
> 608.00000000
> 609.00000000
> mds0_inotable
> mds0_sessionmap
>
> Note the missing objects, right from the start. I was able to mount the
> CephFS at this point, but after unmounting it and restarting the
> MDS-cluster, it failed to come up, with the same symptoms as before. I
> didn't place any files on CephFS at any point between newfs and failure.
> Naturally, I tried initializing it again, but now, even after more than
> 5 tries, the "mds*"-objects simply no longer show up in the
> "metadata"-pool at all. In fact, it remains empty. I can mount CephFS
> after the first start of the MDS-cluster after a newfs, but on restart,
> it fails because of the missing objects. Am I doing anything wrong
> while initializing the cluster, maybe? Is cleaning the pools and doing
> the newfs enough? I did the same on the other cluster yesterday and it
> seems to have all objects.

Thank you for the detailed information. The cause of the missing objects is
that the MDS IDs for the old FS and the new FS are the same (the incarnations
are the same). When an OSD receives MDS requests for the newly created FS, it
silently drops them, because it thinks they are duplicates. You can work
around the bug by creating new pools for the newfs.

Regards
Yan, Zheng
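
A minimal sketch of that workaround, with example pool names and PG counts;
the numeric pool IDs passed to newfs are placeholders and have to be read
back from `ceph osd dump' first:

# ceph osd pool create metadata2 64
# ceph osd pool create data2 64
# ceph osd dump | grep pool
(note the IDs assigned to the two new pools, then:)
# ceph mds newfs <new-metadata-pool-id> <new-data-pool-id> --yes-i-really-mean-it

After that, restarting the MDSs should write fresh mds* objects into the new
metadata pool, since the new pool IDs make the requests look new to the OSDs.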
>
> Regards,
>
>    Oliver
>
> On di, 2013-09-10 at 16:24 -0700, Gregory Farnum wrote:
>> Nope, a repair won't change anything if scrub doesn't detect any
>> inconsistencies. There must be something else going on, but I can't
>> fathom what...I'll try and look through it a bit more tomorrow. :/
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>>
>> On Tue, Sep 10, 2013 at 3:49 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>> > Hey Gregory,
>> >
>> > Thanks for your explanation. Turns out to be 1.a7 and it seems to scrub
>> > OK.
>> >
>> > # ceph osd getmap -o osdmap
>> > # osdmaptool --test-map-object mds_anchortable --pool 1 osdmap
>> > osdmaptool: osdmap file 'osdmap'
>> >  object 'mds_anchortable' -> 1.a7 -> [2,0]
>> > # ceph pg scrub 1.a7
>> >
>> > osd.2 logs:
>> > 2013-09-11 00:41:15.843302 7faf56b1b700  0 log [INF] : 1.a7 scrub ok
>> >
>> > osd.0 didn't show anything in its logs, though. Should I try a repair
>> > next?
>> >
>> >
>> > Regards,
>> >
>> >    Oliver
>> >
>> > On di, 2013-09-10 at 15:01 -0700, Gregory Farnum wrote:
>> >> If the problem is somewhere in RADOS/xfs/whatever, then there's a good
>> >> chance that the "mds_anchortable" object exists in its replica OSDs,
>> >> but when listing objects those aren't queried, so they won't show up
>> >> in a listing. You can use the osdmaptool to map from an object name to
>> >> the PG it would show up in, or if you look at your log you should see
>> >> a line something like
>> >> 1 -- <LOCAL IP> --> <OTHER IP> -- osd_op(mds.0.31:3 mds_anchortable
>> >> [read 0~0] 1.a977f6a7 e165) v4 -- ?+0 0x1e88d80 con 0x1f189a0
>> >> In this example, metadata is pool 1 and 1.a977f6a7 is the hash of the
>> >> mds_anchortable object, and depending on how many PGs are in the pool
>> >> it will be in pg 1.a7, or 1.6a7, or 1.f6a7...
>> >> -Greg
>> >> Software Engineer #42 @ http://inktank.com | http://ceph.com
>> >>
>> >> On Tue, Sep 10, 2013 at 2:51 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>> >> > Hey Gregory,
>> >> >
>> >> > The only objects containing "table" I can find at all, are in the
>> >> > "metadata"-pool:
>> >> > # rados --pool=metadata ls | grep -i table
>> >> > mds0_inotable
>> >> >
>> >> > Looking at another cluster where I use CephFS, there is indeed an object
>> >> > named "mds_anchortable", but the broken cluster is missing it. I don't
>> >> > see how I can scrub the PG for an object that doesn't appear to exist.
>> >> > Please elaborate.
>> >> >
>> >> >
>> >> > Regards,
>> >> >
>> >> >    Oliver
>> >> >
>> >> > On di, 2013-09-10 at 14:06 -0700, Gregory Farnum wrote:
>> >> >> Also, can you scrub the PG which contains the "mds_anchortable" object
>> >> >> and see if anything comes up? You should be able to find the key from
>> >> >> the logs (in the osd_op line that contains "mds_anchortable") and
>> >> >> convert that into the PG. Or you can just scrub all of osd 2.
>> >> >> -Greg
>> >> >> Software Engineer #42 @ http://inktank.com | http://ceph.com
>> >> >>
>> >> >>
>> >> >> On Tue, Sep 10, 2013 at 1:59 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> >> >> > It's not an upgrade issue. There's an MDS object that is somehow
>> >> >> > missing. If it exists, then on restart you'll be fine.
>> >> >> >
>> >> >> > Oliver, what is your general cluster config? What filesystem are your
>> >> >> > OSDs running on? What version of Ceph were you upgrading from? There's
>> >> >> > really no way for this file to not exist once created unless the
>> >> >> > underlying FS ate it or the last write was both interrupted and hit
>> >> >> > some kind of bug in our transaction code (of which none are known)
>> >> >> > during replay.
>> >> >> > -Greg
>> >> >> > Software Engineer #42 @ http://inktank.com | http://ceph.com
>> >> >> >
>> >> >> >
>> >> >> > On Tue, Sep 10, 2013 at 1:44 PM, Liu, Larry <Larry.Liu@xxxxxxxxxx> wrote:
>> >> >> >> This is scary. Should I hold off on the upgrade?
>> >> >> >>
>> >> >> >> On 9/10/13 11:33 AM, "Oliver Daudey" <oliver@xxxxxxxxx> wrote:
>> >> >> >>
>> >> >> >>> Hey Gregory,
>> >> >> >>>
>> >> >> >>> On 10-09-13 20:21, Gregory Farnum wrote:
>> >> >> >>>> On Tue, Sep 10, 2013 at 10:54 AM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>> >> >> >>>>> Hey list,
>> >> >> >>>>>
>> >> >> >>>>> I just upgraded to Ceph 0.67.3. What I did on every node of my 3-node
>> >> >> >>>>> cluster was:
>> >> >> >>>>> - Unmount CephFS everywhere.
>> >> >> >>>>> - Upgrade the Ceph-packages.
>> >> >> >>>>> - Restart MON.
>> >> >> >>>>> - Restart OSD.
>> >> >> >>>>> - Restart MDS.
>> >> >> >>>>>
>> >> >> >>>>> As soon as I got to the second node, the MDS crashed right after
>> >> >> >>>>> startup.
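
A rough shell equivalent of that per-node sequence; the mount point, package
names and sysvinit-style service invocations are assumptions based on the
Debian packaging of that era and will differ per setup:

# umount /mnt/cephfs
# apt-get update && apt-get install ceph ceph-mds ceph-common
# service ceph restart mon
# service ceph restart osd
# service ceph restart mds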
>> >> >> >>>>>
>> >> >> >>>>> Part of the logs (more on request):
>> >> >> >>>>>
>> >> >> >>>>> -> 194.109.43.12:6802/53419 -- osd_op(mds.0.58:4 mds_snaptable [read 0~0] 1.d90270ad e37647) v4 -- ?+0 0x1e48d80 con 0x1e5d9a0
>> >> >> >>>>> -11> 2013-09-10 19:35:02.798962 7fd1ba81f700  2 mds.0.58 boot_start 1: opening mds log
>> >> >> >>>>> -10> 2013-09-10 19:35:02.798968 7fd1ba81f700  5 mds.0.log open discovering log bounds
>> >> >> >>>>> -9> 2013-09-10 19:35:02.798988 7fd1ba81f700  1 mds.0.journaler(ro) recover start
>> >> >> >>>>> -8> 2013-09-10 19:35:02.798990 7fd1ba81f700  1 mds.0.journaler(ro) read_head
>> >> >> >>>>> -7> 2013-09-10 19:35:02.799028 7fd1ba81f700  1 -- 194.109.43.12:6800/67277 --> 194.109.43.11:6800/16562 -- osd_op(mds.0.58:5 200.00000000 [read 0~0] 1.844f3494 e37647) v4 -- ?+0 0x1e48b40 con 0x1e5db00
>> >> >> >>>>> -6> 2013-09-10 19:35:02.799053 7fd1ba81f700  1 -- 194.109.43.12:6800/67277 <== mon.2 194.109.43.13:6789/0 16 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (4235168662 0 0) 0x1e93380 con 0x1e5d580
>> >> >> >>>>> -5> 2013-09-10 19:35:02.799099 7fd1ba81f700 10 monclient: handle_subscribe_ack sent 2013-09-10 19:35:02.796448 renew after 2013-09-10 19:37:32.796448
>> >> >> >>>>> -4> 2013-09-10 19:35:02.800907 7fd1ba81f700  5 mds.0.58 ms_handle_connect on 194.109.43.12:6802/53419
>> >> >> >>>>> -3> 2013-09-10 19:35:02.800927 7fd1ba81f700  5 mds.0.58 ms_handle_connect on 194.109.43.13:6802/45791
>> >> >> >>>>> -2> 2013-09-10 19:35:02.801176 7fd1ba81f700  5 mds.0.58 ms_handle_connect on 194.109.43.11:6800/16562
>> >> >> >>>>> -1> 2013-09-10 19:35:02.803546 7fd1ba81f700  1 -- 194.109.43.12:6800/67277 <== osd.2 194.109.43.13:6802/45791 1 ==== osd_op_reply(3 mds_anchortable [read 0~0] ack = -2 (No such file or directory)) v4 ==== 114+0+0 (3107677671 0 0) 0x1e4de00 con 0x1e5ddc0
>> >> >> >>>>> 0> 2013-09-10 19:35:02.805611 7fd1ba81f700 -1 mds/MDSTable.cc: In function 'void MDSTable::load_2(int, ceph::bufferlist&, Context*)' thread 7fd1ba81f700 time 2013-09-10 19:35:02.803673
>> >> >> >>>>> mds/MDSTable.cc: 152: FAILED assert(r >= 0)
>> >> >> >>>>>
>> >> >> >>>>> ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a)
>> >> >> >>>>> 1: (MDSTable::load_2(int, ceph::buffer::list&, Context*)+0x44f) [0x77ce7f]
>> >> >> >>>>> 2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe3b) [0x7d891b]
>> >> >> >>>>> 3: (MDS::handle_core_message(Message*)+0x987) [0x56f527]
>> >> >> >>>>> 4: (MDS::_dispatch(Message*)+0x2f) [0x56f5ef]
>> >> >> >>>>> 5: (MDS::ms_dispatch(Message*)+0x19b) [0x5710bb]
>> >> >> >>>>> 6: (DispatchQueue::entry()+0x592) [0x92e432]
>> >> >> >>>>> 7: (DispatchQueue::DispatchThread::entry()+0xd) [0x8a59bd]
>> >> >> >>>>> 8: (()+0x68ca) [0x7fd1bed298ca]
>> >> >> >>>>> 9: (clone()+0x6d) [0x7fd1bda5cb6d]
>> >> >> >>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>> >> >> >>>>>
>> >> >> >>>>> When trying to mount CephFS, it just hangs now. Sometimes, an MDS stays
>> >> >> >>>>> up for a while, but will eventually crash again. This CephFS was
>> >> >> >>>>> created on 0.67 and I haven't done anything but mount and use it under
>> >> >> >>>>> very light load in the mean time.
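
When the mount hangs like that, the MDS state in the cluster maps is the first
thing to check, and a fuller MDS trace makes the assert easier to pin down. A
sketch of both, assuming the default ceph.conf and admin keyring on the MDS
node (the debug levels below are just commonly used values, not a prescription):

# ceph -s
# ceph mds stat
# ceph health detail

and, in ceph.conf before restarting the MDS:

[mds]
    debug mds = 20
    debug journaler = 20
    debug ms = 1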
>> >> >> >>>>>
>> >> >> >>>>> Any ideas, or if you need more info, let me know. It would be nice to
>> >> >> >>>>> get my data back, but I have backups too.
>> >> >> >>>>
>> >> >> >>>> Does the filesystem have any data in it? Every time we've seen this
>> >> >> >>>> error it's been on an empty cluster which had some weird issue with
>> >> >> >>>> startup.
>> >> >> >>>
>> >> >> >>> This one certainly had some data on it, yes. A couple of 100's of GBs
>> >> >> >>> of disk-images and a couple of trees of smaller files. Most of them
>> >> >> >>> accessed very rarely since being copied on.
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> Regards,
>> >> >> >>>
>> >> >> >>>    Oliver

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com