Hi Sage,

Thank you for the response. So it seems that mon data can be removed and
recovered later only if the osdmap was saved (in binary form) and is
incorporated when the mon data is initially created (i.e., at mon --mkfs
time)?

I created the new osdmap with osdmaptool --createsimple, which gave the
pools a different number of PGs, and that in turn made me think I needed
to re-create the pools (to fix the osdmap). Another critical
misunderstanding on my part was thinking that the osdmap could be
re-created and set easily, as in the case of the crushmap.

I think it would be helpful for users if there were documented
instructions for the case where the mon data has been lost accidentally.
I had kept my old mon data (it was not removed on one of my 5 mons) but
could not retrieve the previous osdmap from it, although I guess that
should be possible in theory.

But anyway, I'll start from scratch. Thank you very much for the help.
Yes, I'll be careful not to do that again :)

regards,
Yasu
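P.S. For the archives, here is roughly how I now understand the "save the
osdmap so it can be seeded back in" idea. This is only a sketch of what I
think should work, not something I have tested; the mon id, the backup
paths and the --osdmap option to ceph-mon --mkfs are my assumptions, so
please double-check them against your own version before relying on this:

    # While the cluster is still healthy, keep binary copies of the
    # current maps somewhere safe (paths are just examples):
    ceph osd getmap -o /root/backup/osdmap.bin
    ceph mon getmap -o /root/backup/monmap.bin

    # Sanity-check what was saved:
    osdmaptool --print /root/backup/osdmap.bin

    # If a mon store ever has to be rebuilt from scratch, the saved maps
    # can be handed to ceph-mon at mkfs time instead of letting it start
    # from an empty osdmap in which no osd exists:
    ceph-mon -i a --mkfs \
        --monmap /root/backup/monmap.bin \
        --osdmap /root/backup/osdmap.bin \
        --keyring /root/backup/keyring

That way the freshly created mon store already knows about the existing
OSDs and pools, instead of starting over with an empty osdmap.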
From: Sage Weil <sage@xxxxxxxxxxx>
Subject: Re: OSDMap problem: osd does not exist.
Date: Thu, 19 Sep 2013 05:48:35 -0700 (PDT)
Message-ID: <alpine.DEB.2.00.1309190544200.23507@xxxxxxxxxxxxxxxxxx>

> On Thu, 19 Sep 2013, Yasuhiro Ohara wrote:
>>
>> Hi Sage,
>>
>> Thanks, after thrashing it became a little bit better,
>> but not yet healthy.
>>
>> ceph -s: http://pastebin.com/vD28FJ4A
>> ceph osd dump: http://pastebin.com/37FLNxd7
>> ceph pg dump: http://pastebin.com/pccdg20j
>>
>> (osd.0 and 1 are not running. I issued some "osd in" commands.
>> osd.4 is running but marked down/out: what is the "autoout" ?)
>>
>> After thrashing a few times (maybe I thrashed it too much ?),
>> the osd cluster really got thrashed a lot,
>> as seen in ceph -w: http://pastebin.com/fjeqrhxp
>>
>> I thought the OSDs' osdmap epoch was around 4900 (from looking at
>> data/current/meta), but it took 6 or 7 executions of the osd thrash
>> command before it seemed to work on something, and the epoch reached
>> over 10000. Now I sometimes see "deep scrub ok" in ceph -w.
>> But the PGs are still in 'creating' state, and it does not seem to be
>> creating anything really.
>>
>> I removed and re-created pools, because the number of PGs was
>> incorrect, and that changed the pool ids 0,1,2 to 3,4,5. Is this
>> causing the problem ?
>
> If you deleted and recreated the pools, you may as well just wipe the
> cluster and start over from scratch... the data is gone. The MDS is
> crashing because the pool referenced by the MDSMap is gone and it has no
> fs (meta)data.
>
> I suggest just starting from scratch. And next time, don't delete all of
> the monitor data! :)
>
> sage
>
>> By the way, the MDS crashes with the cluster in this state.
>> ceph-mds.2.log: http://pastebin.com/Ruf5YB8d
>>
>> Any suggestion is really appreciated.
>> Thanks.
>>
>> regards,
>> Yasu
>>
>> From: Sage Weil <sage@xxxxxxxxxxx>
>> Subject: Re: OSDMap problem: osd does not exist.
>> Date: Wed, 18 Sep 2013 19:58:16 -0700 (PDT)
>> Message-ID: <alpine.DEB.2.00.1309181956020.23507@xxxxxxxxxxxxxxxxxx>
>>
>> > Hey,
>> >
>> > On Wed, 18 Sep 2013, Yasuhiro Ohara wrote:
>> >>
>> >> Hi,
>> >>
>> >> My OSDs are not joining the cluster correctly, because the nonce
>> >> they assume and the one they receive from the peer are different.
>> >> It says "wrong node" because the entity_id_t peer_addr (i.e., the
>> >> combination of IP address, port number, and nonce) is different.
>> >>
>> >> Now, my questions are:
>> >> 1. Are the nonces of the OSD peer addrs kept in the osdmap ?
>> >> 2. (If so) can I modify the nonce value ?
>> >>
>> >> More generally, how can I fix the cluster if I blew away the mon data ?
>> >>
>> >> Below I'd like to summarize what I did.
>> >> - I tried to upgrade from 0.57 to 0.67.3.
>> >> - The mon protocol is different, and the mon data format also seemed
>> >>   different (changed to use leveldb ?), so I restarted all the mons.
>> >> - The mon data upgrade did not go well because of a full disk, but I
>> >>   didn't notice the cause and stupidly tried to start the mons from
>> >>   scratch, rebuilding the mon data (mon --mkfs). (I solved the full
>> >>   disk problem later.)
>> >> - Now there is no OSD existing in the cluster (i.e., in the osdmap).
>> >> - I added the OSD entries back using "ceph osd create".
>> >> - Still the OSDs do not recognize each other; they do not become peers.
>> >> - (The OSDs still seem to hold the previous PG data, and loading it
>> >>   works fine, so I assume I can still recover the data.)
>> >>
>> >> Does anyone have any advice on this ?
>> >> I'm planning to try modifying the source code, for lack of any other
>> >> choice, so that the OSDs ignore the nonce values :(
>> >
>> > The nonce value is important; you can't just ignore it. If the addr in
>> > the osdmap isn't changing, it is probably because the mon thinks the
>> > latest osdmap is N and the OSDs think the latest is >> N. I would look
>> > in the osd data/current/meta directory and see what the newest osdmap
>> > epoch is, compare that to 'ceph osd dump', and then do 'ceph osd thrash N'
>> > to make it churn through a bunch of maps to get to an epoch that is >
>> > what the OSDs see. Once that happens, the osd boot messages will properly
>> > update the cluster osdmap with their new addr and things should start up.
>> > Until then, the OSDs will just sit and wait for a map newer than what
>> > they have, which will never come...
>> >
>> > sage
>> >
>> >> Thanks in advance.
>> >>
>> >> regards,
>> >> Yasu
>> >>
>> >> From: Yasuhiro Ohara <yasu@xxxxxxxxxxxx>
>> >> Subject: Re: OSDMap problem: osd does not exist.
>> >> Date: Thu, 12 Sep 2013 09:45:51 -0700 (PDT)
>> >> Message-ID: <20130912.094551.06710597.yasu@xxxxxxxxxxxx>
>> >>
>> >> > Hi Joao,
>> >> >
>> >> > Thank you for the response.
>> >> > I meant "ceph-mon -i X --mkfs".
>> >> >
>> >> > Actually I did it on 3 nodes. On the other 2 mon nodes the original
>> >> > mon data was left, but currently all 5 nodes are running ceph-mon
>> >> > again. Should I not have done that ?
>> >> >
>> >> > regards,
>> >> > Yasu
>> >> >
>> >> > From: Joao Eduardo Luis <joao.luis@xxxxxxxxxxx>
>> >> > Subject: Re: OSDMap problem: osd does not exist.
>> >> > Date: Thu, 12 Sep 2013 11:35:40 +0100
>> >> > Message-ID: <523198FC.8050602@xxxxxxxxxxx>
>> >> >
>> >> >> On 09/12/2013 09:35 AM, Yasuhiro Ohara wrote:
>> >> >>>
>> >> >>> Hi,
>> >> >>>
>> >> >>> Recently I tried to upgrade from 0.57 to 0.67.3, hit the changes
>> >> >>> in the mon protocol, and so I updated all of the 5 mons.
>> >> >>> After upgrading the mons (and while debugging other problems),
>> >> >>> I removed and created the mon filesystem from scratch.
>> >> >>
>> >> >> What do you mean by this? Did you recreate the file system on all
>> >> >> 5 monitors? Did you back up any of your previous mon data
>> >> >> directories?
>> >> >>
>> >> >> -Joao
>> >> >>
>> >> >> --
>> >> >> Joao Eduardo Luis
>> >> >> Software Engineer | http://inktank.com | http://ceph.com
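For anyone who finds this thread later, here is my reading of Sage's
epoch advice above, spelled out as commands. It is a rough, untested
sketch; the osd data path, the osd id and the epoch numbers are just
placeholders from my setup, so adjust them to your own:

    # On an OSD host, find the newest osdmap epoch the OSD holds on disk
    # (look for the highest number in the osdmap.<epoch> file names):
    ls /var/lib/ceph/osd/ceph-0/current/meta | grep osdmap

    # Compare that with the epoch the monitors are at; the first line of
    # the dump output is "epoch N":
    ceph osd dump | head -1

    # If the OSDs are far ahead of the monitors (e.g. OSDs at ~4900, mons
    # near 0 after a fresh mkfs), churn out map updates until the cluster
    # epoch passes what the OSDs hold. One run may not generate enough
    # epochs, so repeat and re-check as needed:
    ceph osd thrash 1000
    ceph osd dump | head -1

    # Once the monitors' epoch is newer than the OSDs', the osd boot
    # messages can update their addresses in the osdmap; watch with:
    ceph -w

(This matches what I saw: it took 6 or 7 thrash runs before the epoch got
past what the OSDs had.)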