Hi,

My OSDs are not joining the cluster correctly, because the nonce they
expect and the nonce they receive from the peer are different. It says
"wrong node" because the entity_addr_t peer_addr (i.e., the combination
of the IP address, port number, and nonce) is different.

Now, my questions are:
1. Are the nonces of the OSD peer addrs kept in the osdmap?
2. (If so) can I modify the nonce value?

More generally, how can I fix the cluster if I blew away the mon data?

Below I'd like to summarize what I did.
- I tried to upgrade from 0.57 to 0.67.3.
- The mon protocol is different, and the mon data format also seemed
  to be different (changed to use leveldb?), so I restarted all mons.
- The mon data upgrade did not go well because of a full disk, but I
  didn't notice the cause and stupidly tried to start the mons from
  scratch, rebuilding the mon data (mon --mkfs). (I solved the full
  disk problem later.)
- Now there is no OSD existing in the cluster (i.e., in the osdmap).
- I added the OSD configurations using "ceph osd create".
- Still the OSDs do not recognize each other; they do not become peers.
- (The OSDs still seem to hold the previous PG data, and loading it
  works fine. So I assume I can still recover the data.)

Does anyone have any advice on this?
Having no other choice, I'm planning to modify the source code so that
the OSDs ignore the nonce values :(

Thanks in advance.

regards,
Yasu

From: Yasuhiro Ohara <yasu@xxxxxxxxxxxx>
Subject: Re: OSDMap problem: osd does not exist.
Date: Thu, 12 Sep 2013 09:45:51 -0700 (PDT)
Message-ID: <20130912.094551.06710597.yasu@xxxxxxxxxxxx>

>
> Hi Joao,
>
> Thank you for the response.
> I meant "ceph-mon -i X --mkfs".
>
> Actually I did it on 3 nodes. On the other 2 mon nodes, the original
> mon data were left, but currently all 5 nodes run ceph-mon again.
> Should I not have done that?
>
> regards,
> Yasu
>
> From: Joao Eduardo Luis <joao.luis@xxxxxxxxxxx>
> Subject: Re: OSDMap problem: osd does not exist.
> Date: Thu, 12 Sep 2013 11:35:40 +0100
> Message-ID: <523198FC.8050602@xxxxxxxxxxx>
>
>> On 09/12/2013 09:35 AM, Yasuhiro Ohara wrote:
>>>
>>> Hi,
>>>
>>> recently I tried to upgrade from 0.57 to 0.67.3, hit the changes
>>> of mon protocol, and so I updated all of the 5 mons.
>>> After upgrading the mon, (and during the debugging of other problems,)
>>> I removed and created the mon filesystem from scratch.
>>
>> What do you mean by this? Did you recreate the file system on all 5
>> monitors? Did you backup any of your previous mon data directories?
>>
>> -Joao
>>
>> --
>> Joao Eduardo Luis
>> Software Engineer | http://inktank.com | http://ceph.com
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com