Re: OSDMap problem: osd does not exist.

Hey,

On Wed, 18 Sep 2013, Yasuhiro Ohara wrote:
> 
> Hi,
> 
> My OSDs are not joining the cluster correctly,
> because the nonce they expect and the nonce they receive from the peer differ.
> It says "wrong node" because the entity_addr_t peer_addr (i.e., the
> combination of the IP address, port number, and the nonce) is different.
> 
> Now, my questions are:
> 1. Are the nonces of OSD peer addrs kept in the osdmap?
> 2. (If so) can I modify the nonce value?
> 
> More generally, how can I fix the cluster if I blew away the mon data ?
> 
> Below I'd like to summarize what I did.
> - I tried to upgrade from 0.57 to 0.67.3.
> - The mon protocol is different, and the mon data format also seemed
>   different (changed to use leveldb?), so I restarted all mons.
> - The mon data upgrade did not go well because of a full disk,
>   but I didn't notice the cause and stupidly tried to start the mon from scratch,
>   rebuilding the mon data (mon --mkfs). (I solved the full disk problem
>   later.)
> - Now no OSDs exist in the cluster (i.e., in the osdmap).
> - I added the OSDs back using "ceph osd create".
> - Still the OSDs do not recognize each other; they do not become peers.
> - (The OSDs still seem to hold the previous PG data, and loading it
>   works fine. So I assume I can still recover the data.)
> 
> Does anyone have any advice on this?
> I'm planning to modify the source code, since I have no other choice,
> so that the OSDs ignore nonce values :(

The nonce value is important; you can't just ignore it.  If the addr in 
the osdmap isn't changing, it is probably because the mon thinks the 
latest osdmap is N and the OSDs think the latest is >> N.  I would look 
in the osd data/current/meta directory and see what the newest osdmap 
epoch is, compare that to 'ceph osd dump', and then do 'ceph osd thrash N' 
to make it churn through a bunch of maps to get to an epoch that is greater 
than what the OSDs see.  Once that happens, the osd boot messages will properly 
update the cluster osdmap with their new addr and things should start up.  
Until then, the OSDs will just sit and wait for a map newer than what 
they have, which will never come...
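
The epoch check described above can be sketched roughly as follows. This is
only an illustration, not a Ceph tool: it assumes the osdmap objects in the
meta directory are stored as files whose names contain "osdmap.<epoch>__"
(full maps) or "inc\uosdmap.<epoch>__" (incrementals), possibly under hashed
subdirectories. The function names and the margin parameter are hypothetical;
check the file names on your own OSDs before relying on the pattern.

```python
import os
import re

# ASSUMPTION: osdmap object files are named like "osdmap.<epoch>__..." or
# "inc\uosdmap.<epoch>__...". Adjust the pattern if your files differ.
_OSDMAP_RE = re.compile(r"osdmap\.(\d+)__")


def epoch_from_name(name):
    """Return the epoch encoded in an osdmap file name, or None if absent."""
    match = _OSDMAP_RE.search(name)
    return int(match.group(1)) if match else None


def newest_osdmap_epoch(meta_dir):
    """Walk meta_dir and its subdirectories; return the highest epoch or None."""
    newest = None
    for _root, _dirs, files in os.walk(meta_dir):
        for name in files:
            epoch = epoch_from_name(name)
            if epoch is not None and (newest is None or epoch > newest):
                newest = epoch
    return newest


def epochs_to_thrash(osd_epoch, mon_epoch, margin=10):
    """How many epochs the mon must advance to end up above osd_epoch.

    margin is a hypothetical safety cushion, not a value from Ceph.
    """
    return max(0, osd_epoch - mon_epoch + margin)
```

To use it, call newest_osdmap_epoch() on each OSD's data/current/meta
directory, take the largest result, and read the mon's current epoch from the
output of 'ceph osd dump'. Assuming the argument to 'ceph osd thrash' is the
number of epochs to churn through, epochs_to_thrash() gives a value to pass.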

sage

> 
> Thanks in advance.
> 
> regards,
> Yasu
> 
> From: Yasuhiro Ohara <yasu@xxxxxxxxxxxx>
> Subject: Re:  OSDMap problem: osd does not exist.
> Date: Thu, 12 Sep 2013 09:45:51 -0700 (PDT)
> Message-ID: <20130912.094551.06710597.yasu@xxxxxxxxxxxx>
> 
> > 
> > Hi Joao,
> > 
> > Thank you for the response.
> > I meant "ceph-mon -i X --mkfs".
> > 
> > Actually I did it on 3 nodes. On the other 2 mon nodes, the original
> > mon data was left, but currently all 5 nodes run ceph-mon again.
> > Should I not have done that?
> > 
> > regards,
> > Yasu
> > 
> > From: Joao Eduardo Luis <joao.luis@xxxxxxxxxxx>
> > Subject: Re:  OSDMap problem: osd does not exist.
> > Date: Thu, 12 Sep 2013 11:35:40 +0100
> > Message-ID: <523198FC.8050602@xxxxxxxxxxx>
> > 
> >> On 09/12/2013 09:35 AM, Yasuhiro Ohara wrote:
> >>>
> >>> Hi,
> >>>
> >>> recently I tried to upgrade from 0.57 to 0.67.3, hit the mon
> >>> protocol change, and so I updated all 5 of the mons.
> >>> After upgrading the mons (and while debugging other problems),
> >>> I removed the mon filesystem and recreated it from scratch.
> >> 
> >> What do you mean by this?  Did you recreate the file system on all 5 monitors?  Did you back up any of your previous mon data directories?
> >> 
> >>   -Joao
> >> 
> >> -- 
> >> Joao Eduardo Luis
> >> Software Engineer | http://inktank.com | http://ceph.com
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@xxxxxxxxxxxxxx
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com