Re: OSDMap problem: osd does not exist.

Yasuhiro Ohara <yasu@xxxxxxxxxxxx> · Thu, 19 Sep 2013 01:10:45 -0700 (PDT)

Hi Sage,

Thanks, after thrashing it became a little bit better,
but not yet healthy.

ceph -s: http://pastebin.com/vD28FJ4A
ceph osd dump: http://pastebin.com/37FLNxd7
ceph pg dump: http://pastebin.com/pccdg20j

(osd.0 and osd.1 are not running. I issued some "osd in" commands.
osd.4 is running but marked down/out. What is "autoout"?)

After thrashing several times (maybe I thrashed it too much?),
the OSD cluster churned heavily,
as shown in ceph -w: http://pastebin.com/fjeqrhxp

I thought the OSDs' osdmap epoch was around 4900 (from looking at data/current/meta),
but it took 6 or 7 runs of the osd thrash command before the cluster seemed to work
on something, by which point the epoch had passed 10000.
Now I sometimes see "deep scrub ok" in ceph -w.
But the PGs are still in the 'creating' state, and nothing actually seems to be
created.
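
For reference, the epoch comparison could be sketched roughly as below. This is
only an illustration, not a Ceph tool: the helper names are made up, and it
assumes that the map files under data/current/meta are named like
"osdmap.<epoch>__..." (with an "inc\u" prefix for incrementals) and that
'ceph osd dump' output contains an "epoch N" line.

```python
import re

# Assumption: full-map files look like "osdmap.4900__0_<hash>__none" and
# incremental maps like "inc\uosdmap.4900__0_<hash>__none".
_OSDMAP_RE = re.compile(r"osdmap\.(\d+)__")
# Assumption: 'ceph osd dump' prints a line of the form "epoch N".
_EPOCH_RE = re.compile(r"^epoch\s+(\d+)\s*$", re.MULTILINE)


def newest_osd_epoch(filenames):
    """Return the highest osdmap epoch found in the file names, or None."""
    epochs = []
    for name in filenames:
        match = _OSDMAP_RE.search(name)
        if match:
            epochs.append(int(match.group(1)))
    return max(epochs) if epochs else None


def mon_epoch(osd_dump_text):
    """Return the epoch reported in 'ceph osd dump' output, or None."""
    match = _EPOCH_RE.search(osd_dump_text)
    return int(match.group(1)) if match else None


def epochs_behind(osd_epoch, mon_ep):
    """Epochs the mons must advance to get past the OSD's newest map."""
    return max(0, osd_epoch - mon_ep + 1)
```

The file names could be collected with os.walk() over the meta directory and
the dump text with subprocess. A positive result from epochs_behind() would
mean the mons are still behind what the OSDs have already seen.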

I removed and re-created the pools because the number of PGs was incorrect,
which changed the pool ids from 0,1,2 to 3,4,5. Is this causing the problem?
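
To check whether the stuck PGs belong to the re-created pools, something like
the following could be used. It is a rough sketch with a made-up helper name;
it assumes the (pgid, state) pairs have already been extracted from the
'ceph pg dump' output, and relies only on the pool id being the part of the
pgid before the dot (e.g. "3.1f" is in pool 3).

```python
from collections import Counter


def creating_pgs_per_pool(pg_states):
    """Count PGs whose state includes 'creating', grouped by pool id.

    pg_states: iterable of (pgid, state) pairs such as ("3.1f", "creating").
    """
    counts = Counter()
    for pgid, state in pg_states:
        # The pool id is the decimal number before the first dot.
        pool_id = int(pgid.split(".", 1)[0])
        # States can be combined with '+', e.g. "active+clean".
        if "creating" in state.split("+"):
            counts[pool_id] += 1
    return dict(counts)
```

If all of the 'creating' PGs fall in pools 3, 4 and 5, that would at least be
consistent with the pool re-creation being involved.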

By the way, the MDS crashes while the cluster is in this state.
ceph-mds.2.log: http://pastebin.com/Ruf5YB8d

Any suggestion is really appreciated.
Thanks.

regards,
Yasu

From: Sage Weil <sage@xxxxxxxxxxx>
Subject: Re:  OSDMap problem: osd does not exist.
Date: Wed, 18 Sep 2013 19:58:16 -0700 (PDT)
Message-ID: <alpine.DEB.2.00.1309181956020.23507@xxxxxxxxxxxxxxxxxx>

> Hey,
> 
> On Wed, 18 Sep 2013, Yasuhiro Ohara wrote:
>> 
>> Hi,
>> 
>> My OSDs are not joining the cluster correctly,
>> because the nonce they expect and the one they receive from the peer are different.
>> It says "wrong node" because the entity_id_t peer_addr (i.e., the
>> combination of the IP address, port number, and the nonce) is different.
>> 
>> Now, my questions are:
>> 1, Are the nonces of OSD peer addrs kept in the osdmap ?
>> 2, (If so) can I modify the nonce value ?
>> 
>> More generally, how can I fix the cluster if I blew away the mon data ?
>> 
>> Below I'd like to summarize what I did.
>> - I tried to upgrade from 0.57 to 0.67.3
>> - the mon protocol is different, and the mon data format also seemed
>>   different (changed to use leveldb ?). So I restarted all mons.
>> - The mon data upgrade did not go well because of the full disk,
>>   but I didn't notice the cause and stupidly tried to start mon from scratch,
>>   building the mon data (mon --mkfs). (I solved the full disk problem
>>   later.)
>> - Now there's no OSD existing in the cluster (i.e., in osdmap).
>> - I added OSD configurations using "ceph osd create".
>> - Still OSDs do not recognize each other; they do not become peers.
>> - (The OSDs seem to hold the previous PG data still, and loading them
>>   is working fine. So I assume I still can recover the data.)
>> 
>> Does anyone have any advice on this ?
>> I'm planning to try to modify the source code because of no other choice,
>> so that they ignore nonce values :(
> 
> The nonce value is important; you can't just ignore it.  If the addr in
> the osdmap isn't changing, it is probably because the mon thinks the
> latest osdmap is N and the osds think the latest is >> N.  I would look
> in the osd data/current/meta directory and see what the newest osdmap
> epoch is, compare that to 'ceph osd dump', and then do 'ceph osd thrash N'
> to make it churn through a bunch of maps to get to an epoch that is greater than
> what the OSDs see.  Once that happens, the osd boot messages will properly
> update the cluster osdmap with their new addr and things should start up.
> Until then, the osds will just sit and wait to get a map newer than what
> they have, which will never come...
> 
> sage
> 
>> 
>> Thanks in advance.
>> 
>> regards,
>> Yasu
>> 
>> From: Yasuhiro Ohara <yasu@xxxxxxxxxxxx>
>> Subject: Re:  OSDMap problem: osd does not exist.
>> Date: Thu, 12 Sep 2013 09:45:51 -0700 (PDT)
>> Message-ID: <20130912.094551.06710597.yasu@xxxxxxxxxxxx>
>> 
>> > 
>> > Hi Joao,
>> > 
>> > Thank you for the response.
>> > I meant "ceph-mon -i X --mkfs".
>> > 
>> > Actually I did it on 3 nodes. On the other 2 mon nodes, the original
>> > mon data were left, but currently all 5 nodes run ceph-mon again.
>> > Should I not have done that ?
>> > 
>> > regards,
>> > Yasu
>> > 
>> > From: Joao Eduardo Luis <joao.luis@xxxxxxxxxxx>
>> > Subject: Re:  OSDMap problem: osd does not exist.
>> > Date: Thu, 12 Sep 2013 11:35:40 +0100
>> > Message-ID: <523198FC.8050602@xxxxxxxxxxx>
>> > 
>> >> On 09/12/2013 09:35 AM, Yasuhiro Ohara wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> recently I tried to upgrade from 0.57 to 0.67.3, hit the changes
>> >>> of mon protocol, and so I updated all of the 5 mons.
>> >>> After upgrading the mon, (and during the debugging of other problems,)
>> >>> I removed and created the mon filesystem from scratch.
>> >> 
>> >> What do you mean by this?  Did you recreate the file system on all 5 monitors?  Did you back up any of your previous mon data directories?
>> >> 
>> >>   -Joao
>> >> 
>> >> -- 
>> >> Joao Eduardo Luis
>> >> Software Engineer | http://inktank.com | http://ceph.com
>> >> _______________________________________________
>> >> ceph-users mailing list
>> >> ceph-users@xxxxxxxxxxxxxx
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com