Martin,

I'm just speculating: I just rewrote the networking section, there is an empty mon_host value in your dump, and I recall a chat last week where mon_host is now treated as a separate setting. Maybe you could try specifying:

[mon.a]
    mon host = store1
    mon addr = 192.168.195.31:6789

and so on for the other monitors. I'm assuming that's not actually the problem, but I want to make sure my docs are right on this point.

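Spelled out for all three of your monitors, that would look something like the following. I'm only lifting the hostnames and addresses from the [mon.b] and [mon.c] sections of your ceph.conf below, so treat it as a sketch to compare against rather than something I've verified:

[mon.a]
    mon host = store1
    mon addr = 192.168.195.31:6789
[mon.b]
    mon host = store3
    mon addr = 192.168.195.33:6789
[mon.c]
    mon host = store5
    mon addr = 192.168.195.35:6789

Or, if mon_host really is meant to be a single global list now, it might instead be something like "mon host = 192.168.195.31,192.168.195.33,192.168.195.35" under [global]. Again, just a guess on my part.
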
On Thu, Mar 28, 2013 at 3:24 PM, Martin Mailand <martin@xxxxxxxxxxxx> wrote:
> Hi John,
>
> my ceph.conf is a bit further down in this email.
>
> -martin
>
> Am 28.03.2013 23:21, schrieb John Wilkins:
>
>> Martin,
>>
>> Would you mind posting your Ceph configuration file too? I don't see
>> any value set for "mon_host": ""
>>
>> On Thu, Mar 28, 2013 at 1:04 PM, Martin Mailand <martin@xxxxxxxxxxxx> wrote:
>>>
>>> Hi Greg,
>>>
>>> the dump from mon.a is attached.
>>>
>>> -martin
>>>
>>> On 28.03.2013 20:55, Gregory Farnum wrote:
>>>>
>>>> Hmm. The monitor code for checking this all looks good to me. Can you
>>>> go to one of your monitor nodes and dump the config?
>>>>
>>>> (http://ceph.com/docs/master/rados/configuration/ceph-conf/?highlight=admin%20socket#viewing-a-configuration-at-runtime)
>>>> -Greg
>>>>
>>>> On Thu, Mar 28, 2013 at 12:33 PM, Martin Mailand <martin@xxxxxxxxxxxx> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I get the same behavior on a newly created cluster as well, with no
>>>>> changes to the cluster config at all.
>>>>> I stop osd.1; after 20 seconds it gets marked down, but it never gets
>>>>> marked out.
>>>>>
>>>>> ceph version 0.59 (cbae6a435c62899f857775f66659de052fb0e759)
>>>>>
>>>>> -martin
>>>>>
>>>>> On 28.03.2013 19:48, John Wilkins wrote:
>>>>>>
>>>>>> Martin,
>>>>>>
>>>>>> Greg is talking about noout. With Ceph, you can specifically preclude
>>>>>> OSDs from being marked out while they are down, to prevent rebalancing --
>>>>>> e.g., during upgrades, short-term maintenance, etc.
>>>>>>
>>>>>> http://ceph.com/docs/master/rados/operations/troubleshooting-osd/#stopping-w-out-rebalancing
>>>>>>
>>>>>> On Thu, Mar 28, 2013 at 11:12 AM, Martin Mailand <martin@xxxxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> Hi Greg,
>>>>>>>
>>>>>>> setting the OSD out manually triggered the recovery.
>>>>>>> But now the question is: why is the OSD not marked out after 300
>>>>>>> seconds? That's a default cluster; I use the 0.59 build from your
>>>>>>> site, and I didn't change any value except for the crushmap.
>>>>>>>
>>>>>>> That's my ceph.conf.
>>>>>>>
>>>>>>> -martin
>>>>>>>
>>>>>>> [global]
>>>>>>> auth cluster requierd = none
>>>>>>> auth service required = none
>>>>>>> auth client required = none
>>>>>>> # log file = ""
>>>>>>> log_max_recent=100
>>>>>>> log_max_new=100
>>>>>>>
>>>>>>> [mon]
>>>>>>> mon data = /data/mon.$id
>>>>>>> [mon.a]
>>>>>>> host = store1
>>>>>>> mon addr = 192.168.195.31:6789
>>>>>>> [mon.b]
>>>>>>> host = store3
>>>>>>> mon addr = 192.168.195.33:6789
>>>>>>> [mon.c]
>>>>>>> host = store5
>>>>>>> mon addr = 192.168.195.35:6789
>>>>>>> [osd]
>>>>>>> journal aio = true
>>>>>>> osd data = /data/osd.$id
>>>>>>> osd mount options btrfs = rw,noatime,nodiratime,autodefrag
>>>>>>> osd mkfs options btrfs = -n 32k -l 32k
>>>>>>>
>>>>>>> [osd.0]
>>>>>>> host = store1
>>>>>>> osd journal = /dev/sdg1
>>>>>>> btrfs devs = /dev/sdc
>>>>>>> [osd.1]
>>>>>>> host = store1
>>>>>>> osd journal = /dev/sdh1
>>>>>>> btrfs devs = /dev/sdd
>>>>>>> [osd.2]
>>>>>>> host = store1
>>>>>>> osd journal = /dev/sdi1
>>>>>>> btrfs devs = /dev/sde
>>>>>>> [osd.3]
>>>>>>> host = store1
>>>>>>> osd journal = /dev/sdj1
>>>>>>> btrfs devs = /dev/sdf
>>>>>>> [osd.4]
>>>>>>> host = store2
>>>>>>> osd journal = /dev/sdg1
>>>>>>> btrfs devs = /dev/sdc
>>>>>>> [osd.5]
>>>>>>> host = store2
>>>>>>> osd journal = /dev/sdh1
>>>>>>> btrfs devs = /dev/sdd
>>>>>>> [osd.6]
>>>>>>> host = store2
>>>>>>> osd journal = /dev/sdi1
>>>>>>> btrfs devs = /dev/sde
>>>>>>> [osd.7]
>>>>>>> host = store2
>>>>>>> osd journal = /dev/sdj1
>>>>>>> btrfs devs = /dev/sdf
>>>>>>> [osd.8]
>>>>>>> host = store3
>>>>>>> osd journal = /dev/sdg1
>>>>>>> btrfs devs = /dev/sdc
>>>>>>> [osd.9]
>>>>>>> host = store3
>>>>>>> osd journal = /dev/sdh1
>>>>>>> btrfs devs = /dev/sdd
>>>>>>> [osd.10]
>>>>>>> host = store3
>>>>>>> osd journal = /dev/sdi1
>>>>>>> btrfs devs = /dev/sde
>>>>>>> [osd.11]
>>>>>>> host = store3
>>>>>>> osd journal = /dev/sdj1
>>>>>>> btrfs devs = /dev/sdf
>>>>>>> [osd.12]
>>>>>>> host = store4
>>>>>>> osd journal = /dev/sdg1
>>>>>>> btrfs devs = /dev/sdc
>>>>>>> [osd.13]
>>>>>>> host = store4
>>>>>>> osd journal = /dev/sdh1
>>>>>>> btrfs devs = /dev/sdd
>>>>>>> [osd.14]
>>>>>>> host = store4
>>>>>>> osd journal = /dev/sdi1
>>>>>>> btrfs devs = /dev/sde
>>>>>>> [osd.15]
>>>>>>> host = store4
>>>>>>> osd journal = /dev/sdj1
>>>>>>> btrfs devs = /dev/sdf
>>>>>>> [osd.16]
>>>>>>> host = store5
>>>>>>> osd journal = /dev/sdg1
>>>>>>> btrfs devs = /dev/sdc
>>>>>>> [osd.17]
>>>>>>> host = store5
>>>>>>> osd journal = /dev/sdh1
>>>>>>> btrfs devs = /dev/sdd
>>>>>>> [osd.18]
>>>>>>> host = store5
>>>>>>> osd journal = /dev/sdi1
>>>>>>> btrfs devs = /dev/sde
>>>>>>> [osd.19]
>>>>>>> host = store5
>>>>>>> osd journal = /dev/sdj1
>>>>>>> btrfs devs = /dev/sdf
>>>>>>> [osd.20]
>>>>>>> host = store6
>>>>>>> osd journal = /dev/sdg1
>>>>>>> btrfs devs = /dev/sdc
>>>>>>> [osd.21]
>>>>>>> host = store6
>>>>>>> osd journal = /dev/sdh1
>>>>>>> btrfs devs = /dev/sdd
>>>>>>> [osd.22]
>>>>>>> host = store6
>>>>>>> osd journal = /dev/sdi1
>>>>>>> btrfs devs = /dev/sde
>>>>>>> [osd.23]
>>>>>>> host = store6
>>>>>>> osd journal = /dev/sdj1
>>>>>>> btrfs devs = /dev/sdf
>>>>>>>
>>>>>>> On 28.03.2013 19:01, Gregory Farnum wrote:
>>>>>>>>
>>>>>>>> Your crush map looks fine to me. I'm saying that your ceph -s output
>>>>>>>> showed the OSD still hadn't been marked out. No data will be migrated
>>>>>>>> until it's marked out.
>>>>>>>> After ten minutes it should have been marked out, but that's based on
>>>>>>>> a number of factors you have some control over. If you just want a
>>>>>>>> quick check of your crush map you can mark it out manually, too.
>>>>>>>> -Greg

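One more note, in case it saves you a round trip: the manual step Greg mentions is just "ceph osd out 1" (for the osd.1 you stopped), and the runtime check from his doc link is the monitor's admin socket, something along the lines of

    ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok config show | grep mon_osd_down_out_interval

run on store1. I'm assuming the default socket path there, so adjust it if your mon's admin socket lives somewhere else. mon_osd_down_out_interval is the setting that controls how long a down OSD waits before being marked out, so it's worth confirming what it is actually set to at runtime.
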
--
John Wilkins
Senior Technical Writer
Inktank
john.wilkins@xxxxxxxxxxx
(415) 425-9599
http://inktank.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com