Hi,
I've seen this in 0.56 as well. In my case I shut down one server and then bring it back up. I have to run /etc/init.d/ceph -a restart to make the cluster healthy again. It doesn't impact the running VM I have in that cluster, though.

On Wed, Apr 3, 2013 at 8:32 PM, Martin Mailand <martin@xxxxxxxxxxxx> wrote:
Hi,
I still have this problem in v0.60.
If I stop one OSD, it gets marked down after 20 seconds. But after 300
seconds the OSD does not get marked out, so the cluster stays degraded forever.
I can reproduce this with a freshly created cluster.
root@store1:~# ceph -s
health HEALTH_WARN 405 pgs degraded; 405 pgs stuck unclean; recovery
10603/259576 degraded (4.085%); 1/24 in osds are down
monmap e1: 3 mons at
{a=192.168.195.31:6789/0,b=192.168.195.33:6789/0,c=192.168.195.35:6789/0},
election epoch 10, quorum 0,1,2 a,b,c
osdmap e150: 24 osds: 23 up, 24 in
pgmap v12028: 4800 pgs: 4395 active+clean, 405 active+degraded; 505
GB data, 1017 GB used, 173 TB / 174 TB avail; 0B/s rd, 6303B/s wr,
2op/s; 10603/259576 degraded (4.085%)
mdsmap e1: 0/0/1 up
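
The out timeout should be governed by "mon osd down out interval", which as far
as I know defaults to 300 seconds. Assuming the default admin socket path and
monitor id a, something like this should show the value the monitor is actually
using:

   ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok config show | grep mon_osd_down_out_interval
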
-martin
On 28.03.2013 23:45, John Wilkins wrote:
> Martin,
>
> I'm just speculating: I just rewrote the networking section, there is an
> empty mon_host value in your dump, and I recall a chat last week where
> mon_host is now considered a different setting, so you might try
> specifying:
>
> [mon.a]
> mon host = store1
> mon addr = 192.168.195.31:6789
>
> etc. for monitors. I'm assuming that's not the case, but I want to
> make sure my docs are right on this point.
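>
> (To double-check at runtime, assuming the default admin socket path on
> store1, something like this should show whether mon_host picked up the
> value:
>
>    ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok config show | grep mon_host
> )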
>
>
> On Thu, Mar 28, 2013 at 3:24 PM, Martin Mailand <martin@xxxxxxxxxxxx> wrote:
>> Hi John,
>>
>> my ceph.conf is a bit further down in this email.
>>
>> -martin
>>
>> Am 28.03.2013 23:21, schrieb John Wilkins:
>>
>>> Martin,
>>>
>>> Would you mind posting your Ceph configuration file too? I don't see
>>> any value set for mon_host; your dump shows "mon_host": "".
>>>
>>> On Thu, Mar 28, 2013 at 1:04 PM, Martin Mailand <martin@xxxxxxxxxxxx>
>>> wrote:
>>>>
>>>> Hi Greg,
>>>>
>>>> the dump from mon.a is attached.
>>>>
>>>> -martin
>>>>
>>>> On 28.03.2013 20:55, Gregory Farnum wrote:
>>>>>
>>>>> Hmm. The monitor code for checking this all looks good to me. Can you
>>>>> go to one of your monitor nodes and dump the config?
>>>>>
>>>>> (http://ceph.com/docs/master/rados/configuration/ceph-conf/?highlight=admin%20socket#viewing-a-configuration-at-runtime)
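>>>>>
>>>>> Per that page, and assuming the default admin socket path on the mon
>>>>> host, something along these lines should dump the running config:
>>>>>
>>>>>    ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok config show
>>>>>
>>>>> Grepping that output for the mon_osd_* settings might also be
>>>>> interesting.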
>>>>> -Greg
>>>>>
>>>>> On Thu, Mar 28, 2013 at 12:33 PM, Martin Mailand <martin@xxxxxxxxxxxx>
>>>>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I get the same behavior on a newly created cluster as well, with no
>>>>>> changes to the cluster config at all.
>>>>>> I stopped osd.1; after 20 seconds it got marked down, but it never got
>>>>>> marked out.
>>>>>>
>>>>>> ceph version 0.59 (cbae6a435c62899f857775f66659de052fb0e759)
>>>>>>
>>>>>> -martin
>>>>>>
>>>>>> On 28.03.2013 19:48, John Wilkins wrote:
>>>>>>>
>>>>>>> Martin,
>>>>>>>
>>>>>>> Greg is talking about noout. With Ceph, you can specifically preclude
>>>>>>> OSDs from being marked out when down to prevent rebalancing--e.g.,
>>>>>>> during upgrades, short-term maintenance, etc.
>>>>>>>
>>>>>>>
>>>>>>> http://ceph.com/docs/master/rados/operations/troubleshooting-osd/#stopping-w-out-rebalancing
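>>>>>>>
>>>>>>> Roughly, something like:
>>>>>>>
>>>>>>>    ceph osd set noout     # down OSDs are not marked out
>>>>>>>    ceph osd unset noout   # restore normal behavior
>>>>>>>
>>>>>>> That flag only applies if you set it yourself, though; it shouldn't be
>>>>>>> set on a fresh cluster.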
>>>>>>>
>>>>>>> On Thu, Mar 28, 2013 at 11:12 AM, Martin Mailand <martin@xxxxxxxxxxxx>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi Greg,
>>>>>>>>
>>>>>>>> setting the OSD out manually triggered the recovery.
>>>>>>>> But the question remains: why is the OSD not marked out after 300
>>>>>>>> seconds? This is a default cluster; I use the 0.59 build from your
>>>>>>>> site, and I didn't change any value except for the crushmap.
>>>>>>>>
>>>>>>>> That's my ceph.conf.
>>>>>>>>
>>>>>>>> -martin
>>>>>>>>
>>>>>>>> [global]
>>>>>>>> auth cluster required = none
>>>>>>>> auth service required = none
>>>>>>>> auth client required = none
>>>>>>>> # log file = ""
>>>>>>>> log_max_recent=100
>>>>>>>> log_max_new=100
>>>>>>>>
>>>>>>>> [mon]
>>>>>>>> mon data = ""
>>>>>>>> [mon.a]
>>>>>>>> host = store1
>>>>>>>> mon addr = 192.168.195.31:6789
>>>>>>>> [mon.b]
>>>>>>>> host = store3
>>>>>>>> mon addr = 192.168.195.33:6789
>>>>>>>> [mon.c]
>>>>>>>> host = store5
>>>>>>>> mon addr = 192.168.195.35:6789
>>>>>>>> [osd]
>>>>>>>> journal aio = true
>>>>>>>> osd data = ""
>>>>>>>> osd mount options btrfs = rw,noatime,nodiratime,autodefrag
>>>>>>>> osd mkfs options btrfs = -n 32k -l 32k
>>>>>>>>
>>>>>>>> [osd.0]
>>>>>>>> host = store1
>>>>>>>> osd journal = /dev/sdg1
>>>>>>>> btrfs devs = /dev/sdc
>>>>>>>> [osd.1]
>>>>>>>> host = store1
>>>>>>>> osd journal = /dev/sdh1
>>>>>>>> btrfs devs = /dev/sdd
>>>>>>>> [osd.2]
>>>>>>>> host = store1
>>>>>>>> osd journal = /dev/sdi1
>>>>>>>> btrfs devs = /dev/sde
>>>>>>>> [osd.3]
>>>>>>>> host = store1
>>>>>>>> osd journal = /dev/sdj1
>>>>>>>> btrfs devs = /dev/sdf
>>>>>>>> [osd.4]
>>>>>>>> host = store2
>>>>>>>> osd journal = /dev/sdg1
>>>>>>>> btrfs devs = /dev/sdc
>>>>>>>> [osd.5]
>>>>>>>> host = store2
>>>>>>>> osd journal = /dev/sdh1
>>>>>>>> btrfs devs = /dev/sdd
>>>>>>>> [osd.6]
>>>>>>>> host = store2
>>>>>>>> osd journal = /dev/sdi1
>>>>>>>> btrfs devs = /dev/sde
>>>>>>>> [osd.7]
>>>>>>>> host = store2
>>>>>>>> osd journal = /dev/sdj1
>>>>>>>> btrfs devs = /dev/sdf
>>>>>>>> [osd.8]
>>>>>>>> host = store3
>>>>>>>> osd journal = /dev/sdg1
>>>>>>>> btrfs devs = /dev/sdc
>>>>>>>> [osd.9]
>>>>>>>> host = store3
>>>>>>>> osd journal = /dev/sdh1
>>>>>>>> btrfs devs = /dev/sdd
>>>>>>>> [osd.10]
>>>>>>>> host = store3
>>>>>>>> osd journal = /dev/sdi1
>>>>>>>> btrfs devs = /dev/sde
>>>>>>>> [osd.11]
>>>>>>>> host = store3
>>>>>>>> osd journal = /dev/sdj1
>>>>>>>> btrfs devs = /dev/sdf
>>>>>>>> [osd.12]
>>>>>>>> host = store4
>>>>>>>> osd journal = /dev/sdg1
>>>>>>>> btrfs devs = /dev/sdc
>>>>>>>> [osd.13]
>>>>>>>> host = store4
>>>>>>>> osd journal = /dev/sdh1
>>>>>>>> btrfs devs = /dev/sdd
>>>>>>>> [osd.14]
>>>>>>>> host = store4
>>>>>>>> osd journal = /dev/sdi1
>>>>>>>> btrfs devs = /dev/sde
>>>>>>>> [osd.15]
>>>>>>>> host = store4
>>>>>>>> osd journal = /dev/sdj1
>>>>>>>> btrfs devs = /dev/sdf
>>>>>>>> [osd.16]
>>>>>>>> host = store5
>>>>>>>> osd journal = /dev/sdg1
>>>>>>>> btrfs devs = /dev/sdc
>>>>>>>> [osd.17]
>>>>>>>> host = store5
>>>>>>>> osd journal = /dev/sdh1
>>>>>>>> btrfs devs = /dev/sdd
>>>>>>>> [osd.18]
>>>>>>>> host = store5
>>>>>>>> osd journal = /dev/sdi1
>>>>>>>> btrfs devs = /dev/sde
>>>>>>>> [osd.19]
>>>>>>>> host = store5
>>>>>>>> osd journal = /dev/sdj1
>>>>>>>> btrfs devs = /dev/sdf
>>>>>>>> [osd.20]
>>>>>>>> host = store6
>>>>>>>> osd journal = /dev/sdg1
>>>>>>>> btrfs devs = /dev/sdc
>>>>>>>> [osd.21]
>>>>>>>> host = store6
>>>>>>>> osd journal = /dev/sdh1
>>>>>>>> btrfs devs = /dev/sdd
>>>>>>>> [osd.22]
>>>>>>>> host = store6
>>>>>>>> osd journal = /dev/sdi1
>>>>>>>> btrfs devs = /dev/sde
>>>>>>>> [osd.23]
>>>>>>>> host = store6
>>>>>>>> osd journal = /dev/sdj1
>>>>>>>> btrfs devs = /dev/sdf
>>>>>>>>
>>>>>>>>
>>>>>>>> On 28.03.2013 19:01, Gregory Farnum wrote:
>>>>>>>>>
>>>>>>>>> Your crush map looks fine to me. I'm saying that your ceph -s output
>>>>>>>>> showed the OSD still hadn't been marked out. No data will be
>>>>>>>>> migrated
>>>>>>>>> until it's marked out.
>>>>>>>>> After ten minutes it should have been marked out, but that's based
>>>>>>>>> on a number of factors you have some control over. If you just want
>>>>>>>>> a quick check of your crush map, you can mark it out manually, too.
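>>>>>>>>>
>>>>>>>>> For example, something like "ceph osd out 1" should mark the stopped
>>>>>>>>> osd.1 out, and recovery should start right away.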
>>>>>>>>> -Greg
>>>>>>>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com