Ceph not getting into a clean state

Right,

I've run into a situation where the cluster seemed reluctant to
reorganise after changing all the pool sizes - the health only went to
OK once the OSDs were restarted (essentially I just rebooted each host
in turn). This was a while ago (pre 0.72), so something else may be
going on with current versions - but it may be worth restarting them.
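
For example, on Ubuntu 14.04 something like the following should restart
an OSD daemon in place instead of rebooting the whole host (osd.2 here
is just an example id - repeat for each OSD in turn):

    # upstart packaging
    sudo restart ceph-osd id=2
    # sysvinit packaging
    sudo /etc/init.d/ceph restart osd.2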

Regards

Mark

On 09/05/14 18:53, Georg Höllrigl wrote:
> Hello,
>
> I've already thought about that - but even after changing the 
> replication level (size) I'm not getting a clean cluster (there are 
> only the default pools ATM):
>
> root@ceph-m-02:~# ceph -s
>     cluster b04fc583-9e71-48b7-a741-92f4dff4cfef
>      health HEALTH_WARN 232 pgs stuck unclean; recovery 26/126 objects 
> degraded (20.635%)
>      monmap e2: 3 mons at 
> {ceph-m-01=10.0.0.100:6789/0,ceph-m-02=10.0.1.101:6789/0,ceph-m-03=10.0.1.102:6789/0}, 
> election epoch 8, quorum 0,1,2 ceph-m-01,ceph-m-02,ceph-m-03
>      osdmap e56: 9 osds: 9 up, 9 in
>       pgmap v287: 232 pgs, 8 pools, 822 bytes data, 43 objects
>             9342 MB used, 78317 GB / 78326 GB avail
>             26/126 objects degraded (20.635%)
>                  119 active
>                  113 active+remapped
> root@ceph-m-02:~# ceph osd dump | grep size
> pool 0 'data' replicated size 2 min_size 2 crush_ruleset 0 object_hash 
> rjenkins pg_num 64 pgp_num 64 last_change 48 owner 0 flags hashpspool 
> crash_replay_interval 45 stripe_width 0
> pool 1 'metadata' replicated size 2 min_size 2 crush_ruleset 0 
> object_hash rjenkins pg_num 64 pgp_num 64 last_change 49 owner 0 flags 
> hashpspool stripe_width 0
> pool 2 'rbd' replicated size 2 min_size 2 crush_ruleset 0 object_hash 
> rjenkins pg_num 64 pgp_num 64 last_change 50 owner 0 flags hashpspool 
> stripe_width 0
> pool 3 '.rgw.root' replicated size 2 min_size 2 crush_ruleset 0 
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 52 owner 0 flags 
> hashpspool stripe_width 0
> pool 4 '.rgw.control' replicated size 2 min_size 2 crush_ruleset 0 
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 53 owner 0 flags 
> hashpspool stripe_width 0
> pool 5 '.rgw' replicated size 2 min_size 2 crush_ruleset 0 object_hash 
> rjenkins pg_num 8 pgp_num 8 last_change 54 owner 18446744073709551615 
> flags hashpspool stripe_width 0
> pool 6 '.rgw.gc' replicated size 2 min_size 2 crush_ruleset 0 
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 55 owner 0 flags 
> hashpspool stripe_width 0
> pool 7 '.users.uid' replicated size 2 min_size 2 crush_ruleset 0 
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 56 owner 
> 18446744073709551615 flags hashpspool stripe_width 0
>
>
> Kind Regards,
> Georg
>
>
> On 09.05.2014 08:29, Mark Kirkwood wrote:
>> So that's two hosts - if this is a new cluster, chances are the pools
>> have replication size=3 and won't place replica pgs on the same host...
>> 'ceph osd dump' will let you know if this is the case. If it is, either
>> reduce size to 2, add another host, or edit your crush rules to allow
>> replica pgs on the same host, as sketched below.
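>>
>> For example (pool 'data' shown - repeat for each pool; the crush edit
>> assumes the default rule's "chooseleaf ... type host" step):
>>
>>     ceph osd pool set data size 2
>>
>>     ceph osd getcrushmap -o crush.bin
>>     crushtool -d crush.bin -o crush.txt
>>     # change "step chooseleaf firstn 0 type host" to "... type osd"
>>     crushtool -c crush.txt -o crush.new
>>     ceph osd setcrushmap -i crush.new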
>>
>> Cheers
>>
>> Mark
>>
>>> On 09/05/14 18:20, Georg Höllrigl wrote:
>>> #ceph osd tree
>>> # id    weight  type name       up/down reweight
>>> -1      76.47   root default
>>> -2      32.72           host ceph-s-01
>>> 0       7.27                    osd.0   up      1
>>> 1       7.27                    osd.1   up      1
>>> 2       9.09                    osd.2   up      1
>>> 3       9.09                    osd.3   up      1
>>> -3      43.75           host ceph-s-02
>>> 4       10.91                   osd.4   up      1
>>> 5       0.11                    osd.5   up      1
>>> 6       10.91                   osd.6   up      1
>>> 7       10.91                   osd.7   up      1
>>> 8       10.91                   osd.8   up      1
>>>
>>>
>>> On 08.05.2014 19:11, Craig Lewis wrote:
>>>> What does `ceph osd tree` output?
>>>>
>>>> On 5/8/14 07:30, Georg Höllrigl wrote:
>>>>> Hello,
>>>>>
>>>>> We have a fresh cluster setup with Ubuntu 14.04 and Ceph Firefly. By
>>>>> now I've tried this multiple times, but the result stays the same and
>>>>> shows lots of trouble (the cluster is empty; no client has accessed
>>>>> it):
>>>>>
>>>>> #ceph -s
>>>>>     cluster b04fc583-9e71-48b7-a741-92f4dff4cfef
>>>>>      health HEALTH_WARN 470 pgs stale; 470 pgs stuck stale; 18 pgs
>>>>> stuck unclean; 26 requests are blocked > 32 sec
>>>>>      monmap e2: 3 mons at
>>>>> {ceph-m-01=10.0.0.100:6789/0,ceph-m-02=10.0.1.101:6789/0,ceph-m-03=10.0.1.102:6789/0}, 
>>>>> election epoch 8, quorum 0,1,2 ceph-m-01,ceph-m-02,ceph-m-03
>>>>>      osdmap e409: 9 osds: 9 up, 9 in
>>>>>       pgmap v1231: 480 pgs, 9 pools, 822 bytes data, 43 objects
>>>>>             9373 MB used, 78317 GB / 78326 GB avail
>>>>>                  451 stale+active+clean
>>>>>                    1 stale+active+clean+scrubbing
>>>>>                   10 active+clean
>>>>>                   18 stale+active+remapped
>>>>>