Re: can't get cluster to become healthy. "stale+undersized+degraded+peered"


What are your iptables rules?
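
For reference, something along these lines shows what is currently loaded and, if it turns out the Ceph ports are blocked, opens them. This is only a sketch assuming the default ports (6789/tcp for the monitors, 6800-7300/tcp for the OSD daemons) and an EL-style iptables-services setup; adjust to match your hosts:

    iptables -S                                  # dump the rules currently loaded
    iptables -L INPUT -n -v                      # per-rule packet counters; drops show up here
    iptables -I INPUT -p tcp --dport 6789 -j ACCEPT                        # monitor traffic
    iptables -I INPUT -p tcp -m multiport --dports 6800:7300 -j ACCEPT     # OSD and heartbeat traffic
    service iptables save                        # persist the change, if you use iptables-services

Check the interfaces on the cluster network (192.168.0.0/24) as well, since replication and part of the OSD heartbeating run there.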
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Thu, Sep 17, 2015 at 1:01 AM, Stefan Eriksson  wrote:
> Hi, here is the info. I have added "ceph osd pool set rbd pg_num 128" but that seems to lock up as well.
>
> Here are the details you're after:
>
> [cephcluster@ceph01-adm01 ceph-deploy]$ ceph osd pool ls detail
> pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 64 last_change 37 flags hashpspool stripe_width 0
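
One thing that stands out in that output: pg_num is now 128 but pgp_num is still 64, and placement only follows once pgp_num is raised to match. Assuming you want to keep the 128 PGs, something like:

    ceph osd pool set rbd pgp_num 128    # let placement catch up with the pg_num increase

That by itself won't explain the stale/peered PGs, but it removes one variable.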
>
> [cephcluster@ceph01-adm01 ceph-deploy]$ ceph pg dump_stuck
> ok
> pg_stat state   up      up_primary      acting  acting_primary
> 0.2d    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.2c    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.2b    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.2a    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.29    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.28    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.27    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.26    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.25    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.24    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.23    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.22    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.21    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.20    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.1f    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.1e    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.1d    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.1c    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.1b    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.1a    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.19    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.18    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.17    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.16    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.15    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.14    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.13    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.12    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.11    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.10    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.f     stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.e     stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.d     stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.c     stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.b     stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.a     stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.9     stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.8     stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.7     stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.6     stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.5     stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.4     stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.3     stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.2     stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.1     stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.0     stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.7f    creating        [0,2,1] 0       [0,2,1] 0
> 0.7e    creating        [2,0,1] 2       [2,0,1] 2
> 0.7d    creating        [0,2,1] 0       [0,2,1] 0
> 0.7c    creating        [1,0,2] 1       [1,0,2] 1
> 0.7b    creating        [0,2,1] 0       [0,2,1] 0
> 0.7a    creating        [0,2,1] 0       [0,2,1] 0
> 0.79    creating        [1,0,2] 1       [1,0,2] 1
> 0.78    creating        [1,0,2] 1       [1,0,2] 1
> 0.77    creating        [1,0,2] 1       [1,0,2] 1
> 0.76    creating        [1,2,0] 1       [1,2,0] 1
> 0.75    creating        [1,2,0] 1       [1,2,0] 1
> 0.74    creating        [1,2,0] 1       [1,2,0] 1
> 0.73    creating        [1,2,0] 1       [1,2,0] 1
> 0.72    creating        [0,2,1] 0       [0,2,1] 0
> 0.71    creating        [0,2,1] 0       [0,2,1] 0
> 0.70    creating        [2,0,1] 2       [2,0,1] 2
> 0.6f    creating        [2,1,0] 2       [2,1,0] 2
> 0.6e    creating        [0,1,2] 0       [0,1,2] 0
> 0.6d    creating        [1,2,0] 1       [1,2,0] 1
> 0.6c    creating        [2,0,1] 2       [2,0,1] 2
> 0.6b    creating        [1,2,0] 1       [1,2,0] 1
> 0.6a    creating        [2,1,0] 2       [2,1,0] 2
> 0.69    creating        [2,0,1] 2       [2,0,1] 2
> 0.68    creating        [0,1,2] 0       [0,1,2] 0
> 0.67    creating        [0,1,2] 0       [0,1,2] 0
> 0.66    creating        [0,1,2] 0       [0,1,2] 0
> 0.65    creating        [1,0,2] 1       [1,0,2] 1
> 0.64    creating        [2,0,1] 2       [2,0,1] 2
> 0.63    creating        [1,2,0] 1       [1,2,0] 1
> 0.62    creating        [2,1,0] 2       [2,1,0] 2
> 0.61    creating        [1,2,0] 1       [1,2,0] 1
> 0.60    creating        [1,0,2] 1       [1,0,2] 1
> 0.5f    creating        [2,0,1] 2       [2,0,1] 2
> 0.5e    creating        [1,0,2] 1       [1,0,2] 1
> 0.5d    creating        [1,0,2] 1       [1,0,2] 1
> 0.5c    creating        [1,2,0] 1       [1,2,0] 1
> 0.5b    creating        [1,2,0] 1       [1,2,0] 1
> 0.5a    creating        [1,0,2] 1       [1,0,2] 1
> 0.59    creating        [0,2,1] 0       [0,2,1] 0
> 0.58    creating        [2,0,1] 2       [2,0,1] 2
> 0.57    creating        [0,1,2] 0       [0,1,2] 0
> 0.56    creating        [2,1,0] 2       [2,1,0] 2
> 0.55    creating        [0,2,1] 0       [0,2,1] 0
> 0.54    creating        [0,2,1] 0       [0,2,1] 0
> 0.53    creating        [1,2,0] 1       [1,2,0] 1
> 0.52    creating        [1,2,0] 1       [1,2,0] 1
> 0.51    creating        [1,2,0] 1       [1,2,0] 1
> 0.50    creating        [0,2,1] 0       [0,2,1] 0
> 0.4f    creating        [0,2,1] 0       [0,2,1] 0
> 0.4e    creating        [0,1,2] 0       [0,1,2] 0
> 0.4d    creating        [2,1,0] 2       [2,1,0] 2
> 0.4c    creating        [1,2,0] 1       [1,2,0] 1
> 0.4b    creating        [0,1,2] 0       [0,1,2] 0
> 0.4a    creating        [2,1,0] 2       [2,1,0] 2
> 0.49    creating        [0,1,2] 0       [0,1,2] 0
> 0.48    creating        [1,2,0] 1       [1,2,0] 1
> 0.47    creating        [0,2,1] 0       [0,2,1] 0
> 0.46    creating        [0,2,1] 0       [0,2,1] 0
> 0.45    creating        [2,0,1] 2       [2,0,1] 2
> 0.44    creating        [1,2,0] 1       [1,2,0] 1
> 0.43    creating        [1,0,2] 1       [1,0,2] 1
> 0.42    creating        [1,0,2] 1       [1,0,2] 1
> 0.41    creating        [1,2,0] 1       [1,2,0] 1
> 0.40    creating        [0,1,2] 0       [0,1,2] 0
> 0.3f    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.3e    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.3d    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.3c    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.3b    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.3a    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.39    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.38    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.37    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.36    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.35    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.34    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.33    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.32    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.31    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.30    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.2f    stale+undersized+degraded+peered    [0] 0       [0]     0
> 0.2e    stale+undersized+degraded+peered    [0] 0       [0]     0
>
> Let me know if you need anything else. I'm trying to debug, and I see a lot of the below in my ceph-osd.log:
>
>
> 2015-09-17 08:57:08.723289 7f2b31eda700 20 heartbeat_map reset_timeout 'FileStore::op_tp thread 0x7f2b31eda700' grace 60 suicide 0
> 2015-09-17 08:57:09.302722 7f2b1f549700 20 heartbeat_map reset_timeout 'OSD::recovery_tp thread 0x7f2b1f549700' grace 60 suicide 0
> 2015-09-17 08:57:09.508026 7f2b3aa73700 20 heartbeat_map is_healthy = healthy
> 2015-09-17 08:57:09.722922 7f2b33ede700 20 filestore(/var/lib/ceph/osd/ceph-0) sync_entry woke after 5.000183
> 2015-09-17 08:57:09.722956 7f2b33ede700 20 filestore(/var/lib/ceph/osd/ceph-0) sync_entry waiting for max_interval 5.000000
> 2015-09-17 08:57:09.722987 7f2b326db700 20 heartbeat_map reset_timeout 'FileStore::op_tp thread 0x7f2b326db700' grace 60 suicide 0
> 2015-09-17 08:57:09.723080 7f2b31eda700 20 heartbeat_map reset_timeout 'FileStore::op_tp thread 0x7f2b31eda700' grace 60 suicide 0
> 2015-09-17 08:57:09.793157 7f2b1fd4a700 20 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f2b1fd4a700' grace 15 suicide 150
> 2015-09-17 08:57:09.793169 7f2b1fd4a700 20 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f2b1fd4a700' grace 4 suicide 0
> 2015-09-17 08:57:09.801912 7f2b2154d700 20 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f2b2154d700' grace 15 suicide 150
> 2015-09-17 08:57:09.801925 7f2b2154d700 20 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f2b2154d700' grace 4 suicide 0
> 2015-09-17 08:57:09.828221 7f2b1e547700 20 heartbeat_map reset_timeout 'OSD::command_tp thread 0x7f2b1e547700' grace 60 suicide 0
> 2015-09-17 08:57:09.954625 7f2b24553700 20 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f2b24553700' grace 15 suicide 150
> 2015-09-17 08:57:09.954646 7f2b24553700 20 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f2b24553700' grace 4 suicide 0
> 2015-09-17 08:57:09.989839 7f2b21d4e700 20 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f2b21d4e700' grace 15 suicide 150
> 2015-09-17 08:57:09.989852 7f2b21d4e700 20 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f2b21d4e700' grace 4 suicide 0
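
Those heartbeat_map entries are routine debug-20 noise, so the OSD process itself looks alive; what the log doesn't show is whether the OSDs can reach each other. A quick reachability check from each OSD host may be more telling (the address and port below are only examples taken from your public network; get the real ones from ceph osd dump, and use whichever netcat/telnet you have handy):

    ceph osd dump | grep "^osd"      # shows the public and cluster ip:port each OSD is bound to
    nc -zv 192.1.41.52 6789          # monitor port on a peer host
    nc -zv 192.1.41.52 6800          # an OSD port on the public network, per osd dump
    # repeat against the peers' 192.168.0.0/24 cluster-network addresses as well
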
>
>
>
>> On 17 Sep 2015, at 08:11, Goncalo Borges wrote:
>>
>> Hello Stefan...
>>
>> Those 64 PGs belong to the default rbd pool that is created automatically. Can you please give us the output of:
>>
>>    # ceph osd pool ls detail
>>    # ceph pg dump_stuck
>>
>> The degraded/stale status means that the PGs cannot be replicated according to your policies.
>>
>> My guess is that you simply have too few OSDs for the number of replicas you are requesting.
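
If that turns out to be the case, the replication requirements are quick to check and, on a 3-OSD test cluster, to relax; the pool name below is assumed to be the default rbd pool discussed here:

    ceph osd pool get rbd size          # current replica count (3 in this cluster)
    ceph osd pool get rbd min_size      # replicas required before I/O is served (2 here)
    ceph osd pool set rbd size 2        # only if you decide fewer replicas are acceptable for the test

With one OSD per host and three hosts, size 3 should in principle be satisfiable, so this is more about ruling things out.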
>>
>> Cheers
>> G.
>>
>>
>>
>> On 09/17/2015 02:59 AM, Stefan Eriksson wrote:
>>> I have a completely new cluster for testing. It's three servers, all of which are monitors and OSD hosts, and each has one disk.
>>> The issue is that ceph status shows: 64 stale+undersized+degraded+peered
>>>
>>> health:
>>>
>>>     health HEALTH_WARN
>>>            clock skew detected on mon.ceph01-osd03
>>>            64 pgs degraded
>>>            64 pgs stale
>>>            64 pgs stuck degraded
>>>            64 pgs stuck inactive
>>>            64 pgs stuck stale
>>>            64 pgs stuck unclean
>>>            64 pgs stuck undersized
>>>            64 pgs undersized
>>>            too few PGs per OSD (21 < min 30)
>>>            Monitor clock skew detected
>>>     monmap e1: 3 mons at {ceph01-osd01=192.1.41.51:6789/0,ceph01-osd02=192.1.41.52:6789/0,ceph01-osd03=192.1.41.53:6789/0}
>>>            election epoch 82, quorum 0,1,2 ceph01-osd01,ceph01-osd02,ceph01-osd03
>>>     osdmap e36: 3 osds: 3 up, 3 in
>>>      pgmap v85: 64 pgs, 1 pools, 0 bytes data, 0 objects
>>>            101352 kB used, 8365 GB / 8365 GB avail
>>>                  64 stale+undersized+degraded+peered
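
A couple of quick checks usually narrow this kind of state down; the PG id below is just one taken from the dump_stuck output earlier in the thread, and any stale PG will do:

    ceph health detail       # lists the individual stuck PGs and the reason for each warning
    ceph pg 0.0 query        # peering state and which OSDs the PG is waiting on
                             # (if this hangs, the primary OSD is unreachable, which is itself a clue)
    ntpq -p                  # on each mon host, for the clock-skew warning
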
>>>
>>>
>>> ceph osd tree shows:
>>> ID WEIGHT  TYPE NAME             UP/DOWN REWEIGHT PRIMARY-AFFINITY
>>> -1 8.15996 root default
>>> -2 2.71999     host ceph01-osd01
>>> 0 2.71999         osd.0              up  1.00000          1.00000
>>> -3 2.71999     host ceph01-osd02
>>> 1 2.71999         osd.1              up  1.00000          1.00000
>>> -4 2.71999     host ceph01-osd03
>>> 2 2.71999         osd.2              up  1.00000          1.00000
>>>
>>>
>>>
>>>
>>>
>>> Here is my crushmap:
>>>
>>> # begin crush map
>>> tunable choose_local_tries 0
>>> tunable choose_local_fallback_tries 0
>>> tunable choose_total_tries 50
>>> tunable chooseleaf_descend_once 1
>>> tunable straw_calc_version 1
>>>
>>> # devices
>>> device 0 osd.0
>>> device 1 osd.1
>>> device 2 osd.2
>>>
>>> # types
>>> type 0 osd
>>> type 1 host
>>> type 2 chassis
>>> type 3 rack
>>> type 4 row
>>> type 5 pdu
>>> type 6 pod
>>> type 7 room
>>> type 8 datacenter
>>> type 9 region
>>> type 10 root
>>>
>>> # buckets
>>> host ceph01-osd01 {
>>>        id -2           # do not change unnecessarily
>>>        # weight 2.720
>>>        alg straw
>>>        hash 0  # rjenkins1
>>>        item osd.0 weight 2.720
>>> }
>>> host ceph01-osd02 {
>>>        id -3           # do not change unnecessarily
>>>        # weight 2.720
>>>        alg straw
>>>        hash 0  # rjenkins1
>>>        item osd.1 weight 2.720
>>> }
>>> host ceph01-osd03 {
>>>        id -4           # do not change unnecessarily
>>>        # weight 2.720
>>>        alg straw
>>>        hash 0  # rjenkins1
>>>        item osd.2 weight 2.720
>>> }
>>> root default {
>>>        id -1           # do not change unnecessarily
>>>        # weight 8.160
>>>        alg straw
>>>        hash 0  # rjenkins1
>>>        item ceph01-osd01 weight 2.720
>>>        item ceph01-osd02 weight 2.720
>>>        item ceph01-osd03 weight 2.720
>>> }
>>>
>>> # rules
>>> rule replicated_ruleset {
>>>        ruleset 0
>>>        type replicated
>>>        min_size 1
>>>        max_size 10
>>>        step take default
>>>        step chooseleaf firstn 0 type host
>>>        step emit
>>> }
>>>
>>> # end crush map
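
For what it's worth, the map looks sane for three hosts with size 3, and it can be sanity-checked offline with crushtool; a sketch (crush.bin and crush.txt are just scratch file names):

    ceph osd getcrushmap -o crush.bin        # grab the compiled map from the cluster
    crushtool -d crush.bin -o crush.txt      # decompile it for reading
    crushtool -i crush.bin --test --rule 0 --num-rep 3 --show-mappings | head
                                             # should print [x,y,z] mappings spread across osd.0-2

If the mappings come out fine, the problem is more likely the OSDs failing to peer than the placement policy itself.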
>>>
>>> And the ceph.conf, which is shared among all nodes:
>>>
>>> ceph.conf
>>> [global]
>>> fsid = b9043917-5f65-98d5-8624-ee12ff32a5ea
>>> public_network = 192.1.41.0/24
>>> cluster_network = 192.168.0.0/24
>>> mon_initial_members = ceph01-osd01, ceph01-osd02, ceph01-osd03
>>> mon_host = 192.1.41.51,192.1.41.52,192.1.41.53
>>> auth_cluster_required = cephx
>>> auth_service_required = cephx
>>> auth_client_required = cephx
>>> filestore_xattr_use_omap = true
>>> osd pool default pg num = 512
>>> osd pool default pgp num = 512
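
One detail worth checking: the rbd pool ended up with 64 PGs even though the conf asks for 512 as the default, so it's worth confirming what the daemons are actually running with. The admin socket on each host shows the live values (socket paths assumed to be in the default location):

    ceph daemon mon.ceph01-osd01 config get osd_pool_default_pg_num     # run on that mon host
    ceph daemon osd.0 config show | grep -E 'cluster_network|public_network'
                                                                        # run on the host carrying osd.0
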
>>>
>>> Logs don't say much; the only active log that adds something is:
>>>
>>> mon.ceph01-osd01@0(leader).data_health(82) update_stats avail 88% total 9990 MB, used 1170 MB, avail 8819 MB
>>> mon.ceph01-osd02@1(peon).data_health(82) update_stats avail 88% total 9990 MB, used 1171 MB, avail 8818 MB
>>> mon.ceph01-osd03@2(peon).data_health(82) update_stats avail 88% total 9990 MB, used 1172 MB, avail 8817 MB
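
Those data_health lines only report the monitors' own disk usage, so they won't say much about the PGs. To see whether the OSDs are actually failing to talk to each other, raising the messenger debug level for a short while is often more revealing (a sketch; debug-ms is chatty, so turn it back down afterwards):

    ceph tell osd.* injectargs '--debug-ms 1'     # log connection attempts/failures between daemons
    ceph tell osd.* injectargs '--debug-ms 0'     # revert once you have what you need

Failed connects to the other OSDs' 6800-range ports in the OSD logs would point straight at a network/firewall issue.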
>>>
>>> Does anyone have thoughts on what might be wrong? Or is there other info I can provide to ease the search for what it might be?
>>>
>>> Thanks!
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> --
>> Goncalo Borges
>> Research Computing
>> ARC Centre of Excellence for Particle Physics at the Terascale
>> School of Physics A28 | University of Sydney, NSW  2006
>> T: +61 2 93511937
>>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



