can you show us the output of 'ceph osd dump' and 'ceph health detail'?

Luk <skidoo@xxxxxxx> wrote on Fri, Jun 14, 2019 at 8:02 PM:
>
> Hello,
>
> All kudos are going to friends from Wroclaw, PL :)
>
> It was as simple as a typo...
>
> There was an osd added two times to the crushmap due to these commands (they
> were run over a week ago - we didn't have a problem then, it showed up after
> replacing another osd - osd-7):
>
> ceph osd crush add osd.112 0.00 root=hdd
> ceph osd crush move osd.112 0.00 root=hdd rack=rack-a host=stor-a02
> ceph osd crush add osd.112 0.00 host=stor-a02
>
> and the ceph osd tree was like this:
>
> [root@ceph-mon-01 ~]# ceph osd tree
> ID   CLASS WEIGHT    TYPE NAME            STATUS REWEIGHT PRI-AFF
> -100       200.27496 root hdd
> -101        67.64999     rack rack-a
>   -2        33.82500         host stor-a01
>    0   hdd   7.27499             osd.0        up  1.00000 1.00000
>    6   hdd   7.27499             osd.6        up  1.00000 1.00000
>   12   hdd   7.27499             osd.12       up  1.00000 1.00000
>  108   hdd   4.00000             osd.108      up  1.00000 1.00000
>  109   hdd   4.00000             osd.109      up  1.00000 1.00000
>  110   hdd   4.00000             osd.110      up  1.00000 1.00000
>   -7        33.82500         host stor-a02
>    5   hdd   7.27499             osd.5        up  1.00000 1.00000
>    9   hdd   7.27499             osd.9        up  1.00000 1.00000
>   15   hdd   7.27499             osd.15       up  1.00000 1.00000
>  111   hdd   4.00000             osd.111      up  1.00000 1.00000
>  112   hdd   4.00000             osd.112      up  1.00000 1.00000
>  113   hdd   4.00000             osd.113      up  1.00000 1.00000
> -102        60.97498     rack rack-b
>   -3        27.14998         host stor-b01
>    1   hdd   7.27499             osd.1        up  1.00000 1.00000
>    7   hdd   0.59999             osd.7        up  1.00000 1.00000
>   13   hdd   7.27499             osd.13       up  1.00000 1.00000
>  114   hdd   4.00000             osd.114      up  1.00000 1.00000
>  115   hdd   4.00000             osd.115      up  1.00000 1.00000
>  116   hdd   4.00000             osd.116      up  1.00000 1.00000
>   -4        33.82500         host stor-b02
>    2   hdd   7.27499             osd.2        up  1.00000 1.00000
>   10   hdd   7.27499             osd.10       up  1.00000 1.00000
>   16   hdd   7.27499             osd.16       up  1.00000 1.00000
>  117   hdd   4.00000             osd.117      up  1.00000 1.00000
>  118   hdd   4.00000             osd.118      up  1.00000 1.00000
>  119   hdd   4.00000             osd.119      up  1.00000 1.00000
> -103        67.64999     rack rack-c
>   -6        33.82500         host stor-c01
>    4   hdd   7.27499             osd.4        up  1.00000 1.00000
>    8   hdd   7.27499             osd.8        up  1.00000 1.00000
>   14   hdd   7.27499             osd.14       up  1.00000 1.00000
>  120   hdd   4.00000             osd.120      up  1.00000 1.00000
>  121   hdd   4.00000             osd.121      up  1.00000 1.00000
>  122   hdd   4.00000             osd.122      up  1.00000 1.00000
>   -5        33.82500         host stor-c02
>    3   hdd   7.27499             osd.3        up  1.00000 1.00000
>   11   hdd   7.27499             osd.11       up  1.00000 1.00000
>   17   hdd   7.27499             osd.17       up  1.00000 1.00000
>  123   hdd   4.00000             osd.123      up  1.00000 1.00000
>  124   hdd   4.00000             osd.124      up  1.00000 1.00000
>  125   hdd   4.00000             osd.125      up  1.00000 1.00000
>  112   hdd   4.00000         osd.112          up  1.00000 1.00000
>
> [cut]
>
> after editing the crushmap and removing osd.112 from the root, ceph started
> to recover and is healthy now :)
>
> Regards
> Lukasz
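For anyone cleaning up a similar duplicate by hand, a rough sketch of the usual
decompile/edit/recompile cycle (file names here are arbitrary; which "item
osd.112" line to delete depends on where the duplicate ended up in your map):

# spot an OSD that is listed more than once in the CRUSH tree
# (the awk field number assumes the luminous-style `ceph osd tree` layout above)
ceph osd tree | awk '$4 ~ /^osd\./ {print $4}' | sort | uniq -d

# dump, decompile, edit, recompile and re-inject the CRUSH map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# ... edit crushmap.txt and remove the stray "item osd.112 ..." entry
#     that sits directly under root hdd instead of under host stor-a02 ...
crushtool -c crushmap.txt -o crushmap.new
crushtool -i crushmap.new --test --show-statistics   # optional sanity check
ceph osd setcrushmap -i crushmap.new

Mapping changes take effect as soon as the new map is injected, so expect some
backfill afterwards.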
> > Here is ceph osd tree; in the first post there is also ceph osd df tree:
> > https://pastebin.com/Vs75gpwZ
> >
> >> Ahh I was thinking of chooseleaf_vary_r, which you already have.
> >> So probably not related to tunables. What is your `ceph osd tree` ?
> >>
> >> By the way, 12.2.9 has an unrelated bug (details:
> >> http://tracker.ceph.com/issues/36686)
> >> AFAIU you will just need to update to v12.2.11 or v12.2.12 for that fix.
> >>
> >> -- Dan
> >>
> >> On Fri, Jun 14, 2019 at 11:29 AM Luk <skidoo@xxxxxxx> wrote:
> >>>
> >>> Hi,
> >>>
> >>> here is the output:
> >>>
> >>> ceph osd crush show-tunables
> >>> {
> >>>     "choose_local_tries": 0,
> >>>     "choose_local_fallback_tries": 0,
> >>>     "choose_total_tries": 100,
> >>>     "chooseleaf_descend_once": 1,
> >>>     "chooseleaf_vary_r": 1,
> >>>     "chooseleaf_stable": 0,
> >>>     "straw_calc_version": 1,
> >>>     "allowed_bucket_algs": 22,
> >>>     "profile": "unknown",
> >>>     "optimal_tunables": 0,
> >>>     "legacy_tunables": 0,
> >>>     "minimum_required_version": "hammer",
> >>>     "require_feature_tunables": 1,
> >>>     "require_feature_tunables2": 1,
> >>>     "has_v2_rules": 0,
> >>>     "require_feature_tunables3": 1,
> >>>     "has_v3_rules": 0,
> >>>     "has_v4_buckets": 1,
> >>>     "require_feature_tunables5": 0,
> >>>     "has_v5_rules": 0
> >>> }
> >>>
> >>> [root@ceph-mon-01 ~]#
> >>>
> >>> --
> >>> Regards
> >>> Lukasz
> >>>
> >>> > Hi,
> >>> > This looks like a tunables issue.
> >>> > What is the output of `ceph osd crush show-tunables` ?
> >>> >
> >>> > -- Dan
> >>> >
> >>> > On Fri, Jun 14, 2019 at 11:19 AM Luk <skidoo@xxxxxxx> wrote:
> >>> >>
> >>> >> Hello,
> >>> >>
> >>> >> Maybe someone has already fought with this kind of stuck state in ceph.
> >>> >> This is a production cluster, I can't/don't want to make wrong steps,
> >>> >> please advise what to do.
> >>> >>
> >>> >> After replacing one failed disk (it was osd-7) on our cluster, ceph
> >>> >> didn't recover to HEALTH_OK, it stopped in this state:
> >>> >>
> >>> >> [root@ceph-mon-01 ~]# ceph -s
> >>> >>   cluster:
> >>> >>     id:     b6f23cff-7279-f4b0-ff91-21fadac95bb5
> >>> >>     health: HEALTH_WARN
> >>> >>             noout,noscrub,nodeep-scrub flag(s) set
> >>> >>             Degraded data redundancy: 24761/45994899 objects degraded (0.054%), 8 pgs degraded, 8 pgs undersized
> >>> >>
> >>> >>   services:
> >>> >>     mon:        3 daemons, quorum ceph-mon-01,ceph-mon-02,ceph-mon-03
> >>> >>     mgr:        ceph-mon-03(active), standbys: ceph-mon-02, ceph-mon-01
> >>> >>     osd:        144 osds: 144 up, 144 in
> >>> >>                 flags noout,noscrub,nodeep-scrub
> >>> >>     rbd-mirror: 3 daemons active
> >>> >>     rgw:        6 daemons active
> >>> >>
> >>> >>   data:
> >>> >>     pools:   18 pools, 2176 pgs
> >>> >>     objects: 15.33M objects, 49.3TiB
> >>> >>     usage:   151TiB used, 252TiB / 403TiB avail
> >>> >>     pgs:     24761/45994899 objects degraded (0.054%)
> >>> >>              2168 active+clean
> >>> >>              8    active+undersized+degraded
> >>> >>
> >>> >>   io:
> >>> >>     client: 435MiB/s rd, 415MiB/s wr, 7.94kop/s rd, 2.96kop/s wr
> >>> >>
> >>> >> Restarting the OSD didn't help, and changing choose_total_tries from 50 to 100 didn't help either.
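As an aside (assuming luminous-era command syntax), a couple of ways to list
exactly which PGs are stuck in this state and which OSDs they map to:

ceph health detail              # names the degraded/undersized PGs
ceph pg dump_stuck undersized   # one line per stuck PG with its up/acting sets
ceph pg ls degraded             # similar, filtered by PG state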
> >>> >> I checked one of the degraded PGs, 10.3c4:
> >>> >>
> >>> >> [root@ceph-mon-01 ~]# ceph pg dump 2>&1 | grep -w 10.3c4
> >>> >> 10.3c4  3593  0  3593  0  0  14769891858  10076  10076  active+undersized+degraded  2019-06-13 08:19:39.802219  37380'71900564  37380:119411139  [9,109]  9  [9,109]  9  33550'69130424  2019-06-08 02:28:40.508790  33550'69130424  2019-06-08 02:28:40.508790  18
> >>> >>
> >>> >> [root@ceph-mon-01 ~]# ceph pg 10.3c4 query | jq '.["peer_info"][] | {peer: .peer, last_update:.last_update}'
> >>> >> {
> >>> >>   "peer": "0",
> >>> >>   "last_update": "36847'71412720"
> >>> >> }
> >>> >> {
> >>> >>   "peer": "109",
> >>> >>   "last_update": "37380'71900570"
> >>> >> }
> >>> >> {
> >>> >>   "peer": "117",
> >>> >>   "last_update": "0'0"
> >>> >> }
> >>> >> [root@ceph-mon-01 ~]#
> >>> >>
> >>> >> I have checked the space taken by this PG on the storage nodes.
> >>> >> Here is how to check where a particular OSD is (on which physical storage node):
> >>> >>
> >>> >> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 9 "
> >>> >> |  9  | stor-a02 | 2063G | 5386G |  52  | 1347k |  53  |  292k | exists,up |
> >>> >> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 109 "
> >>> >> | 109 | stor-a01 | 1285G | 4301G |   5  | 31.0k |   6  | 59.2k | exists,up |
> >>> >> [root@ceph-mon-01 ~]# watch ceph -s
> >>> >> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 117 "
> >>> >> | 117 | stor-b02 | 1334G | 4252G |  54  | 1216k |  13  | 27.4k | exists,up |
> >>> >> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 0 "
> >>> >> |  0  | stor-a01 | 2156G | 5293G |  58  |  387k |  29  | 30.7k | exists,up |
> >>> >> [root@ceph-mon-01 ~]#
> >>> >>
> >>> >> and checking sizes on the servers:
> >>> >>
> >>> >> stor-a01 (this PG shouldn't have two copies on the same host):
> >>> >> [root@stor-a01 /var/lib/ceph/osd/ceph-0/current]# du -sh 10.3c4_*
> >>> >> 2.4G    10.3c4_head
> >>> >> 0       10.3c4_TEMP
> >>> >> [root@stor-a01 /var/lib/ceph/osd/ceph-109/current]# du -sh 10.3c4_*
> >>> >> 14G     10.3c4_head
> >>> >> 0       10.3c4_TEMP
> >>> >> [root@stor-a01 /var/lib/ceph/osd/ceph-109/current]#
> >>> >>
> >>> >> stor-a02:
> >>> >> [root@stor-a02 /var/lib/ceph/osd/ceph-9/current]# du -sh 10.3c4_*
> >>> >> 14G     10.3c4_head
> >>> >> 0       10.3c4_TEMP
> >>> >> [root@stor-a02 /var/lib/ceph/osd/ceph-9/current]#
> >>> >>
> >>> >> stor-b02:
> >>> >> [root@stor-b02 /var/lib/ceph/osd/ceph-117/current]# du -sh 10.3c4_*
> >>> >> zsh: no matches found: 10.3c4_*
> >>> >>
> >>> >> Information about ceph:
> >>> >>
> >>> >> [root@ceph-mon-01 ~]# ceph versions
> >>> >> {
> >>> >>     "mon": {
> >>> >>         "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
> >>> >>     },
> >>> >>     "mgr": {
> >>> >>         "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
> >>> >>     },
> >>> >>     "osd": {
> >>> >>         "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 144
> >>> >>     },
> >>> >>     "mds": {},
> >>> >>     "rbd-mirror": {
> >>> >>         "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
> >>> >>     },
> >>> >>     "rgw": {
> >>> >>         "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 6
> >>> >>     },
> >>> >>     "overall": {
> >>> >>         "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 159
> >>> >>     }
> >>> >> }
> >>> >>
> >>> >> crushmap: https://pastebin.com/cpC2WmyS
> >>> >> ceph osd tree: https://pastebin.com/XvZ2cNZZ
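Side note: two commands that should give the same placement information a bit
more directly than grepping `ceph osd status` (a bare grep " 9 " can also match
other numeric columns):

ceph osd find 9       # JSON with the host and crush location of osd.9
ceph pg map 10.3c4    # prints the osdmap epoch plus the PG's up and acting sets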
> >>> >> I'm cross-posting this to devel because maybe there is some known bug
> >>> >> in this particular version of ceph, and you could point out some
> >>> >> directions to fix this problem.
> >>> >>
> >>> >> --
> >>> >> Regards
> >>> >> Lukasz
> >>> >>
> >>>
> >>> --
> >>> Regards,
> >>> Luk
> >>>
>
> --
> Regards,
> Luk
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com