Marc,

Marc Roos wrote:
: Are you sure your osd's are up and reachable? (run ceph osd tree on
: another node)

They are up, because all three mons see them as up. However, "ceph osd
tree" provided the hint (thanks!): the OSD host came back with the
hostname "localhost" instead of the correct one for some reason, so its
OSDs moved themselves into a new host=localhost CRUSH node directly
under the CRUSH root.

I rebooted the OSD host once again, this time it came up with the
correct hostname, and the "ceph osd tree" output looks sane again. So I
guess we have the reason for such a huge rebalance.
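(For the archives: one way to keep OSDs from following a bogus hostname
into a new CRUSH bucket at boot is to stop them from updating their
CRUSH location on startup. This is only a sketch; option names are as in
Mimic, and the host= value is a placeholder, not my real tree:

    # /etc/ceph/ceph.conf on the OSD hosts
    [osd]
        osd crush update on start = false
        # or, instead of disabling the update, pin the location explicitly:
        # crush location = root=default host=<the-osd-host>

With "osd crush update on start = false" the OSDs keep whatever position
the CRUSH map already has for them, no matter what hostname the box
boots with.)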
However, even though the OSD tree is back in the normal state, the
rebalance is still going on, and there are even inactive PGs, with some
Ceph clients being stuck seemingly forever:

  health: HEALTH_ERR
          1964645/3977451 objects misplaced (49.395%)
          Reduced data availability: 11 pgs inactive
          Degraded data redundancy: 315678/3977451 objects degraded (7.937%),
              542 pgs degraded, 546 pgs undersized
          Degraded data redundancy (low space): 76 pgs backfill_toofull

  services:
    mon: 3 daemons, quorum stratus1,stratus2,stratus3
    mgr: stratus3(active), standbys: stratus1, stratus2
    osd: 44 osds: 44 up, 44 in; 1806 remapped pgs
    rgw: 1 daemon active

  data:
    pools:   9 pools, 3360 pgs
    objects: 1.33 M objects, 5.0 TiB
    usage:   25 TiB used, 465 TiB / 490 TiB avail
    pgs:     0.327% pgs not active
             315678/3977451 objects degraded (7.937%)
             1964645/3977451 objects misplaced (49.395%)
             1554 active+clean
             1226 active+remapped+backfill_wait
              482 active+undersized+degraded+remapped+backfill_wait
               51 active+undersized+degraded+remapped+backfill_wait+backfill_toofull
               25 active+remapped+backfill_wait+backfill_toofull
                6 activating+remapped
                5 active+undersized+remapped+backfill_wait
                4 activating+undersized+degraded+remapped
                4 active+undersized+degraded+remapped+backfilling
                2 active+remapped+backfilling
                1 activating+degraded+remapped

  io:
    client:   0 B/s rd, 126 KiB/s wr, 0 op/s rd, 5 op/s wr
    recovery: 52 MiB/s, 13 objects/s

# ceph pg ls | grep activating
23.298 622 622    0 0 2591064064 3041 activating+undersized+degraded+remapped 2019-05-15 15:03:04.626434 102870'1371081 103721:1369041   [8,20,70]p8      [8,20]p8 2019-05-15 02:10:34.972050 2019-05-15 02:10:34.972050
23.2cb 695 695  695 0 2885144354 3097 activating+undersized+degraded+remapped 2019-05-15 15:03:04.592438  102890'828931 103721:1594128   [0,70,78]p0    [21,78]p21 2019-05-15 10:23:02.789435 2019-05-14 00:46:19.161050
23.346 623   1 1245 0 2602515968 3076 activating+degraded+remapped            2019-05-15 14:56:05.317986 103083'1061153 103721:3719154 [78,79,26]p78  [26,23,5]p26 2019-05-15 10:21:17.388467 2019-05-15 10:21:17.388467
23.436 664   0  664 0 2767360000 3079 activating+remapped                     2019-05-15 15:05:19.349660  103083'987000 103721:1525097 [13,70,19]p13 [13,19,18]p13 2019-05-14 09:43:52.924297 2019-05-08 04:24:41.251620
23.454 696   0 1846 0 2872765970 3031 activating+remapped                     2019-05-15 15:05:19.152343 102896'1092297 103721:1607448   [2,69,70]p2 [24,12,75]p24 2019-05-15 14:06:45.123388 2019-05-11 21:53:50.183932
23.490 636   0  636 0 2635874322 3064 activating+remapped                     2019-05-15 15:05:19.368037 103083'4996760 103721:1789524  [13,70,1]p13  [13,1,24]p13 2019-05-14 05:16:51.180417 2019-05-09 04:51:52.645295
23.4f5 633   0 1266 0 2641321984 3084 activating+remapped                     2019-05-15 14:56:04.248887 103035'4667973 103721:2116544 [70,72,27]p70 [25,27,79]p25 2019-05-15 01:07:28.978979 2019-05-08 07:20:08.253942
23.76b 596   0 1192 0 2481048116 3025 activating+remapped                     2019-05-15 15:05:19.135491 102723'1445725 103721:1907186 [70,13,72]p70  [26,13,8]p26 2019-05-14 17:04:13.644789 2019-05-14 17:04:13.644789
23.7e1 604   0  604 0 2517671954 3008 activating+remapped                     2019-05-15 14:56:04.246016  102730'739689 103721:1262764   [8,79,21]p8   [8,21,26]p8 2019-05-14 13:57:52.964361 2019-05-13 09:54:51.371622
62.4b  108 794    0 0   74451903 1028 activating+undersized+degraded+remapped 2019-05-15 14:56:04.330268    102517'1028   103721:22340 [79,78,20]p79    [78,20]p78 2019-05-14 16:30:18.090859 2019-05-14 16:30:18.090859
62.4e  118 386    0 0  103058459 1011 activating+undersized+degraded+remapped 2019-05-15 15:05:17.348109    102517'1011   103721:24725 [77,70,19]p77    [77,19]p77 2019-05-15 13:36:55.090172 2019-05-14 08:40:20.383295
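(Also for the archives, the kind of checks that should show why these
PGs are stuck activating and whether the backfillfull threshold is what
blocks the backfill. This is only a sketch; the 0.92 below is an example
value, not a recommendation:

    # ceph pg 23.298 query              # look at the recovery_state section
    # ceph osd df tree                  # per-OSD utilization within the CRUSH tree
    # ceph osd dump | grep full_ratio   # current nearfull/backfillfull/full ratios
    # ceph osd set-backfillfull-ratio 0.92   # temporarily raise the backfillfull
                                             # threshold if that is what blocks it

The pg query output usually says what the PG is still waiting for.)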
-Yenya

: From: Jan Kasprzak [mailto:kas@xxxxxxxxxx]
: Sent: Wednesday, 15 May 2019 14:46
: To: ceph-users@xxxxxxxx
: Subject: Huge rebalance after rebooting OSD host (Mimic)
:
: Hello, Ceph users,
:
: I wanted to install the recent kernel update on my OSD hosts with
: CentOS 7, Ceph 13.2.5 Mimic. So I set the noout flag and ran
: "yum -y update" on the first OSD host. This host has 8 bluestore OSDs
: with data on HDDs and databases on LVs of two SSDs (each SSD has 4 LVs
: for OSD metadata).
:
: Everything went OK, so I rebooted this host. After the OSD host came
: back online, the cluster went from HEALTH_WARN (noout flag set) to
: HEALTH_ERR and started to rebalance itself, with reportedly almost
: 60 % of objects misplaced, and some of them degraded. And, of course,
: backfill_toofull:
:
:   cluster:
:     health: HEALTH_ERR
:             2300616/3975384 objects misplaced (57.872%)
:             Degraded data redundancy: 74263/3975384 objects degraded (1.868%),
:                 146 pgs degraded, 122 pgs undersized
:             Degraded data redundancy (low space): 44 pgs backfill_toofull
:
:   services:
:     mon: 3 daemons, quorum stratus1,stratus2,stratus3
:     mgr: stratus3(active), standbys: stratus1, stratus2
:     osd: 44 osds: 44 up, 44 in; 2022 remapped pgs
:     rgw: 1 daemon active
:
:   data:
:     pools:   9 pools, 3360 pgs
:     objects: 1.33 M objects, 5.0 TiB
:     usage:   25 TiB used, 465 TiB / 490 TiB avail
:     pgs:     74263/3975384 objects degraded (1.868%)
:              2300616/3975384 objects misplaced (57.872%)
:              1739 active+remapped+backfill_wait
:              1329 active+clean
:               102 active+recovery_wait+remapped
:                76 active+undersized+degraded+remapped+backfill_wait
:                31 active+remapped+backfill_wait+backfill_toofull
:                30 active+recovery_wait+undersized+degraded+remapped
:                21 active+recovery_wait+degraded+remapped
:                 8 active+undersized+degraded+remapped+backfill_wait+backfill_toofull
:                 6 active+recovery_wait+degraded
:                 4 active+remapped+backfill_toofull
:                 3 active+recovery_wait+undersized+degraded
:                 3 active+remapped+backfilling
:                 2 active+recovery_wait
:                 2 active+recovering+undersized
:                 1 active+clean+remapped
:                 1 active+undersized+degraded+remapped+backfill_toofull
:                 1 active+undersized+degraded+remapped+backfilling
:                 1 active+recovering+undersized+remapped
:
:   io:
:     client:   681 B/s rd, 1013 KiB/s wr, 0 op/s rd, 32 op/s wr
:     recovery: 142 MiB/s, 93 objects/s
:
: (Note that I cleared the noout flag afterwards.) What is wrong with it?
: Why did the cluster decide to rebalance itself?
:
: I am keeping the rest of the OSD hosts unrebooted for now.
:
: Thanks,
:
: -Yenya

-- 
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| http://www.fi.muni.cz/~kas/                         GPG: 4096R/A45477D5 |
 sir_clive> I hope you don't mind if I steal some of your ideas?
 laryross> As far as stealing... we call it sharing here.   --from rcgroups
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com