Hello, Ceph users,

I wanted to install the recent kernel update on my OSD hosts, which run CentOS 7 with Ceph 13.2.5 Mimic. So I set the noout flag and ran "yum -y update" on the first OSD host (roughly the sequence sketched in the P.S. below). This host has 8 BlueStore OSDs with data on HDDs and the DBs on LVs of two SSDs (each SSD carries 4 LVs for OSD metadata). Everything went OK, so I rebooted the host.

After the OSD host came back online, the cluster went from HEALTH_WARN (noout flag set) to HEALTH_ERR and started to rebalance itself, with reportedly almost 60 % of objects misplaced and some of them degraded. And, of course, backfill_toofull:

  cluster:
    health: HEALTH_ERR
            2300616/3975384 objects misplaced (57.872%)
            Degraded data redundancy: 74263/3975384 objects degraded (1.868%), 146 pgs degraded, 122 pgs undersized
            Degraded data redundancy (low space): 44 pgs backfill_toofull

  services:
    mon: 3 daemons, quorum stratus1,stratus2,stratus3
    mgr: stratus3(active), standbys: stratus1, stratus2
    osd: 44 osds: 44 up, 44 in; 2022 remapped pgs
    rgw: 1 daemon active

  data:
    pools:   9 pools, 3360 pgs
    objects: 1.33 M objects, 5.0 TiB
    usage:   25 TiB used, 465 TiB / 490 TiB avail
    pgs:     74263/3975384 objects degraded (1.868%)
             2300616/3975384 objects misplaced (57.872%)
             1739 active+remapped+backfill_wait
             1329 active+clean
             102  active+recovery_wait+remapped
             76   active+undersized+degraded+remapped+backfill_wait
             31   active+remapped+backfill_wait+backfill_toofull
             30   active+recovery_wait+undersized+degraded+remapped
             21   active+recovery_wait+degraded+remapped
             8    active+undersized+degraded+remapped+backfill_wait+backfill_toofull
             6    active+recovery_wait+degraded
             4    active+remapped+backfill_toofull
             3    active+recovery_wait+undersized+degraded
             3    active+remapped+backfilling
             2    active+recovery_wait
             2    active+recovering+undersized
             1    active+clean+remapped
             1    active+undersized+degraded+remapped+backfill_toofull
             1    active+undersized+degraded+remapped+backfilling
             1    active+recovering+undersized+remapped

  io:
    client:   681 B/s rd, 1013 KiB/s wr, 0 op/s rd, 32 op/s wr
    recovery: 142 MiB/s, 93 objects/s

(Note that I cleared the noout flag afterwards.)

What is wrong here? Why did the cluster decide to rebalance itself? I am keeping the rest of the OSD hosts unrebooted for now.

Thanks,

-Yenya

--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| http://www.fi.muni.cz/~kas/                        GPG: 4096R/A45477D5 |
 sir_clive> I hope you don't mind if I steal some of your ideas?
 laryross> As far as stealing... we call it sharing here. --from rcgroups
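
P.S.: For reference, what I ran was roughly the following (a minimal sketch; the kernel/package versions are simply whatever "yum update" pulled in at the time):

  # on an admin/monitor node: prevent OSDs from being marked out during the reboot
  ceph osd set noout

  # on the first OSD host: apply all updates, including the kernel, then reboot
  yum -y update
  reboot

  # after the host and its 8 OSDs were back up and in, clear the flag again
  ceph osd unset noout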