Hi,

One simple/quick question.

In my ceph cluster, I had a disk which was in predicted failure. It was so far gone that the ceph OSD daemon crashed. After the OSD crashed, ceph moved the data correctly (or at least that's what I thought), and "ceph -s" reported HEALTH_OK. Perfect.

I tried to tell ceph to mark the OSD down: it told me the OSD was already down... fine. Then I ran this:

ID=43 ; ceph osd down $ID ; ceph auth del osd.$ID ; ceph osd rm $ID ; ceph osd crush remove osd.$ID

And immediately after that, ceph told me:

# ceph -s
    cluster 70ac4a78-46c0-45e6-8ff9-878b37f50fa1
     health HEALTH_WARN
            37 pgs backfilling
            3 pgs stuck unclean
            recovery 12086/355688 objects misplaced (3.398%)
     monmap e2: 3 mons at {ceph0=192.54.207.70:6789/0,ceph1=192.54.207.71:6789/0,ceph2=192.54.207.72:6789/0}
            election epoch 938, quorum 0,1,2 ceph0,ceph1,ceph2
     mdsmap e64: 1/1/1 up {0=ceph1=up:active}, 1 up:standby-replay, 1 up:standby
     osdmap e25455: 119 osds: 119 up, 119 in; 35 remapped pgs
      pgmap v5473702: 3212 pgs, 10 pools, 378 GB data, 97528 objects
            611 GB used, 206 TB / 207 TB avail
            12086/355688 objects misplaced (3.398%)
                3175 active+clean
                  37 active+remapped+backfilling
  client io 192 kB/s rd, 1352 kB/s wr, 117 op/s

Of course, I'm sure OSD 43 was the one that was down ;)

My question therefore is: if ceph successfully and automatically migrated the data off the down/out OSD, why does anything happen at all once I tell ceph to forget about this OSD? Was the cluster not HEALTH_OK after all?

(ceph-0.94.6-0.el7.x86_64 for now)

Thanks && regards
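
PS: for reference, here is the kind of check I guess I should have done between marking the OSD out and removing it from the CRUSH map. This is only a sketch, not what I actually ran; the grep pattern is just an illustration of matching osd.43 in the PG up/acting sets, and all commands are from the stock hammer CLI:

ID=43
# take the OSD out of data placement first (a no-op here, it was already out)
ceph osd out $ID
# wait for recovery/backfill to finish
ceph -s
# list any PGs that still have osd.$ID in their up or acting set
ceph pg dump | grep -E "\[([0-9]+,)*${ID}(,[0-9]+)*\]"
# only once nothing maps to it any more, forget about it for good
ceph osd crush remove osd.$ID
ceph auth del osd.$ID
ceph osd rm $ID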